[PATCH shepherd] Allow replacement of services

  • Done
  • quality assurance status badge
Details
2 participants
  • Carlo Zancanaro
  • Ludovic Courtès
Owner
unassigned
Submitted by
Carlo Zancanaro
Severity
normal
C
C
Carlo Zancanaro wrote on 9 Aug 2018 14:42
(address . guix-patches@gnu.org)
87wosz4spx.fsf@zancanaro.id.au
This is a relatively simple patch adding a replacement slot to
services in the Shepherd. When stopping a service, the replacement
slot is checked and, if it has a value, is used to upgrade the
current service.

I've chosen to modify the existing service, rather than creating a
new one, but that was mostly because it was easier for me to
implement quickly, and I didn't have a huge amount of time.

I'm hopeful that this, or something like it, can be used by GuixSD
to allow people to restart services after reconfiguring without
rebooting (or remembering to stop, reconfigure, start).

Carlo
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAltsNpoACgkQqdyPv9aw
IbzJ3A//TD+Z+9GuR+bSoRxFTS5wrMfQ0hHANj22sEPQQk8xaYSVKuPMHkPe4+RB
z+ECmxYKtQRUB5b4pJ/cg8G8ynvOW9ZAbKL1pGpIMfuyyHmCv8XdbugzSrphSLFD
Pq34HfTPWKPN9JuSeXZ445h5vVJ4i2XP0bwHU8l3by5cvOJesupBge1lvGYwnBhc
HIvfB0r+Cm7g/3m5jHF8k/mImHZpYANzKRHL9pRSFJz4ewk/+8i5wpIVrpm5KPFX
yX2c3wvxa4BRsyKPGVhJ0v34ltLT1BGeUHQhHDju3Iilkt3r/b3+DmWMiSVZ+D8v
Gn3CjPH1bhmTx6Ls3Kj7kZMxyhdam5S3Zl+TERQsYS0Kp5jZNvqY8fbk6LEcS9tX
o6B5WIo4oKnqA7PmOwMg5GzX+9BkLVY6CGnMkIAhdKWANjSYMBizZ94DA3ZmrImA
6/nrHsNCzcjhg2wyhS7cOtMeis+xWMP7CMO0IASjikpq6e3YWJU3vSqGJKZwInyZ
xZGBp45+v8wiiLDcc9q1+k3rSyLbAPc4UE4h8CQ0FBeA93Z12pCiZ2W2l+BkK9L+
S1kaZ3iLBu1qFPyzKT6nRvREawdgbwD3BeZoo2jqX2PF4HlstAH9noM/n2GfiL0v
m0QKotRUXcf97aCBhQHs2yRVMt2oJz936Q44zJ+yQqBeu7CyCb0=
=2Nj0
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 20 Aug 2018 22:33
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 32408@debbugs.gnu.org)
87h8jobwwp.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

Toggle quote (13 lines)
> This is a relatively simple patch adding a replacement slot to
> services in the Shepherd. When stopping a service, the replacement
> slot is checked and, if it has a value, is used to upgrade the current
> service.
>
> I've chosen to modify the existing service, rather than creating a new
> one, but that was mostly because it was easier for me to implement
> quickly, and I didn't have a huge amount of time.
>
> I'm hopeful that this, or something like it, can be used by GuixSD to
> allow people to restart services after reconfiguring without rebooting
> (or remembering to stop, reconfigure, start).

Awesome! This is exactly what we need to address the problem.

We’ll still want to be able to special-case things like nginx that can
be “hot-replaced”, though. So perhaps, in addition to this patch on the
Shepherd side, we’ll need extra stuff in (gnu services shepherd).

For instance, the ‘actions’ field of <shepherd-service> could, by
default, include an “upgrade” action that simply sets the ‘replacement’
slot. For nginx, we’d provide a custom “upgrade” action that does
“nginx -s restart” or whatever it is that needs to be done.

‘guix system reconfigure’ would automatically invoke the ‘upgrade’
action for each new service.

WDYT?

Toggle quote (13 lines)
> From e03290041c91813f1a301c7e9c4dbb9ee768b400 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Thu, 9 Aug 2018 22:30:38 +1000
> Subject: [PATCH] service: Add a replacement slot for delayed service
> replacement.
>
> * modules/shepherd/service.scm (<service>): Add replacement slot
> (replace-service): New procedure.
> (stop): Call replace-service after stopping a service.
> * tests/replacement.sh: Add a test for it.
> * Makefile.am (TESTS): Add the new test.
> * doc/shepherd.texi (Slots of services): Document it.

Overall LGTM. Some comments:

Toggle quote (2 lines)
> +(define (replace-service service)

Please add a docstring.

Toggle quote (14 lines)
> + (let ((replacement (slot-ref service 'replacement)))
> + (define (copy-slot! slot)
> + (slot-set! service slot (slot-ref replacement slot)))
> + (when replacement
> + (copy-slot! 'provides)
> + (copy-slot! 'requires)
> + (copy-slot! 'respawn?)
> + (copy-slot! 'start)
> + (copy-slot! 'stop)
> + (copy-slot! 'actions)
> + (copy-slot! 'running)
> + (copy-slot! 'docstring))
> + service))

Having a hardcoded list of slots sounds error-prone—surely we’ll forget
to update it down the road. I wonder what else could be done.

One option would be to grab the block asyncs and atomically replace the
service in the ‘%services’ hash table. Then we only need to copy the
‘last-respawns’ slot to the new service, I believe. (This changes the
object identity of the service but I think its OK.)

Another option would be to use GOOPS tricks to iterate over the list of
slots and have a list of slots *not* to copy. I’m not a big fan of this
option, though.

Toggle quote (1 lines)
> +cat - > "$rconf"<<EOF
^
You can remove the dash.

Toggle quote (4 lines)
> +(let ((service (lookup-running 'test)))
> + (slot-set! service 'replacement
> + (make <service>

I wonder if we should let users fiddle with ‘replacement’ directly, or
if we should provide a higher-level construct.

For instance, ‘register-services’ could transparently set the
‘replacement’ field for services already registered instead of doing:

(assert (null? (lookup-services (canonical-name new))))

Not sure if there are cases where this behavior would be undesirable,
though.

Thoughts?

Toggle quote (2 lines)
> +if test `cat $stamp` != "Goodbye"; then

Nitpick: "`cat $stamp`".

Thanks!

Ludo’.
C
C
Carlo Zancanaro wrote on 20 Aug 2018 23:16
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 32408@debbugs.gnu.org)
87k1okzqkf.fsf@zancanaro.id.au
Hey Ludo,

On Tue, Aug 21 2018, Ludovic Courtès wrote:
Toggle quote (5 lines)
> We’ll still want to be able to special-case things like nginx
> that can be “hot-replaced”, though. So perhaps, in addition to
> this patch on the Shepherd side, we’ll need extra stuff in (gnu
> services shepherd).

Yeah, if we expose the replacement field directly, then we'll need
some supporting code in (gnu services shepherd), even if it's just
to detect whether the service is stopped or not before doing a
replacement. Although ideally our interface wouldn't introduce
race conditions like that. (See below for more thoughts on this.)

Toggle quote (11 lines)
> For instance, the ‘actions’ field of <shepherd-service> could,
> by default, include an “upgrade” action that simply sets the
> ‘replacement’ slot. For nginx, we’d provide a custom “upgrade”
> action that does “nginx -s restart” or whatever it is that needs
> to be done.
>
> ‘guix system reconfigure’ would automatically invoke the
> ‘upgrade’ action for each new service.
>
> WDYT?

How many services can we meaningfully upgrade like this? My
understanding is that most of our system services a fed an
immutable configuration file, and thus restarting/reloading won't
actually upgrade them. In order to make an upgrade action work the
service would have to mutate itself into a new correct
configuration, as well as restarting/reloading the underlying
daemon. It's even trickier if the daemon itself has been upgraded,
because then the process will have to be restarted anyway.

At any rate, I think the replacement mechanism (this patch) is
just one way that a service can be reloaded. It would probably be
a good idea to create a higher-level abstraction over it. I think
other mechanisms (like a upgrade/reload action) should be handled
on the Guix side of things.

Toggle quote (28 lines)
>> + (let ((replacement (slot-ref service 'replacement)))
>> + (define (copy-slot! slot)
>> + (slot-set! service slot (slot-ref replacement slot)))
>> + (when replacement
>> + (copy-slot! 'provides)
>> + (copy-slot! 'requires)
>> + (copy-slot! 'respawn?)
>> + (copy-slot! 'start)
>> + (copy-slot! 'stop)
>> + (copy-slot! 'actions)
>> + (copy-slot! 'running)
>> + (copy-slot! 'docstring))
>> + service))
>
> Having a hardcoded list of slots sounds error-prone—surely we’ll
> forget to update it down the road. I wonder what else could be
> done.
>
> One option would be to grab the block asyncs and atomically
> replace the service in the ‘%services’ hash table. Then we only
> need to copy the ‘last-respawns’ slot to the new service, I
> believe. (This changes the object identity of the service but I
> think its OK.)
>
> Another option would be to use GOOPS tricks to iterate over the
> list of slots and have a list of slots *not* to copy. I’m not a
> big fan of this option, though.

My favourite option for this would be to separate the <service>
object into an immutable <service> and a mutable <service-state>.
The <service-state> object would have a reference to a <service>
object in order to invoke actions on it, and it could also hold a
second <service> object as a replacement. Then the swap would be
much more straightforward. I haven't done any real work towards
this, though.

In the short term, I'd rather replace it in the %services hash
table. I did it by copying slots because I wasn't sure I would get
the details of the swap right and didn't have time to properly
work out how to do it. I'll give it a go!

Toggle quote (18 lines)
>> +(let ((service (lookup-running 'test)))
>> + (slot-set! service 'replacement
>> + (make <service>
>
> I wonder if we should let users fiddle with ‘replacement’
> directly, or if we should provide a higher-level construct.
>
> For instance, ‘register-services’ could transparently set the
> ‘replacement’ field for services already registered instead of
> doing:
>
> (assert (null? (lookup-services (canonical-name new))))
>
> Not sure if there are cases where this behavior would be
> undesirable, though.
>
> Thoughts?

With this current patch the replacement field is only checked at
the point when the service is stopped, so the field could only be
set when the service is actually running. I think it makes the
most sense to just replace the service directly if it's not
stopped.

I can't think of any undesirable cases, but having a higher-level
interface is a good idea. At the very least we need to control the
inherent race condition involved in (if running? do-x do-y) for if
the service is stopped after the running? check. At the moment I
think the only thing we have to worry about there is signals, but
if we're going to move to have more parallelism through fibers
then we might need to be even more careful.

I'll try to send through an updated patch later this week.

Carlo
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAlt7L8AACgkQqdyPv9aw
IbyKtw//X2xkrVY9HAtGQ7ROk1k7PMGAEFpCJyxEFbcMh3BkbmJxbhdX5Gov1+3M
U67UX/t2D239Q07W28FPw5psm2PPowSZMx+QIcl+CA2H3efPuRmgdXlAPuMzn8gn
CZxLcDoS9lhB1vFGHIG6AkHVzGFj6OB2HhDNG/id/KyumUOPMRrqrNDTtt9+60pB
X/At8EpNp1QGtgq4ZOVXvmEufDX4TWzSdAFYh82AiiTuZZFeDhqZ7zDEk8jB/ng9
MPtyt8RdWfnqmxgzwHWf/Dniiow8nHw9Z9mYX+P4g7/6ZRWvDTLeGFKn5fjG9fVz
Ntg71O7DDyuM7bm24R2/JW7cN1Qv9gP4AYOAfrYxJgzNntNmnOVniZyFRS+dtm0p
z6iF/LqsDkit9eRnWW5K8MuEu9AwtV8HTANNmGgOnU4Oc7soVzBJnbvJYuxTPEm7
rfliFke/HNGBnvSfoXa5JzGPYJz8wVlnMdUc7CvATVobUqty4sJ4NbSLQDxNDlSE
9uTZH5irxE5E2RY1aPgsl+4uU1KbMw+rah5d40pe0ddVu1WI0llSmij0IpJTaFm9
+IzMoaiJhrk0ceocs8aEcmx9gjl09ELcUiEIvoorz/Z9NMdLLJ04eJHlWNXgTgu7
QA37arySGg6zxnQOiExJC+/B6l+7hw341+pG+YDrA6gBFIpp9ys=
=5vP+
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 21 Aug 2018 12:27
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 32408@debbugs.gnu.org)
87in44ovzm.fsf@gnu.org
Hey Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

Toggle quote (14 lines)
> On Tue, Aug 21 2018, Ludovic Courtès wrote:
>> For instance, the ‘actions’ field of <shepherd-service> could, by
>> default, include an “upgrade” action that simply sets the
>> ‘replacement’ slot. For nginx, we’d provide a custom “upgrade”
>> action that does “nginx -s restart” or whatever it is that needs to
>> be done.
>>
>> ‘guix system reconfigure’ would automatically invoke the ‘upgrade’
>> action for each new service.
>>
>> WDYT?
>
> How many services can we meaningfully upgrade like this?

Probably very few.

Toggle quote (8 lines)
> My understanding is that most of our system services a fed an immutable
> configuration file, and thus restarting/reloading won't actually
> upgrade them. In order to make an upgrade action work the service
> would have to mutate itself into a new correct configuration, as well
> as restarting/reloading the underlying daemon. It's even trickier if
> the daemon itself has been upgraded, because then the process will
> have to be restarted anyway.

Yeah.

Toggle quote (6 lines)
> At any rate, I think the replacement mechanism (this patch) is just
> one way that a service can be reloaded. It would probably be a good
> idea to create a higher-level abstraction over it. I think other
> mechanisms (like a upgrade/reload action) should be handled on the
> Guix side of things.

Sounds good.

Toggle quote (34 lines)
>>> + (let ((replacement (slot-ref service 'replacement)))
>>> + (define (copy-slot! slot)
>>> + (slot-set! service slot (slot-ref replacement slot)))
>>> + (when replacement
>>> + (copy-slot! 'provides)
>>> + (copy-slot! 'requires)
>>> + (copy-slot! 'respawn?)
>>> + (copy-slot! 'start)
>>> + (copy-slot! 'stop)
>>> + (copy-slot! 'actions)
>>> + (copy-slot! 'running)
>>> + (copy-slot! 'docstring))
>>> + service))
>>
>> Having a hardcoded list of slots sounds error-prone—surely we’ll
>> forget to update it down the road. I wonder what else could be
>> done.
>>
>> One option would be to grab the block asyncs and atomically replace
>> the service in the ‘%services’ hash table. Then we only need to
>> copy the ‘last-respawns’ slot to the new service, I believe. (This
>> changes the object identity of the service but I think its OK.)
>>
>> Another option would be to use GOOPS tricks to iterate over the list
>> of slots and have a list of slots *not* to copy. I’m not a big fan
>> of this option, though.
>
> My favourite option for this would be to separate the <service> object
> into an immutable <service> and a mutable <service-state>. The
> <service-state> object would have a reference to a <service> object in
> order to invoke actions on it, and it could also hold a second
> <service> object as a replacement. Then the swap would be much more
> straightforward. I haven't done any real work towards this, though.

I agree that separating state is the right approach longer-term (it’s
beyond the scope of this patch.)

Toggle quote (5 lines)
> In the short term, I'd rather replace it in the %services hash
> table. I did it by copying slots because I wasn't sure I would get the
> details of the swap right and didn't have time to properly work out
> how to do it. I'll give it a go!

Alright!

Toggle quote (26 lines)
>>> +(let ((service (lookup-running 'test)))
>>> + (slot-set! service 'replacement
>>> + (make <service>
>>
>> I wonder if we should let users fiddle with ‘replacement’ directly,
>> or if we should provide a higher-level construct.
>>
>> For instance, ‘register-services’ could transparently set the
>> ‘replacement’ field for services already registered instead of
>> doing:
>>
>> (assert (null? (lookup-services (canonical-name new))))
>>
>> Not sure if there are cases where this behavior would be
>> undesirable, though.
>>
>> Thoughts?
>
> With this current patch the replacement field is only checked at the
> point when the service is stopped, so the field could only be set when
> the service is actually running. I think it makes the most sense to
> just replace the service directly if it's not stopped.
>
> I can't think of any undesirable cases, but having a higher-level
> interface is a good idea.

Right. Currently the Guix side of things does the equivalent of:

herd eval root '(register-services (load "a.scm") (load "b.scm"))'

for services currently stopped, which is okay but not great. IWBN if it
could directly use a higher-level action.

Toggle quote (7 lines)
> At the very least we need to control the inherent race condition
> involved in (if running? do-x do-y) for if the service is stopped
> after the running? check. At the moment I think the only thing we have
> to worry about there is signals, but if we're going to move to have
> more parallelism through fibers then we might need to be even more
> careful.

Indeed.

Thank you,
Ludo’.
C
C
Carlo Zancanaro wrote on 23 Aug 2018 15:45
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 32408@debbugs.gnu.org)
87o9dtqjry.fsf@zancanaro.id.au
Hey Ludo’,

I've attached an updated patch. I couldn't think of any unwanted
consequences, so I took your idea of making register-services
handle most of the details of replacement. With my patch,
something like
Toggle quote (2 lines)
> herd eval root '(register-services (load "a.scm") (load
> "b.scm"))'
will deal with a conflict by either replacing the old service (if
it's not running), arranging for the old service to be replaced
when it's stopped, or raising an error. This seems like a sensible
way for things to function.

Toggle quote (5 lines)
>> At the very least we need to control the inherent race
>> condition [...]
>
> Indeed.

Despite my desire to deal with the race condition, I haven't done
anything about it in this patch. The modification of %services
that was done in register-services was already racy, and I don't
think this patch will make it worse. If it hasn't been a problem
up until now, then I don't think this will make it a problem.

Carlo
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAlt+umEACgkQqdyPv9aw
Ibz9VQ/8CcWPyF3dAAxTQVS20gJs/b3NpTgQdTf4Vj+p9PTDrzY9NJgh2HMyvZJ4
OvoEZEzS7LfRV3InERRwJbEH8npN1PUrJz196EAD5AVwAynDHrwe/KwZDqk38ScP
BIEpcOr34Jzn/CZ2c+CnXyYZsXf7omJtcP6LHLdq2yBW/KkKB5UagAxNjNzf/wPZ
qzoZ1zb4XJ+tMzCaupUm+J4OyVgnEjqCvWy96L47QSD3TP9dl4701oJaZ7Vvzsm9
08+8ikDJ71ZKWhQ9PniKLj9pWrKibcBMMZ9PTY6zw69NJ2DHOH1AONWPzCmRKiy3
c776j3vkFXhQ2SUyprphzF7CfZ7OEgH2xwgy2TAphwNPZtHtamngQcR6MLECOQIP
fKlGzxYdQ44u8TLCKkeR/ngMrJLMy5FuCymKuMqO3NUOHopp8qnK5qiUnp8RcaFV
LFYOaWcugGqt0eugKTaGnHap5UtMVMXukmoQcLAzv6fBYYiyFOnFvXxCKOi0VXw0
6b+UDmjs2EuC5lmx/mMf3HlS3FxvyDfo41PdPoeKPXQBfDRzVepapBb+4nw6lqJj
WLu+h6YsYrwWJxJx+9vP9mmGCTLDCXIiEfCmry+pkZsMndC+d0Tt0X/xYd6p3yHh
bOW7NbNUG0/C3aYVjV62nIYTWSdpVQ3X14hd4OVYuAtYbrV7Vuk=
=yrpC
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 25 Aug 2018 16:20
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 32408@debbugs.gnu.org)
877ekecyu8.fsf@gnu.org
Hello!

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

Toggle quote (10 lines)
> I've attached an updated patch. I couldn't think of any unwanted
> consequences, so I took your idea of making register-services handle
> most of the details of replacement. With my patch, something like
>> herd eval root '(register-services (load "a.scm") (load
>> "b.scm"))'
> will deal with a conflict by either replacing the old service (if it's
> not running), arranging for the old service to be replaced when it's
> stopped, or raising an error. This seems like a sensible way for
> things to function.

Awesome. Thus, the only thing we need to change in ‘guix system
reconfigure’ is to make the ‘register-services’ call unconditional
(currently it’s limited to services that are stopped.)

Toggle quote (6 lines)
> Despite my desire to deal with the race condition, I haven't done
> anything about it in this patch. The modification of %services that
> was done in register-services was already racy, and I don't think this
> patch will make it worse. If it hasn't been a problem up until now,
> then I don't think this will make it a problem.

Yeah, sounds reasonable.

Toggle quote (15 lines)
> From 9ec5c0000e9a45441417a6ee4138cdcbf1b1f2b2 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Thu, 9 Aug 2018 22:30:38 +1000
> Subject: [PATCH] service: Add a replacement slot for delayed service
> replacement.
>
> * modules/shepherd/service.scm (<service>): Add replacement slot
> (replace-service): New procedure.
> (stop): Call replace-service after stopping a service.
> (register-services): Replace existing services where possible, setting the new
> replacement slot if they are currently running.
> * tests/replacement.sh: Add a test for it.
> * Makefile.am (TESTS): Add the new test.
> * doc/shepherd.texi (Slots of services): Document it.

Awesome, please push! And sorry about the delay.

Thank you,
Ludo’.
L
L
Ludovic Courtès wrote on 2 Oct 2018 11:50
control message for bug #32408
(address . control@debbugs.gnu.org)
87r2h8r82r.fsf@gnu.org
tags 32408 fixed
close 32408
?