errors in shepherd service constructors are not logged and lead to misleading status

Florian Dold wrote on 3 Jan 2019 22:36

Recipients:(address . bug-guix@gnu.org)

Message-ID:7c7f7030-a0f2-5fd0-7d02-f203277d7bba@gmail.com

Hi Guix,

when defining a service type that extends shepherd-root-service-type and

the 'start' function of the shepherd-service definition contains an

error, the error is silently ignored. No log output is generated at all.

For example (full system definition is attached):

(define (errtest-shepherd-service c)

(list

(shepherd-service

(provision '(errtest))

(documentation "Errtest")

(requirement '(file-systems))

(modules `((shepherd support) (ice-9 match) ,@%default-modules))

(start #~(lambda args

(local-output "errtest start")

this-is-an-unbound-variable

(local-output "errtest end")

#t)))))

The log message "errtest start" appears in /var/log/messages, as

expected. The next line contains an error, and aborts execution of the

start function.

The error only becomes apparent when manually doing a "herd restart

errtest", which shows an error message (but without any error location

or stack trace). But the error (regarding the unbound variable) is not

logged, and there is no indication in the log that the service couldn't

be started in any log.

Furthermore the "herd status" of a service that encountered an error in

the start function is very misleading:

root@errtest ~# herd status errtest

Status of errtest:

It is stopped.

It is enabled.

Provides (errtest).

Requires (file-systems).

Conflicts with ().

Will be respawned.

It shows "Will be respawned", which is wrong.

I'd be happy to work on a patch, but it seems like there is some design

discussion necessary, in particular how the "Will be respawned" should

be handled. Services have a "respawn?" flag, but of course respawning

can only work if the start function executed successfully (and only the

service process itself failed) in the first place.

I generally feel like the state machine for services needs some work.

In particular, it would be useful to distinguish between "failed" and

"completed" services instead of conflating both states into "stopped".

Or maybe have some more general mechanism for storing state about the

service, instead of just the slot that usually contains the PID?

- Florian

(use-modules (gnu)) (use-service-modules networking ssh shepherd) (define (errtest-shepherd-service c) (list (shepherd-service (provision '(errtest)) (documentation "Errtest") (requirement '(file-systems)) (modules `((shepherd support) (ice-9 match) ,@%default-modules)) (start #~(lambda args (local-output "errtest start") this-is-an-unbound-variable (local-output "errtest end") #t))))) (define errtest-service-type (service-type (name 'errtest) (extensions (list (service-extension shepherd-root-service-type errtest-shepherd-service))) (default-value #t))) (operating-system (host-name "errtest") (timezone "Europe/Berlin") (locale "en_US.utf8") (kernel-arguments (list "console=ttyS0" "console=tty0")) (bootloader (bootloader-configuration (bootloader grub-bootloader) (target "/dev/sdX"))) (file-systems (cons (file-system (device (file-system-label "my-root")) (mount-point "/") (type "ext4")) %base-file-systems)) (services (cons* (service errtest-service-type) %base-services)))

Maxim Cournoyer wrote on 20 May 2021 05:00

control message for bug #33968

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87o8d6gldu.fsf@gmail.com

severity 33968 important

quit

Maxim Cournoyer wrote on 20 May 2021 05:00

Re: bug#33968: errors in shepherd service constructors are not logged and lead to misleading status, hang boot

Recipients:(name . Florian Dold)(address . florian.dold@gmail.com)(address . 33968@debbugs.gnu.org)

Message-ID:87mtsqglcn.fsf@gmail.com

Hi Florian,

I stumbled upon this problem with https://issues.guix.gnu.org/48521,and

had a hard time to debug it (due to the completely missing information

in any output or log from shepherd).

Worst, this caused the system to hang early on boot!

I'm raising the priority of this issue.

Thanks,

Maxim

Ludovic Courtès wrote on 15 Jun 2023 23:15

Re: bug#33968: errors in shepherd service constructors are not logged and lead to misleading status

Recipients:(address . 33968-done@debbugs.gnu.org)

Message-ID:87y1kktojy.fsf@gnu.org

Florian Dold <florian.dold@gmail.com> skribis:

Toggle quote (4 lines)

> when defining a service type that extends shepherd-root-service-type and

> the 'start' function of the shepherd-service definition contains an

> error, the error is silently ignored. No log output is generated at all.

[...]

Toggle quote (6 lines)

> I generally feel like the state machine for services needs some work.

> In particular, it would be useful to distinguish between "failed" and

> "completed" services instead of conflating both states into "stopped".

> Or maybe have some more general mechanism for storing state about the

> service, instead of just the slot that usually contains the PID?

It’s been 4 years (!) but the good news is that all this is fixed as of

Shepherd 0.10. Closing!

Ludo’.

Closed

Your comment

This issue is archived.

To comment on this conversation send an email to 33968@debbugs.gnu.org

is:open	open issues
is:done	closed issues
submitter:<who>	search issue submitter
author:<who>	search by message author
date:yesterday..now	search by issue date
mdate:3m..2d	search by message date