Shepherd 0.9 not cleanly unmounting root

  • Done
  • quality assurance status badge
Details
2 participants
  • angry rectangle
  • Ludovic Courtès
Owner
unassigned
Submitted by
angry rectangle
Severity
important
Merged with
A
A
angry rectangle wrote on 25 Jun 2022 07:27
(address . bug-guix@gnu.org)
87a6a1fgpn.fsf@cock.li
Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
I assume this means the filesystem wasn't cleanly unmounted.
I am doing a proper shutdown, using either "reboot" or "halt."

I've attached the minimal config I've been using.
It's nothing special other than encrypted root.
I'm using an SSD with a gpt partition table.
No custom packages or external channels were used when configuring the system.

This is for my desktop computer, but I have the exact same problem with a similar minimal config on my laptop.
Mostly the same sitution there with an SSD, gpt table, and encrypted root.

The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
I'm can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Thanks
A
A
angry rectangle wrote on 25 Jun 2022 07:42
(address . 56209@debbugs.gnu.org)
875ykpfgbt.fsf@cock.li
forgot the attachment.
(use-modules (gnu) (gnu system) (gnu bootloader grub) (gnu packages dns)) (use-package-modules linux certs) (use-service-modules networking) (operating-system (kernel linux-libre) (host-name "myhostname") (timezone "America/New_York") (locale "en_US.utf8") (bootloader (bootloader-configuration (bootloader grub-bootloader) (target #f))) (mapped-devices (list (mapped-device (source (uuid "my--uuid--goes--here")) (target "cryptroot") (type luks-device-mapping)))) (file-systems (append (list (file-system (device "/dev/mapper/cryptroot") (mount-point "/") (type "ext4") (dependencies mapped-devices))) %base-file-systems)) (sudoers-file (plain-file "sudoers" "\ root ALL=(ALL) ALL %wheel ALL=(ALL) NOPASSWD:ALL\n")) (users (cons (user-account (name "angry") (group "users") (password (crypt "a" "$6$abc")) (supplementary-groups '("wheel" "audio" "video"))) %base-user-accounts)) (packages (cons nss-certs %base-packages)) (services (cons* (service dhcp-client-service-type) %base-services)))
L
L
Ludovic Courtès wrote on 27 Jun 2022 10:04
(address . control@debbugs.gnu.org)
871qvaleh1.fsf@gnu.org
severity 56209 important
quit
L
L
Ludovic Courtès wrote on 27 Jun 2022 10:04
(address . control@debbugs.gnu.org)
87zghyjzvo.fsf@gnu.org
merge 56209 56250
quit
L
L
Ludovic Courtès wrote on 27 Jun 2022 23:50
Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
(name . angry rectangle)(address . angryrectangle@cock.li)(address . 56209@debbugs.gnu.org)
87mtdxg4hp.fsf@gnu.org
Hi,

angry rectangle <angryrectangle@cock.li> skribis:

Toggle quote (5 lines)
> Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
> To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
> I assume this means the filesystem wasn't cleanly unmounted.
> I am doing a proper shutdown, using either "reboot" or "halt."

I can see that as well.

Toggle quote (3 lines)
> The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
> I'm can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Preliminary investigation suggests this is because shepherd doesn’t
close log files beforehand (in 0.9, those specified as #:log-file to
‘make-forkexec-constructor’ & co. are opened by PID 1; conversely,
shepherd 0.8 would open them in the child process.)

To be continued…

Thanks for reporting the issue and finding the offending commit!

Ludo’.

PS: Below my (ugly) debugging tricks for posterity. To see those
messages, you typically need to start a VM with ‘-serial stdio’ and
pass “console=ttyS0” to the kernel. (It’s best to start a
standalone VM with an image created by ‘guix system image -t
qcow2’.)
Toggle diff (102 lines)
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..25d747d226 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -299,6 +299,9 @@ (define %root-file-system-shepherd-service
(stop #~(lambda _
;; Return #f if successfully stopped.
(sync)
+ (call-with-port (open-file "/dev/console" "w0")
+ (lambda (port)
+ (display "This is my last message.\n" port)))
(call-with-blocked-asyncs
(lambda ()
@@ -314,11 +317,24 @@ (define %root-file-system-shepherd-service
;; Close /dev/console.
(for-each close-fdes '(0 1 2))
- ;; At this point, there are no open files left, so the
- ;; root file system can be re-mounted read-only.
- (mount #f "/" #f
- (logior MS_REMOUNT MS_RDONLY)
- #:update-mtab? #f)
+ (open-fdes "/dev/null" O_RDONLY)
+ (open-fdes "/dev/console" O_WRONLY)
+ (open-fdes "/dev/console" O_WRONLY)
+ (current-output-port (fdopen 1 "w0"))
+ (current-error-port (fdopen 2 "w0"))
+ (pk 'umount-root)
+
+ (catch 'system-error
+ (lambda ()
+ ;; At this point, there are no open files left, so the
+ ;; root file system can be re-mounted read-only.
+ (mount #f "/" #f
+ (logior MS_REMOUNT MS_RDONLY)
+ #:update-mtab? #f))
+ (lambda args
+ (pk 'umount-root-error args)
+ #f))
+ (pk 'done-umount-root)
#f)))))
(respawn? #f)))
@@ -406,7 +422,28 @@ (define (file-system-shepherd-service file-system)
;; Make sure PID 1 doesn't keep TARGET busy.
(chdir "/")
- (umount #$target)
+ (call-with-port (open-file "/dev/console" "w0")
+ (lambda (port)
+ (parameterize ((current-output-port port)
+ (current-error-port port))
+ (pk 'umount #$target)
+ #$(if (file-system-mount-may-fail? file-system)
+ #~(catch 'system-error
+ (lambda ()
+ (umount #$target))
+ (const #f))
+ #~(catch 'system-error
+ (lambda ()
+ (umount #$target))
+ (lambda args
+ (pk 'umount-error args)
+ (system* #$(file-append (@ (gnu
+ packages
+ lsof)
+ lsof)
+ "/bin/lsof"))
+ #f)))
+ (pk 'done-umount #$target))))
#f))
;; We need additional modules.
diff --git a/gnu/system/examples/bare-bones.tmpl b/gnu/system/examples/bare-bones.tmpl
index 387e4b12ba..1f9012c167 100644
--- a/gnu/system/examples/bare-bones.tmpl
+++ b/gnu/system/examples/bare-bones.tmpl
@@ -2,8 +2,8 @@
;; for a "bare bones" setup, with no X11 display server.
(use-modules (gnu))
-(use-service-modules networking ssh)
-(use-package-modules screen ssh)
+(use-service-modules networking ssh shepherd)
+(use-package-modules screen ssh admin)
(operating-system
(host-name "komputilo")
@@ -38,6 +38,13 @@
"audio" "video")))
%base-user-accounts))
+ (essential-services
+ (modify-services (operating-system-default-essential-services
+ this-operating-system)
+ (shepherd-root-service-type
+ config => (shepherd-configuration
+ (shepherd shepherd-0.8)))))
+
;; Globally-installed packages.
(packages (cons screen %base-packages))
L
L
Ludovic Courtès wrote on 27 Jun 2022 23:51
control message for bug #53214
(address . control@debbugs.gnu.org)
87lethg4h8.fsf@gnu.org
block 53214 by 56209
quit
L
L
Ludovic Courtès wrote on 29 Jun 2022 00:02
Re: bug#56209: Shepherd 0.9 not cleanly unmounting root
(name . angry rectangle)(address . angryrectangle@cock.li)(address . 56209@debbugs.gnu.org)
87ilokbg56.fsf@gnu.org
Hi,

I believe the attached patch fixes the problem. I’ll do more testing on
my side but I’d be grateful if someone would give it a try too.

Ludo’.
Toggle diff (82 lines)
diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index 17b7b38a15..dea58354d9 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -328,7 +328,18 @@ (define-public shepherd-0.9
version ".tar.gz"))
(sha256
(base32
- "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))))
+ "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))
+ (modules '((guix build utils)))
+ (snippet
+ ;; Avoid continuation barriers so (@ (fibers) sleep) can be
+ ;; called from a service's 'stop' method
+ '(substitute* "modules/shepherd/service.scm"
+ (("call-with-blocked-asyncs") ;in 'stop' method
+ "(lambda (thunk) (thunk))")
+ (("\\(for-each-service\n") ;in 'shutdown-services'
+ "((lambda (proc)
+ (for-each proc
+ (fold-services cons '())))\n")))))
(arguments
(list #:configure-flags #~'("--localstatedir=/var")
#:make-flags #~'("GUILE_AUTO_COMPILE=0")
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..1fd4cd84f3 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -300,27 +300,36 @@ (define %root-file-system-shepherd-service
;; Return #f if successfully stopped.
(sync)
- (call-with-blocked-asyncs
- (lambda ()
- (let ((null (%make-void-port "w")))
- ;; Close 'shepherd.log'.
- (display "closing log\n")
- ((@ (shepherd comm) stop-logging))
+ (let ((null (%make-void-port "w")))
+ ;; Close 'shepherd.log'.
+ (display "closing log\n")
+ ((@ (shepherd comm) stop-logging))
- ;; Redirect the default output ports..
- (set-current-output-port null)
- (set-current-error-port null)
+ ;; Redirect the default output ports..
+ (set-current-output-port null)
+ (set-current-error-port null)
- ;; Close /dev/console.
- (for-each close-fdes '(0 1 2))
+ ;; Close /dev/console.
+ (for-each close-fdes '(0 1 2))
- ;; At this point, there are no open files left, so the
- ;; root file system can be re-mounted read-only.
- (mount #f "/" #f
- (logior MS_REMOUNT MS_RDONLY)
- #:update-mtab? #f)
+ (let loop ((n 10))
+ (unless (catch 'system-error
+ (lambda ()
+ ;; At this point, there are no open files left, so the
+ ;; root file system can be re-mounted read-only.
+ (mount #f "/" #f
+ (logior MS_REMOUNT MS_RDONLY)
+ #:update-mtab? #f)
+ #t)
+ (const #f))
+ (unless (zero? n)
+ ;; Yield to the other fibers. That gives logging fibers
+ ;; an opportunity to close log files so the 'mount' call
+ ;; doesn't fail with EBUSY.
+ ((@ (fibers) sleep) 1)
+ (loop (- n 1)))))
- #f)))))
+ #f)))
(respawn? #f)))
(define root-file-system-service-type
A
A
angry rectangle wrote on 29 Jun 2022 02:18
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 56209@debbugs.gnu.org)
87tu84fgzs.fsf@cock.li
I tried your patch on one of my computers and it works.
Thank you.
L
L
Ludovic Courtès wrote on 1 Jul 2022 12:29
(name . angry rectangle)(address . angryrectangle@cock.li)(address . 56209-done@debbugs.gnu.org)
87czep5do4.fsf@gnu.org
Hi,

angry rectangle <angryrectangle@cock.li> skribis:

Toggle quote (2 lines)
> I tried your patch on one of my computers and it works.

Thanks for testing.

Pushed as 0483c71cc5aeb3b69f6deb154fe12c0b2e6dc17f. The reason is took
me more time is that I wanted to have a system test for that to make
sure it doesn’t come back to haunt us in the future. Now we should be
fine. :-)

Ludo’.
Closed
?