System test partition.img differs in size across hosts(?)

  • Done
Details
5 participants
  • david larsson
  • Leo Famulari
  • Maxim Cournoyer
  • Tobias Geerinckx-Rice
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Tobias Geerinckx-Rice
Severity
normal
Tobias Geerinckx-Rice wrote on 11 Jan 2022 20:31
(name . Bug reports for GNU Guix)(address . bug-guix@gnu.org)
874k6akqlx.fsf@nckx
Guix,

This is weird. On berlin:

$ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
[…]
Creating filesystem with 351 1k blocks and 40 inodes
[…]
/gnu/store/q18ca3ilma0h5hpn4s39xhzn0kc7jm5x-partition.img

On my laptop:

$ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
[…]
Creating filesystem with 242 1k blocks and 32 inodes
[…]
Copying files into the device: ext2fs_symlink: Could not allocate inode in ext2 filesystem while creating symlink "system"
__populate_fs: Could not allocate inode in ext2 filesystem while writing symlink"system"
mke2fs: Could not allocate inode in ext2 filesystem while populating file system

This happens with both a tmpfs and a bcachefs /tmp.

The same 'make check-system TESTS="openvswitch"' run fails for Marius
as well, although I don't know the exact output. They tested btrfs
and tmpfs, and suggested a kernel regression.

I don't understand how that would cause this, but I'm forced to
agree: something spooky is going on in the chroot and the kernel
is a big variable.

The attached patch was written before I was aware of the above
weirdness and only works around the issue.

Kind regards,

T G-R
From 18f288d4b69faa73ffb75488dbc924640441d7ee Mon Sep 17 00:00:00 2001
From: Tobias Geerinckx-Rice <me@tobias.gr>
Date: Tue, 11 Jan 2022 19:56:53 +0100
Subject: [PATCH] build: image: Account for fixed-size file system structures.

* gnu/build/image.scm (estimate-partition-size): Enforce a 1-MiB minimum.
---
gnu/build/image.scm | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gnu/build/image.scm b/gnu/build/image.scm
index bdd5ec25a9..81caa424f8 100644
--- a/gnu/build/image.scm
+++ b/gnu/build/image.scm
@@ -3,7 +3,7 @@
;;; Copyright © 2016 Christine Lemmer-Webber <cwebber@dustycloud.org>
;;; Copyright © 2016, 2017 Leo Famulari <leo@famulari.name>
;;; Copyright © 2017 Marius Bakke <mbakke@fastmail.com>
-;;; Copyright © 2020 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2020, 2022 Tobias Geerinckx-Rice <me@tobias.gr>
;;; Copyright © 2020 Mathieu Othacehe <m.othacehe@gmail.com>
;;;
;;; This file is part of GNU Guix.
@@ -62,8 +62,10 @@ (define (size-in-kib size)
 
 (define (estimate-partition-size root)
   "Given the ROOT directory, evaluate and return its size. As this doesn't
-take the partition metadata size into account, take a 25% margin."
-  (* 1.25 (file-size root)))
+take the partition metadata size into account, take a 25% margin. As this in
+turn doesn't take any constant overhead into account, force a 1-MiB minimum."
+  (max (ash 1 20)
+       (* 1.25 (file-size root))))
 
 (define* (make-ext-image partition target root
                          #:key
--
2.34.0
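
To get a feel for the patched estimate, here is a minimal shell sketch of the same arithmetic. The function name estimate_partition_size is hypothetical; it only mirrors (max (ash 1 20) (* 1.25 (file-size root))) from the patch, takes the FILE-SIZE result in bytes as its argument, and truncates to whole bytes (the Scheme code returns an inexact number).

```shell
# Hypothetical helper mirroring the patched ESTIMATE-PARTITION-SIZE:
#   (max (ash 1 20) (* 1.25 (file-size root)))
# $1 is the FILE-SIZE result in bytes; the result is truncated to whole bytes.
estimate_partition_size () {
  awk -v size="$1" 'BEGIN {
    margin = size * 1.25      # 25% margin for file system metadata
    minimum = 2 ^ 20          # (ash 1 20) = 1048576 bytes, i.e. 1 MiB
    est = (margin > minimum) ? margin : minimum
    printf "%.0f\n", int(est)
  }'
}

# Usage:
#   estimate_partition_size 200000    # small tree, the 1 MiB floor wins: prints 1048576
#   estimate_partition_size 1000000   # larger tree, 1.25 * size wins: prints 1250000
```

Both file systems in the logs above (242 and 351 1k blocks) are far below that 1 MiB floor, so, assuming the image size is taken from this estimate, the small openvswitch test root no longer decides the image size on either host.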

Tobias Geerinckx-Rice wrote on 11 Jan 2022 20:44
87zgo2jbr1.fsf@nckx
The most likely culprit is a change or difference in how the
kernel answers FILE-SIZE's ‘how much disc space does FILE
consume?’ — rounding it to N blocks or bytes, including or
excluding directory sizes, differing reported directory sizes,
etc.
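
One way to test this on a given host is to sum the sizes the kernel reports for the same tree on each file system and compare. The sketch below is a rough shell analogue of FILE-SIZE, assuming (only its leaf case is quoted in this thread) that it adds up the lstat(2) st_size of every entry, directories included. apparent_size and directory_size are hypothetical names, and GNU find is required for -printf.

```shell
# Rough shell analogue of FILE-SIZE (guix/build/store-copy.scm), assuming it
# sums the lstat(2) st_size of every entry under ROOT, directories included.
# GNU find's %s prints st_size; without -L, find does not follow symlinks,
# so a symlink contributes the length of its target string, as with lstat.
apparent_size () {
  find "$1" -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Only the directories' share of the total. ext4, btrfs, tmpfs, etc. each
# report directory st_size differently, so this is the part most likely to
# vary between hosts for identical store items.
directory_size () {
  find "$1" -type d -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage (hypothetical path):
#   apparent_size /tmp/some-tree; directory_size /tmp/some-tree
```

Running both on copies of the same tree on ext4, btrfs and tmpfs would show whether the directory share is what differs between hosts.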

I'll do more testing.

Kind regards,

T G-R

Maxim Cournoyer wrote on 25 Jan 2022 18:54
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 53194@debbugs.gnu.org)
87r18v4s5x.fsf@gmail.com
Hi Tobias,

[...]

> diff --git a/gnu/build/image.scm b/gnu/build/image.scm
> index bdd5ec25a9..81caa424f8 100644
> --- a/gnu/build/image.scm
> +++ b/gnu/build/image.scm
> @@ -3,7 +3,7 @@
> ;;; Copyright © 2016 Christine Lemmer-Webber <cwebber@dustycloud.org>
> ;;; Copyright © 2016, 2017 Leo Famulari <leo@famulari.name>
> ;;; Copyright © 2017 Marius Bakke <mbakke@fastmail.com>
> -;;; Copyright © 2020 Tobias Geerinckx-Rice <me@tobias.gr>
> +;;; Copyright © 2020, 2022 Tobias Geerinckx-Rice <me@tobias.gr>
> ;;; Copyright © 2020 Mathieu Othacehe <m.othacehe@gmail.com>
> ;;;
> ;;; This file is part of GNU Guix.
> @@ -62,8 +62,10 @@ (define (size-in-kib size)
>
> (define (estimate-partition-size root)
> "Given the ROOT directory, evaluate and return its size. As this doesn't
> -take the partition metadata size into account, take a 25% margin."
> - (* 1.25 (file-size root)))
> +take the partition metadata size into account, take a 25% margin. As this in
> +turn doesn't take any constant overhead into account, force a 1-MiB minimum."
> + (max (ash 1 20)
> + (* 1.25 (file-size root))))
>
> (define* (make-ext-image partition target root
> #:key

Looks reasonable to me (although it is interesting that the behavior is
not the same across machines...).

While at it, you may want to fix this docstring:

 (define (file-size file)
-  "Return the size of bytes of FILE, entering it if FILE is a directory."
+  "Return the size in bytes of FILE, entering it if FILE is a directory."
   (file-system-fold (const #t)
                     (lambda (file stat result)   ;leaf
                       (+ (stat:size stat) result))

in guix/build/store-copy.scm.

Thanks!

Maxim
Leo Famulari wrote on 4 Feb 2022 05:43
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfyu/uHQun5pevhM@jasmine.lan
On Tue, Jan 11, 2022 at 08:31:27PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> On my laptop:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
> […]
> Creating filesystem with 242 1k blocks and 32 inodes
> […]
> Copying files into the device: ext2fs_symlink: Could not allocate inode in
> ext2 filesystem while creating symlink "system"
> __populate_fs: Could not allocate inode in ext2 filesystem while writing
> symlink"system"
> mke2fs: Could not allocate inode in ext2 filesystem while populating file
> system
> --8<---------------cut here---------------end--------------->8---

Same here.

> This happens with both a tmpfs and a bcachefs /tmp.

And also on btrfs.
Leo Famulari wrote on 4 Feb 2022 06:17
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy21H105IgGDlbk@jasmine.lan
On Tue, Jan 11, 2022 at 08:31:27PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> This is weird. On berlin:

Berlin is using ext4, right?

> On my laptop:
> --8<---------------cut here---------------start------------->8---
[...]
> mke2fs: Could not allocate inode in ext2 filesystem while populating file
> system
> --8<---------------cut here---------------end--------------->8---
>
> This happens with both a tmpfs and a bcachefs /tmp.

And it fails for me on btrfs, but not on ext4.

I tested with Guix kernels 5.16.5, 5.15.17, and 5.15.15, as well as
Debian's 5.10.0-11-amd64.
Leo Famulari wrote on 4 Feb 2022 06:23
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy4UlFIFVe/eFuK@jasmine.lan
On Tue, Jan 11, 2022 at 08:44:11PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> The most likely culprit is a change or difference in how the kernel answers
> FILE-SIZE's ‘how much disc space does FILE consume?’ — rounding it to N
> blocks or bytes, including or excluding directory sizes, differing reported
> directory sizes, etc.

I'm going to build the version of the kernel used on berlin and test
with that.

I do find myself wondering if something in Guix is measuring the wrong
thing: maybe we are measuring the size of files compressed in transit,
rather than their uncompressed size on disk. Or something like that.
Leo Famulari wrote on 4 Feb 2022 06:32
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy6VPTWFWaDVMqd@jasmine.lan
On Fri, Feb 04, 2022 at 12:23:30AM -0500, Leo Famulari wrote:
> I'm going to build the version of the kernel used on berlin and test
> with that.

Actually, I already had it built. This bug still manifests on that version
of the kernel. So...

> I do find myself wondering if something in Guix is measuring the wrong
> thing: maybe we are measuring the size of files compressed in transit,
> rather than their uncompressed size on disk. Or something like that.

I'm still leaning towards something besides a change in the kernel.
Leo Famulari wrote on 4 Feb 2022 17:55
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)
Yf1agsMoPFgOzUxZ@jasmine.lan
On Fri, Feb 04, 2022 at 12:32:04AM -0500, Leo Famulari wrote:
> I'm still leaning towards something besides a change in the kernel.

Using bisection of the Guix Git repo, it seems the problem was
introduced in commit 2d12ec724ea2, "scripts: system: Rationalize
persistency."
Leo Famulari wrote on 4 Feb 2022 18:04
(no subject)
(name . GNU bug tracker automated control server)(address . control@debbugs.gnu.org)
Yf1crbaC6auUKuqy@jasmine.lan
block 53214 with 53194
Maxim Cournoyer wrote on 6 Feb 2022 05:42
Re: bug#53194: System test partition.img differs in size across hosts(?)
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 53194@debbugs.gnu.org)
87fsow38sh.fsf@gmail.com
Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> [...]

FYI, I pushed this workaround in
3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

Thanks,

Maxim
Leo Famulari wrote on 6 Feb 2022 18:41
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
YgAIWLZD4T85IDo/@jasmine.lan
On Sat, Feb 05, 2022 at 11:42:38PM -0500, Maxim Cournoyer wrote:
> FYI, I pushed this workaround in
> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

I don't see this commit in the repo.
Maxim Cournoyer wrote on 7 Feb 2022 22:29
(name . Leo Famulari)(address . leo@famulari.name)
87zgn2z7p1.fsf@gmail.com
Hi Leo!

Leo Famulari <leo@famulari.name> writes:

> On Sat, Feb 05, 2022 at 11:42:38PM -0500, Maxim Cournoyer wrote:
>> FYI, I pushed this workaround in
>> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.
>
> I don't see this commit in the repo.

Thank you for letting me know. I hate when this happens; usually 'make
authenticate' fails in my Emacs environment because it doesn't run in a
'guix shell -D guix' environment, so a dependency is missing and the git
push fails.

Anyway, I've now pushed the linux-libre series for real (which included
this), as e5c06dce93.

Thanks!

Maxim
david larsson wrote on 17 Feb 2022 17:37
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
686be3f2c1cf4f7251533c521fc7bfa4@selfhosted.xyz
On 2022-01-11 20:31, Tobias Geerinckx-Rice via Bug reports for GNU Guix
wrote:
> [...]

I hope I'm not totally off here, but I think this is worth mentioning:
are the hosts using the same version of
? It might produce different sizes if the hosts are on different Guix
commits, or is that not possible at all if the derivations have the
same hashes?

...because I recently noticed that the 'guix system image' command
produces images that are larger than the size given with the
--image-size option by exactly the root offset plus the ESP partition
size. I think this is a change from 1-2 years ago (around the time of
Marius B.'s blog post about Ganeti). I think so because when I set up
Ganeti following that blog post, I could (IIRC) create Guix instances
with the ganeti-instance-guix create script without problems (the script
produces images with 'guix system image --image-size=X'), but when I did
so again 1-2 weeks ago, instance creation failed with an error saying
the Ganeti disks were too small. The size issue could be resolved by
subtracting from the --image-size=X value in the instance create script
exactly the number of bytes corresponding to the root offset and the
ESP partition sizes as defined in (gnu system image).
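
The workaround described above boils down to a subtraction. A minimal shell sketch follows; ROOT_OFFSET and ESP_SIZE (1 MiB and 40 MiB below) are assumptions to be checked against the definitions in (gnu system image) for the Guix revision in use, and adjusted_image_size is a hypothetical helper, not part of ganeti-instance-guix.

```shell
# Assumed values; check them against (gnu system image) for your Guix revision.
ROOT_OFFSET=$((512 * 2048))        # assumed root offset: 1 MiB
ESP_SIZE=$((40 * 1024 * 1024))     # assumed ESP partition size: 40 MiB

# Hypothetical helper: given the disk size Ganeti will provide (in bytes),
# print the value to pass as 'guix system image --image-size=...', based on
# the observation that the image exceeds --image-size by exactly
# ROOT_OFFSET + ESP_SIZE.
adjusted_image_size () {
  size=$(( $1 - ROOT_OFFSET - ESP_SIZE ))
  if [ "$size" -le 0 ]; then
    echo "disk too small for the root offset and ESP" >&2
    return 1
  fi
  echo "$size"
}

# Usage: adjusted_image_size 10737418240   # 10 GiB disk, prints 10694426624
```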

Maybe some commit has changed the size output of guix system image?


Best regards,
David
Mathieu Othacehe wrote on 31 Oct 2022 09:56
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87leowgyn1.fsf@gnu.org
Hello,

Toggle quote (3 lines)
> FYI, I pushed this workaround in
> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

I'm not able to reproduce this issue with or without the workaround, by
running the openvswitch test on Berlin and on my laptop. I think we can
close it for now and re-open it if someone finds a more reliable
reproducer.

Thanks,

Mathieu
Closed