guix deploy breaks SSH access with a PAM error

  • Open
Details
4 participants
  • Ludovic Courtès
  • Mathieu Othacehe
  • Maxim Cournoyer
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
important
Maxim Cournoyer wrote on 16 Dec 2021 05:45
(name . bug-guix)(address . bug-guix@gnu.org)
87czlx88ez.fsf@gmail.com
Hello Guix!

Following the big merge of the core-updates-frozen branch into master,
I've now noticed the following on two occasions: running 'guix deploy'
leaves the remote machine unreachable by SSH. The connection passes
authentication but then gets closed immediately. /var/log/messages
reveals the following error:

sshd[29578]: error: PAM: pam_open_session(): Module is unknown

The machines updated were running Guix System revisions predating the
core-updates-frozen merge.

The 'guix deploy' command doesn't succeed, because SSH starts failing at
around 99% completion; the bootloader configuration is not updated, so
rebooting boots into the same old system generation (where SSH works
again):

guix deploy: deploying to x200...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
substitute: updating substitutes from 'http://127.0.0.1:8181'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
/gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv
/gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv
/gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv
/gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv

building /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv...
building /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv...
building /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv...
building /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv...
guix deploy: sending 5 store items (0 MiB) to 'x200.local'...
guix deploy: error: failed to deploy x200: failed to start 'guix repl' on 'x200.local'

$ guix deploy ~/stow/guix/machines/x200.scm --no-offload
The following 1 machine will be deployed:
x200

guix deploy: deploying to x200...
guix deploy: error: failed to deploy x200: remote command
'/run/setuid-programs/sudo -n -- guix repl -t machine' failed with
status 254

$ ssh x200
Last login: Wed Dec 15 23:28:02 2021 from 192.168.10.15
Connection to x200.local closed.

This is obviously embarrassing in scenarios where the SSH connection is
the main way to reach the remote machine.

Ideas?

Thank you,

Maxim
Maxim Cournoyer wrote on 16 Dec 2021 06:27
(address . 52533@debbugs.gnu.org)
878rwl86g9.fsf@gmail.com
Hello,

I've found a workaround: disabling PAM for the remote machine
ssh-daemon. This is not done as part of 'guix deploy', so it needs to be
fiddled with manually; I did it this way:

1. take note of the command line and sshd_config file:

ps -eFww | grep sshd

2. Copy the sshd_config file from /gnu/store to somewhere writable and
edit it so that UsePAM is "no" instead of "yes".

3. Stop the Shepherd service with 'sudo herd stop ssh-daemon'

4. Start the ssh daemon manually (with sudo) by using the command found
in 1. but with the edited config from 2.

Then you should be able to 'guix deploy' successfully.
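The steps above could be scripted roughly as follows. This is a minimal sketch, not something posted in the thread: it assumes a POSIX shell with sed, root privileges on the remote machine, and the sshd binary and config paths noted in step 1. `make_nopam_config` and `restart_without_pam` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of the manual "disable PAM" workaround (hypothetical helpers).

# Steps 1-2: copy the store-generated sshd_config ($1) to a writable
# location ($2), rewriting any "UsePAM ..." directive to "UsePAM no".
# If the file has no UsePAM line at all, sshd already defaults to "no".
make_nopam_config() {
  src=$1
  dst=$2
  sed -E 's/^[[:space:]]*[Uu][Ss][Ee][Pp][Aa][Mm]([[:space:]]|=).*$/UsePAM no/' \
    "$src" > "$dst"
}

# Steps 3-4: stop the Shepherd-managed daemon, then start sshd by hand.
# $1 is the absolute sshd path seen in the 'ps' output, $2 the edited
# config; any other flags from the original command line must be added
# by hand.
restart_without_pam() {
  sshd=$1
  cfg=$2
  herd stop ssh-daemon && "$sshd" -f "$cfg"
}
```

For example, as root: `make_nopam_config <store sshd_config> /root/sshd_config && restart_without_pam <sshd path> /root/sshd_config`, with both placeholders taken from the `ps` output of step 1.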

According to 'man sshd_config', the default for UsePAM is no.
Given this, and the issue reported here, perhaps we
should disable it by default in Guix?

What do others think?

Thank you,

Maxim
Mathieu Othacehe wrote on 16 Dec 2021 09:54
control message for bug #52533
(address . control@debbugs.gnu.org)
87v8zpgcb0.fsf@meije.i-did-not-set--mail-host-address--so-tickle-me
severity 52533 important
quit
Ludovic Courtès wrote on 16 Dec 2021 16:02
Re: bug#52533: guix deploy breaks SSH access with a PAM error
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 52533@debbugs.gnu.org)
87ilvor3sn.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Following the big merge of the core-updates-frozen branch into master,
> I've now noticed the following on two occasions: running 'guix deploy'
> leaves the remote machine unreachable by SSH. The connection passes
> authentication but then gets closed immediately. /var/log/messages
> reveals the following error:
>
> sshd[29578]: error: PAM: pam_open_session(): Module is unknown
>
>
> The machines updated were running Guix System revisions predating the
> core-updates-frozen merge.

This sounds a lot like this:

https://issues.guix.gnu.org/32182#1

WDYT?

Ludo’.
Mathieu Othacehe wrote on 13 Jan 2022 13:31
(name . Ludovic Courtès)(address . ludo@gnu.org)
87r19bom0r.fsf@gnu.org
Hey,

> This sounds a lot like this:
>
> https://issues.guix.gnu.org/32182#1

I was just kicked out of my own server due to this PAM/SSH issue. It
happens quite frequently here. Time for a fix :).

Regarding the two potential solutions that you proposed in 2018, are
they still relevant? If so, I could maybe try to implement the second
suggestion: introducing service chain-loading.

Thanks,

Mathieu
Mathieu Othacehe wrote on 13 Jan 2022 13:38
(name . Ludovic Courtès)(address . ludo@gnu.org)
87ilunolnz.fsf@gnu.org
> Regarding the two potential solutions that you proposed in 2018, are
> they still relevant? If so, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Oh sorry, I stopped reading the thread at
chain-loading might not be enough, I'll keep digging.

Thanks,

Mathieu
Ludovic Courtès wrote on 13 Jan 2022 16:04
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87tue77k40.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> This sounds a lot like this:
>>
>> https://issues.guix.gnu.org/32182#1
>
> I was just kicked out of my own server due to this PAM/SSH issue. It
> happens quite frequently here. Time for a fix :).

Note that ‘guix deploy’ now opens a single SSH session, starting from
7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
problem.

> Regarding the two potential solutions that you proposed in 2018, are
> they still relevant? If so, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Service chain-loading was implemented in the Shepherd a few years ago.
However, it doesn’t really help; consider these two scenarios:

• You do ‘guix system reconfigure && herd restart term-tty1’. In that
case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process
(post-glibc upgrade, thanks to service chain-loading) and ‘login’
will happily load the .so files listed in /etc/pam.d/login (also
post-glibc upgrade).

• You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
‘sshd’, and all the other services that depend on PAM: these
pre-glibc upgrade programs will try dlopening the post-glibc upgrade
PAM plugins, which will break.
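The first scenario can be turned into a routine: reconfigure, then restart every PAM-using service. The sketch below is only an illustration. The service names (term-tty1, ssh-daemon) come from this thread and are surely not the complete list on a real system, and `restart_pam_services` is a hypothetical helper.

```shell
#!/bin/sh
# Restart each named Shepherd service so that it runs the post-upgrade
# program, as in the first scenario, instead of a pre-upgrade one that
# would dlopen post-upgrade PAM plugins.
# HERD may be overridden (e.g. HERD=echo) for a dry run; defaults to herd.
restart_pam_services() {
  herd_cmd=${HERD:-herd}
  status=0
  for svc in "$@"; do
    # Keep going if one restart fails, but report failure at the end.
    $herd_cmd restart "$svc" || status=1
  done
  return "$status"
}
```

As root, something like `guix system reconfigure config.scm && restart_pam_services term-tty1 ssh-daemon`, where config.scm stands for the system configuration file.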

The crux of the problem rather is the global /etc/pam.d: it’s valid for
pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
both.

FHS distros have a similar problem though; how do they handle it? Do
they force services to be restarted when glibc is upgraded, or something
along these lines?

In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each
PAM-using Shepherd service (login, sshd, etc.) so that it sets
PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS
being configured? Tricky!
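To make the $PAM_DIRECTORY idea concrete, the directory would be resolved once, when the service starts, so that a later reconfiguration does not change it under the running daemon. This is purely hypothetical. As noted above, libpam would first have to honor such a variable. The helper name `exec_with_pinned_pam` and the assumption that a system generation directory contains etc/pam.d are illustrative. The sketch also leaves the tricky part open: the service still has to be started against the right generation.

```shell
#!/bin/sh
# Hypothetical: libpam is only *supposed* here to read PAM_DIRECTORY.
# $1: system generation directory or a symlink to it (assumed to
#     contain etc/pam.d); remaining args: the daemon command line.
exec_with_pinned_pam() {
  # Resolve symlinks now, so later changes to the link have no effect.
  system=$(readlink -f "$1") || return 1
  shift
  PAM_DIRECTORY=$system/etc/pam.d
  export PAM_DIRECTORY
  exec "$@"
}
```

A service would then run something like `exec_with_pinned_pam /run/current-system <daemon command line>`. Each daemon keeps the PAM directory it started with, and a restarted one picks up whatever generation the symlink points to at that moment.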

We could maybe sidestep the issue altogether with socket-activated
services: they’d be started on-demand, so the second scenario above
would be unlikely. But getting there is quite a bit of work…

Ludo’.
Maxim Cournoyer wrote on 13 Jan 2022 17:45
(name . Ludovic Courtès)(address . ludo@gnu.org)
87mtjz1t63.fsf@gmail.com
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>>> This sounds a lot like this:
>>>
>>> https://issues.guix.gnu.org/32182#1
>>
>> I was just kicked out of my own server due to this PAM/SSH issue. It
>> happens quite frequently here. Time for a fix :).

Not a meaningful contribution to the discussion, but my workaround is to
disable PAM; as it is not enabled in OpenSSH by default, perhaps we
should also leave it off unless requested? What are the advantages of
having it on?

> Note that ‘guix deploy’ now opens a single SSH session, starting from
> 7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
> problem.
>
>> Regarding the two potential solutions that you proposed in 2018, are
>> they still relevant? If so, I could maybe try to implement the second
>> suggestion: introducing service chain-loading.
>
> Service chain-loading was implemented in the Shepherd a few years ago.
> However, it doesn’t really help; consider these two scenarios:
>
> • You do ‘guix system reconfigure && herd restart term-tty1’. In that
> case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process
> (post-glibc upgrade, thanks to service chain-loading) and ‘login’
> will happily load the .so files listed in /etc/pam.d/login (also
> post-glibc upgrade).
>
> • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
> ‘sshd’, and all the other services that depend on PAM: these
> pre-glibc upgrade programs will try dlopening the post-glibc upgrade
> PAM plugins, which will break.
>
> The crux of the problem rather is the global /etc/pam.d: it’s valid for
> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
> both.
>
> FHS distros have a similar problem though; how do they handle it? Do
> they force services to be restarted when glibc is upgraded, or something
> along these lines?

I just asked this question in Debian's OFTC channel:

"how does debian handle glibc updates? are services restarted when it
happens? Or does it postpone updating glibc until the next reboot?"

And the answer I got was: "there is no magic postponing of updates"; the
external needrestart [0] program was also mentioned.

Researching some more, it seems this may be handled on Debian by the use
of postinst scripts (which is an arbitrary shell script run after a
package is installed); so the libc package of Debian for example
restarts the postgres service to avoid problems:
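
[0] https://github.com/liske/needrestart
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275
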
> In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each
> PAM-using Shepherd service (login, sshd, etc.) so that it sets
> PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS
> being configured? Tricky!

Good question, but that seems a good path to pursue; old services would
be using their own old PAM modules, allowing them to continue running
unaffected, while new ones would get the updated PAM modules.

> We could maybe sidestep the issue altogether with socket-activated
> services: they’d be started on-demand, so the second scenario above
> would be unlikely. But getting there is quite a bit of work…

I fail to see how this would be a solution for openssh, which would
typically already be running unless you haven't logged in once since the
machine was up (or am I missing something?). Also, it seems to me inetd
can already do "socket activation", if this was somehow useful.

Thanks,

Maxim
Ludovic Courtès wrote on 17 Jan 2022 14:25
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
877daypk8r.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

>>> I was just kicked out of my own server due to this PAM/SSH issue. It
>>> happens quite frequently here. Time for a fix :).
>
> Not a meaningful contribution to the discussion, but my workaround is to
> disable PAM; as it is not enabled in OpenSSH by default, perhaps we
> should also leave it off unless requested? What are the advantages of
> having it on?

Consistency: authentication had better work consistently across all
system services that depend on it.

[...]

>> The crux of the problem rather is the global /etc/pam.d: it’s valid for
>> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
>> both.
>>
>> FHS distros have a similar problem though; how do they handle it? Do
>> they force services to be restarted when glibc is upgraded, or something
>> along these lines?
>
> I just asked this question in Debian's OFTC channel:
>
> "how does debian handle glibc updates? are services restarted when it
> happens? Or does it postpone updating glibc until the next reboot?"
>
> And the answer I got was: "there is no magic postponing of updates"; the
> external needrestart [0] program was also mentioned.
>
> Researching some more, it seems this may be handled on Debian by the use
> of postinst scripts (which is an arbitrary shell script run after a
> package is installed); so the libc package of Debian for example
> restarts the postgres service to avoid problems:
>
> [0] https://github.com/liske/needrestart
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

Yeah. My recollection is that apt is interactive by default, and it
would typically pop up a dialog telling you that services X and Y need
to be restarted, and asking whether you want to restart them now.

The difference compared to what we have (a message at the end telling
you that you “may need” to run ‘herd restart X’) is, IIRC, that it
tells you which services need to be restarted.
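Something in the spirit of needrestart can be approximated on Guix System by looking for processes that still map a C library other than the current system's. This is a rough sketch, not how needrestart itself is implemented. `stale_libc_pids` is a hypothetical helper, and the caller must supply the store directory of the new glibc. It relies only on the standard /proc/PID/maps format, whose sixth field is the mapped file's path.

```shell
#!/bin/sh
# List PIDs whose memory maps include a libc.so that does not live under
# the given glibc directory, i.e. processes still using an older glibc.
# $1: store directory of the current glibc (supplied by the caller)
# $2: proc filesystem root, /proc by default (overridable for testing)
stale_libc_pids() {
  current=${1%/}/
  proc=${2:-/proc}
  for maps in "$proc"/[0-9]*/maps; do
    [ -r "$maps" ] || continue
    pid=${maps%/maps}
    pid=${pid##*/}
    # Field 6 of each maps line is the path of the mapped file.
    if awk -v cur="$current" '
         $6 ~ /\/libc\.so/ && index($6, cur) != 1 { found = 1 }
         END { exit(found ? 0 : 1) }' "$maps" 2>/dev/null; then
      echo "$pid"
    fi
  done
}
```

Run as root so that every process's maps are readable, then match the printed PIDs against running services by hand to decide which ones to restart.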

[...]

>> We could maybe sidestep the issue altogether with socket-activated
>> services: they’d be started on-demand, so the second scenario above
>> would be unlikely. But getting there is quite a bit of work…
>
> I fail to see how this would be a solution for openssh, which would
> typically already be running unless you haven't logged in once since the
> machine was up (or am I missing something?).

sshd could also be started via socket activation; ‘sshd’ subprocesses
corresponding to existing logins would be unaffected.

> Also, it seems to me inetd can already do "socket activation", if this
> was somehow useful.

Yes, inetd can do that. It would be nicer though to have it all
integrated in the Shepherd.

(Basically, it’s a choice we could make right away: do we move all
network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
inetd services, or do we instead extend the Shepherd to support socket
activation? I’m rather in favor of the latter, but if in Guix System we
build an abstraction that can equally well target inetd or a future
Shepherd version, that’s even better.)

Ludo’.
Maxim Cournoyer wrote on 17 Jan 2022 16:19
(name . Ludovic Courtès)(address . ludo@gnu.org)
87v8yijsp6.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> sshd could also be started via socket activation; ‘sshd’ subprocesses
> corresponding to existing logins would be unaffected.
>
>> Also, it seems to me inetd can already do "socket activation", if this
>> was somehow useful.
>
> Yes, inetd can do that. It would be nicer though to have it all
> integrated in the Shepherd.

I'm not sure. The beauty of Shepherd, in my eyes, when compared to
other init systems, is that it is lean and clean. Leveraging what's
already out there (and part of GNU) seems an obvious path to me, as it:

1. Means less code to write, document and maintain.
2. Creates more cohesion between various components of the GNU project.

> (Basically, it’s a choice we could make right away: do we move all
> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
> inetd services, or do we instead extend the Shepherd to support socket
> activation? I’m rather in favor of the latter, but if in Guix System we
> build an abstraction that can equally well target inetd or a future
> Shepherd version, that’s even better.)

We could start with just targeting inetd, and build the abstraction
later, if the need arises, perhaps? We may never need it.

Thanks,

Maxim
Ludovic Courtès wrote on 17 Jan 2022 17:13
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
875yqimjc2.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>> sshd could also be started via socket activation; ‘sshd’ subprocesses
>> corresponding to existing logins would be unaffected.
>>
>>> Also, it seems to me inetd can already do "socket activation", if this
>>> was somehow useful.
>>
>> Yes, inetd can do that. It would be nicer though to have it all
>> integrated in the Shepherd.
>
> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
> other init systems, is that it is lean and clean. Leveraging what's
> already out there (and part of GNU) seems an obvious path to me, as it:
>
> 1. Means less code to write, document and maintain.
> 2. Creates more cohesion between various components of the GNU project.

Heheh, Guix was started to address #2 actually. Today, I think #2 is
okay but should not be an obstacle.

As for #1, sure, but Shepherd will need to grow a proper event loop
anyway, so socket activation won’t make much of a difference.

Also, taking a step back, systemd undoubtedly changed user expectations
for the better in terms of integration, monitoring, and logging. Having
the same level of integration in the Shepherd would be a step in that
direction.

>> (Basically, it’s a choice we could make right away: do we move all
>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>> inetd services, or do we instead extend the Shepherd to support socket
>> activation? I’m rather in favor of the latter, but if in Guix System we
>> build an abstraction that can equally well target inetd or a future
>> Shepherd version, that’s even better.)
>
> We could start with just targeting inetd, and build the abstraction
> later, if the need arises, perhaps? We may never need it.

Yes, so what I had in mind is, in Guix System, something like
<socket-activated-service>, which would kinda look like
<shepherd-service> but be lowered (for now) to an inetd service.
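For sshd, "lowered to an inetd service" amounts to an inetd configuration entry like the one printed below. In that setup inetd listens on the SSH port and spawns a fresh `sshd -i` (sshd's inetd mode) for each connection. This is only an illustration in classic inetd.conf syntax. `<socket-activated-service>` is Ludo's proposal rather than an existing Guix type, Guix System would generate the entry through its inetd service instead, and `inetd_sshd_entry` is a hypothetical helper.

```shell
#!/bin/sh
# Print a classic inetd.conf line for sshd:
#   service socket-type protocol wait/nowait user program argv...
# "nowait" makes inetd spawn one sshd per incoming connection; "-i" tells
# sshd that inetd has already accepted the connection on stdin/stdout.
inetd_sshd_entry() {
  sshd=$1   # absolute path to the sshd binary
  cfg=$2    # sshd_config to use
  printf 'ssh\tstream\ttcp\tnowait\troot\t%s\tsshd -i -f %s\n' "$sshd" "$cfg"
}
```

Because each connection starts a new sshd, restarting the listener after a reconfiguration would not touch existing logins, as noted earlier in the thread. The sshd path in the entry, however, is still fixed when the inetd configuration is generated.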

Thanks,
Ludo’.
Maxim Cournoyer wrote on 18 Jan 2022 05:33
(name . Ludovic Courtès)(address . ludo@gnu.org)
87h7a1irxc.fsf@gmail.com
Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
>> other init systems, is that it is lean and clean. Leveraging what's
>> already out there (and part of GNU) seems an obvious path to me, as it:
>>
>> 1. Means less code to write, document and maintain.
>> 2. Creates more cohesion between various components of the GNU project.
>
> Heheh, Guix was started to address #2 actually. Today, I think #2 is
> okay but should not be an obstacle.

I personally still think the idea is more than "okay"; I see value in
it; one of the obvious benefits is documentation; most GNU packages come
with Texinfo documentation, which makes for a nice, integrated
experience. I also think that as the system becomes more established
and integrates more of GNU, more GNU package maintainers may be
interested in joining and contributing (reaching some critical mass).

> As for #1, sure, but Shepherd will need to grow a proper event loop
> anyway, so socket activation won’t make much of a difference.

If we keep it dumb and use inetd, it wouldn't, right? From what I
understand, systemd uses socket activation as a means to chain events,
while inetd is typically used to delay a service starting to save on
resources such as RAM (for services seldom used). Is my primitive
understanding about right?

> Also, taking a step back, systemd undoubtedly changed user expectations
> for the better in terms of integration, monitoring, and logging. Having
> the same level of integration in the Shepherd would be a step in that
> direction.

At a heavy cost (complexity -- sheer amount of code). I remember
finding out, for example, that the database-backed, compressed logging
of systemd would consume more disk space than an uncompressed text log
file. That's because each message has multiple keys associated with it
that need to be written to disk. It's surprisingly inefficient.

>>> (Basically, it’s a choice we could make right away: do we move all
>>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>>> inetd services, or do we instead extend the Shepherd to support socket
>>> activation? I’m rather in favor of the latter, but if in Guix System we
>>> build an abstraction that can equally well target inetd or a future
>>> Shepherd version, that’s even better.)
>>
>> We could start with just targeting inetd, and build the abstraction
>> later, if the need arises, perhaps? We may never need it.
>
> Yes, so what I had in mind is, in Guix System, something like
> <socket-activated-service>, which would kinda look like
> <shepherd-service> but be lowered (for now) to an inetd service.

This sounds good to me, if you are confident it can fix the problem at
hand.

Thank you,

Maxim
Ludovic Courtès wrote on 18 Jan 2022 12:27
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87iluhjnbm.fsf@gnu.org
Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
>>> other init systems, is that it is lean and clean. Leveraging what's
>>> already out there (and part of GNU) seems an obvious path to me, as it:
>>>
>>> 1. Means less code to write, document and maintain.
>>> 2. Creates more cohesion between various components of the GNU project.
>>
>> Heheh, Guix was started to address #2 actually. Today, I think #2 is
>> okay but should not be an obstacle.
>
> I personally still think the idea is more than "okay"; I see value in
> it; one of the obvious benefits is documentation; most GNU packages come
> with Texinfo documentation, which makes for a nice, integrated
> experience. I also think that as the system becomes more established
> and integrates more of GNU, more GNU package maintainers may be
> interested in joining and contributing (reaching some critical mass).

Heheh. :-)

>> As for #1, sure, but Shepherd will need to grow a proper event loop
>> anyway, so socket activation won’t make much of a difference.
>
> If we keep it dumb and use inetd, it wouldn't, right?

It will grow one anyway, independently of socket activation.

> From what I understand, systemd uses socket activation as a means to
> chain events, while inetd is typically used to delay a service
> starting to save on resources such as RAM (for services seldom used).
> Is my primitive understanding about right?

Yes. In most cases, it’s about starting services lazily (much like the
Hurd’s passive translators, too.)

Thanks,
Ludo’.