guix deploy breaks SSH access with a PAM error

Open

4 participants

Ludovic Courtès
Mathieu Othacehe
Maxim Cournoyer
Mathieu Othacehe

Owner: unassigned

Submitted by: Maxim Cournoyer

Severity: important

Maxim Cournoyer wrote on 16 Dec 2021 05:45

Recipients:(name . bug-guix)(address . bug-guix@gnu.org)

Message-ID:87czlx88ez.fsf@gmail.com

Hello Guix!

Following the big merge of the core-updates-frozen branch into master,

I've now noticed the following on two occasions: running 'guix deploy'

leaves the remote machine unreachable by SSH. The connection passes

authentication but then gets closed immediately. /var/log/messages

reveals the following error:

sshd[29578]: error: PAM: pam_open_session(): Module is unknown

The machines updated were running Guix System revisions predating the

core-updates-frozen merge.
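
To check whether a machine is hitting this failure, one can search the system log for the message above. This is a minimal sketch: the function name is hypothetical, and the log path in the comment is the one mentioned in this report.

```shell
# Count sshd PAM session failures in a syslog-style file.
# $1: log file, e.g. /var/log/messages
count_pam_session_errors() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'sshd\[[0-9]*\]: *error: PAM: pam_open_session(): Module is unknown' "$1"
}

# Example (not run here):
#   sudo sh -c '. ./this-file.sh; count_pam_session_errors /var/log/messages'
```

A non-zero count after a deploy or reconfigure suggests the running sshd could not load a PAM module.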

The 'guix deploy' command doesn't succeed due to SSH starting to fail at

99% completion or similar; the bootloader configuration is not updated

so rebooting boots into the same old system generation (and SSH works

again):

guix deploy: deploying to x200...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
substitute: updating substitutes from 'http://127.0.0.1:8181'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
   /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv
   /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv
   /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv
   /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv

building /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv...
building /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv...
building /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv...
building /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv...
guix deploy: sending 5 store items (0 MiB) to 'x200.local'...
guix deploy: error: failed to deploy x200: failed to start 'guix repl' on 'x200.local'

$ guix deploy ~/stow/guix/machines/x200.scm --no-offload
The following 1 machine will be deployed:
  x200

guix deploy: deploying to x200...
guix deploy: error: failed to deploy x200: remote command
'/run/setuid-programs/sudo -n -- guix repl -t machine' failed with
status 254

$ ssh x200
Last login: Wed Dec 15 23:28:02 2021 from 192.168.10.15
Connection to x200.local closed.

This is obviously embarrassing in scenarios where the SSH connection is

the main way to reach the remote machine.

Ideas?

Thank you,

Maxim

Maxim Cournoyer wrote on 16 Dec 2021 06:27

Recipients:(address . 52533@debbugs.gnu.org)

Message-ID:878rwl86g9.fsf@gmail.com

Hello,

I've found a workaround: disabling PAM for the remote machine

ssh-daemon. This is not done as part of 'guix deploy', so needs to be

fiddled with manually; I did it this way:

1. take note of the command line and sshd_config file:

ps -eFww | grep sshd

2. Copy the sshd_config file from /gnu/store to somewhere writable and

edit it so that UsePAM is "no" instead of "yes".

3. Stop the Shepherd service with 'sudo herd stop ssh-daemon'

4. Start the ssh daemon manually (with sudo) by using the command found

in 1. but with the edited config from 2.

Then you should be able to 'guix deploy' successfully.
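
Steps 2 to 4 can be sketched in shell as follows. This is only an illustration under assumptions: the helper name `disable_pam` is hypothetical, the `/gnu/store/...` and `/tmp` paths are placeholders, and the sshd command line must be taken from the `ps` output of step 1.

```shell
#!/bin/sh
# Sketch of the manual workaround; paths are placeholders, helper name is hypothetical.

# Copy an sshd_config, switching "UsePAM yes" to "UsePAM no".
# $1: original config (the -f argument seen in the ps output of step 1)
# $2: writable destination for the edited copy
disable_pam() {
    sed -E 's/^[[:space:]]*UsePAM[[:space:]]+yes[[:space:]]*$/UsePAM no/' "$1" > "$2"
}

# On the remote machine (commented out; substitute the real values):
#   disable_pam /gnu/store/...-sshd_config /tmp/sshd_config
#   sudo herd stop ssh-daemon
#   sudo SSHD_COMMAND_FROM_STEP_1 -f /tmp/sshd_config   # keep the other flags from step 1
```

Since the edited copy lives outside /gnu/store, the change should only last until sshd is started again from its regular service definition.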

Reading 'man sshd_config', it says the default for UsePAM is no.

Considering this, and the issue it caused reported here, perhaps we

should disable it by default in Guix?

What do others think?

Thank you,

Maxim

Mathieu Othacehe wrote on 16 Dec 2021 09:54

control message for bug #52533

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87v8zpgcb0.fsf@meije.i-did-not-set--mail-host-address--so-tickle-me

severity 52533 important

quit

Ludovic Courtès wrote on 16 Dec 2021 16:02

Re: bug#52533: guix deploy breaks SSH access with a PAM error

Recipients:(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 52533@debbugs.gnu.org)

Message-ID:87ilvor3sn.fsf@gnu.org

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Following the big merge of the core-updates-frozen branch into master,
> I've noticed now on two counts the following: running 'guix deploy'
> leaves the remote machine unreachable by SSH.  The connection passes
> authentication but then gets closed immediately.  /var/log/messages
> reveals the following error:
>
> sshd[29578]:  error: PAM: pam_open_session(): Module is unknown
>
>
> The machines updated were running Guix System revisions predating the
> core-updates-frozen merge.

This sounds a lot like this:

https://issues.guix.gnu.org/32182#1

WDYT?

Ludo’.

Mathieu Othacehe wrote on 13 Jan 2022 13:31

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87r19bom0r.fsf@gnu.org

Hey,

> This sounds a lot like this:

> https://issues.guix.gnu.org/32182#1

I was just kicked out of my own server due to this PAM/SSH issue. It

happens quite frequently here. Time for a fix :).

Regarding the two potential solutions that you proposed in 2018, are

they still relevant? If so, I could maybe try to implement the second

suggestion: introducing service chain-loading.

Thanks,

Mathieu

Mathieu Othacehe wrote on 13 Jan 2022 13:38

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87ilunolnz.fsf@gnu.org

> Regarding the two potential solutions that you proposed in 2018, are

> they still actual? If yes, I could maybe try to implement the second

> suggestion: introducing service chain-loading.

Oh sorry, I stopped reading the thread at

https://issues.guix.gnu.org/32182#1. Looks like the service

chain-loading might not be enough, I'll keep digging.

Thanks,

Mathieu

Ludovic Courtès wrote on 13 Jan 2022 16:04

Recipients:(name . Mathieu Othacehe)(address . othacehe@gnu.org)

Message-ID:87tue77k40.fsf@gnu.org

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> This sounds a lot like this:

>> https://issues.guix.gnu.org/32182#1

> I was just kicked out of my own server due to this PAM/SSH issue. It

> happens quite frequently here. Time for a fix :).

Note that ‘guix deploy’ now opens a single SSH session, starting from

7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the

problem.

> Regarding the two potential solutions that you proposed in 2018, are

> they still actual? If yes, I could maybe try to implement the second

> suggestion: introducing service chain-loading.

Service chain-loading was implemented in the Shepherd a few years ago.

However, it doesn’t really help; consider these two scenarios:

• You do ‘guix system reconfigure && herd restart term-tty1’. In that

case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process

(post-glibc upgrade, thanks to service chain-loading) and ‘login’

will happily load the .so files listed in /etc/pam.d/login (also

post-glibc upgrade).

• You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,

‘sshd’, and all the other services that depend on PAM: these

pre-glibc upgrade programs will try dlopening the post-glibc upgrade

PAM plugins, which will break.

The crux of the problem rather is the global /etc/pam.d: it’s valid for

pre-glibc upgrade programs, or for post-glibc upgrade programs, but not

both.
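
One rough way to see which side of the upgrade a running daemon is on is to list the glibc store directory it has mapped. The sketch below assumes a Linux /proc layout and Guix-style /gnu/store paths; the function name is hypothetical.

```shell
# Print the glibc store directories that provide libc in a /proc/PID/maps-style file.
# $1: path to the maps file, e.g. /proc/1234/maps
mapped_glibc() {
    # Keep only the libc mappings, then reduce each to its store directory.
    grep '/lib/libc[.-]' "$1" | grep -o '/gnu/store/[a-z0-9]*-glibc-[^/ ]*' | sort -u
}

# Example (not run here; reading another user's maps needs root):
#   mapped_glibc "/proc/$(pgrep -o sshd)/maps"   # long-running daemon
#   mapped_glibc "/proc/$$/maps"                 # the current shell
```

If the daemon reports a different glibc directory than freshly started programs do, it is presumably a pre-upgrade process of the kind described in the second scenario.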

FHS distros have a similar problem though; how do they handle it? Do

they force services to be restarted when glibc is upgraded, or something

along these lines?

In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each

PAM-using Shepherd service (login, sshd, etc.) so that it sets

PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS

being configured? Tricky!

We could maybe sidestep the issue altogether with socket-activated

services: they’d be started on-demand, so the second scenario above

would be unlikely. But getting there is quite a bit of work…

Ludo’.

Maxim Cournoyer wrote on 13 Jan 2022 17:45

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87mtjz1t63.fsf@gmail.com

Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>>> This sounds a lot like this:
>>>
>>>   https://issues.guix.gnu.org/32182#1
>>
>> I was just kicked out of my own server due to this PAM/SSH issue. It
>> happens quite frequently here. Time for a fix :).

Not a meaningful contribution to the discussion, but my workaround is to

disable PAM; as it is not enabled in OpenSSH by default, perhaps we

should also leave it off unless requested? What are the advantages of

having it on?

> Note that ‘guix deploy’ now opens a single SSH session, starting from
> 7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
> problem.
>
>> Regarding the two potential solutions that you proposed in 2018, are
>> they still actual? If yes, I could maybe try to implement the second
>> suggestion: introducing service chain-loading.
>
> Service chain-loading was implemented in the Shepherd a few years ago.
> However, it doesn’t really help; consider these two scenario:
>
>   • You do ‘guix system reconfigure && herd restart term-tty1’.  In that
>     case, all is good: ‘term-tty1’, will run the new ‘mingetty’ process
>     (post-glibc upgrade, thanks to service chain-loading) and ‘login’
>     will happily load the .so files listed in /etc/pam.d/login (also
>     post-glibc upgrade).
>
>   • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
>     ‘sshd’, and all the other services that depend on PAM: these
>     pre-glibc upgrade programs will try dlopening the post-glibc upgrade
>     PAM plugins, which will break.
>
> The crux of the problem rather is the global /etc/pam.d: it’s valid for
> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
> both.
>
> FHS distros have a similar problem though; how do they handle it?  Do
> they force services to be restarted when glibc is upgraded, or something
> along these lines?

I just asked this question in Debian's OFTC channel:

"how does debian handle glibc updates? are services restarted when it

happens? Or does it postpone updating glibc until the next reboot?"

The answer I got was: "there is no magic postponing of updates"; the

external needrestart [0] program was also mentioned.

Researching some more, it seems this may be handled on Debian by the use

of postinst scripts (which is an arbitrary shell script run after a

package is installed); so the libc package of Debian for example

restarts the postgres service to avoid problems [1].

[0] https://github.com/liske/needrestart

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

> In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each

> PAM-using Shepherd service (login, sshd, etc.) so that it sets

> PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS

> being configured? Tricky!

Good question, but that seems a good path to pursue; old services would

be using their own old pam modules, allowing them to continue running

unimpacted, while new ones would get the updated pam modules.

> We could maybe sidestep the issue altogether with socket-activated

> services: they’d be started on-demand, so the second scenario above

> would be unlikely. But getting there is quite a bit of work…

I fail to see how this would be a solution for openssh, which would

typically already be running unless you've never logged in once since the

machine was up (or am I missing something?). Also, it seems to me inetd

can already do "socket activation", if this was somehow useful.

Thanks,

Maxim

Ludovic Courtès wrote on 17 Jan 2022 14:25

Recipients:(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)

Message-ID:877daypk8r.fsf@gnu.org

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

>>> I was just kicked out of my own server due to this PAM/SSH issue. It

>>> happens quite frequently here. Time for a fix :).

> Not a meaningful contribution to the discussion, but my workaround is to

> disable PAM; as it is not enabled in OpenSSH by default, perhaps we

> should also leave it off unless requested? What are the advantages of

> having it on?

Consistency: authentication had better work consistently across all

system services that depend on it.

[...]

>> The crux of the problem rather is the global /etc/pam.d: it’s valid for
>> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
>> both.
>>
>> FHS distros have a similar problem though; how do they handle it?  Do
>> they force services to be restarted when glibc is upgraded, or something
>> along these lines?
>
> I just asked this question in Debian's OFTC channel:
>
> "how does debian handle glibc updates?  are services restarted when it
> happens?  Or does it postpone updating glibc until the next reboot?"
>
> And got for answer: "there is no magic postponing of updates"; the
> external needrestart [0] program was also mentioned.
>
> Researching some more, it seems this may be handled on Debian by the use
> of postinst scripts (which is an arbitrary shell script run after a
> package is installed); so the libc package of Debian for example
> restarts the postgres service to avoid problems:
>
> [0]  https://github.com/liske/needrestart
> [1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

Yeah. My recollection is that apt is interactive by default, and it

would typically pop up a dialog telling you that services X and Y need

to be restarted, and asking whether you want to restart them now.

Compared to what we have (a message at the end telling you that you

“may need” to run ‘herd restart X’), the benefit IIRC is that it

tells you which services need to be restarted.

[...]

>> We could maybe sidestep the issue altogether with socket-activated

>> services: they’d be started on-demand, so the second scenario above

>> would be unlikely. But getting there is quite a bit of work…

> I fail to see how this would be a solution for openssh, which would

> typically already be running unless you've never login ounce since the

> machine was up (or am I missing something?).

sshd could also be started via socket activation; ‘sshd’ subprocesses

corresponding to existing logins would be unaffected.

> Also, it seems to me inetd can already do "socket activation", if this

> was somehow useful.

Yes, inetd can do that.  It would be nicer though to have it all
integrated in the Shepherd.

(Basically, it’s a choice we could make right away: do we move all
network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
inetd services, or do we instead extend the Shepherd to support socket
activation?  I’m rather in favor of the latter, but if in Guix System we
build an abstraction that can equally well target inetd or a future
Shepherd version, that’s even better.)

Ludo’.

Maxim Cournoyer wrote on 17 Jan 2022 16:19

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87v8yijsp6.fsf@gmail.com

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> sshd could also be started via socket activation; ‘sshd’ subprocesses

> corresponding to existing logins would be unaffected.

>> Also, it seems to me inetd can already do "socket activation", if this

>> was somehow useful.

> Yes, inetd can do that. It would be nicer though to have it all

> integrated in the Shepherd.

I'm not sure. The beauty of Shepherd, in my eyes, when compared to

other init systems, is that it is lean and clean. Leveraging what's

already out there (and part of GNU) seems an obvious path to me, as it:

1. Means less code to write, document and maintain.

2. Creates more cohesion between various components of the GNU project.

> (Basically, it’s a choice we could make right away: do we move all

> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to

> inetd services, or do we instead extend the Shepherd to support socket

> activation? I’m rather in favor of the latter, but if in Guix System we

> build an abstraction that can equally well target inetd or a future

> Shepherd version, that’s even better.)

We could start with just targeting inetd, and build the abstraction

later, if the need arises, perhaps? We may never need it.

Thanks,

Maxim

Ludovic Courtès wrote on 17 Jan 2022 17:13

Recipients:(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)

Message-ID:875yqimjc2.fsf@gnu.org

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>> sshd could also be started via socket activation; ‘sshd’ subprocesses
>> corresponding to existing logins would be unaffected.
>>
>>> Also, it seems to me inetd can already do "socket activation", if this
>>> was somehow useful.
>>
>> Yes, inetd can do that.  It would be nicer though to have it all
>> integrated in the Shepherd.
>
> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
> other init systems, is that it is lean and clean.  Leveraging what's
> already out there (and part of GNU) seems an obvious path to me, as it:
>
> 1. Means less code to write, document and maintain.
> 2. Creates more cohesion between various components of the GNU project.

Heheh, Guix was started to address #2 actually. Today, I think #2 is

okay but should not be an obstacle.

As for #1, sure, but Shepherd will need to grow a proper event loop

anyway, so socket activation won’t make much of a difference.

Also, taking a step back, systemd undoubtedly changed user expectations

for the better in terms of integration, monitoring, and logging. Having

the same level of integration in the Shepherd would be a step in that

direction.

>> (Basically, it’s a choice we could make right away: do we move all
>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>> inetd services, or do we instead extend the Shepherd to support socket
>> activation?  I’m rather in favor of the latter, but if in Guix System we
>> build an abstraction that can equally well target inetd or a future
>> Shepherd version, that’s even better.)
>
> We could start with just targeting inetd, and build the abstraction
> later, if the need arises, perhaps?  We may never need it.

Yes, so what I had in mind is, in Guix System, something like
<socket-activated-service>, which would kinda look like
<shepherd-service> but be lowered (for now) to an inetd service.

Thanks,
Ludo’.

Maxim Cournoyer wrote on 18 Jan 2022 05:33

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87h7a1irxc.fsf@gmail.com

Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
>> other init systems, is that it is lean and clean.  Leveraging what's
>> already out there (and part of GNU) seems an obvious path to me, as it:
>>
>> 1. Means less code to write, document and maintain.
>> 2. Creates more cohesion between various components of the GNU project.
>
> Heheh, Guix was started to address #2 actually.  Today, I think #2 is
> okay but should not be an obstacle.

I personally still think the idea is more than "okay"; I see value in
it; one of the obvious benefits is documentation; most GNU packages come
with Texinfo documentation, which makes for a nice, integrated
experience.  I also think that as the system becomes more established
and integrates more of GNU, more GNU package maintainers may be
interested in joining and contributing (reaching some critical mass).

> As for #1, sure, but Shepherd will need to grow a proper event loop

> anyway, so socket activation won’t make much of a difference.

If we keep it dumb and use inetd, it wouldn't, right?  From what I
understand, systemd uses socket activation as a means to chain events,
while inetd is typically used to delay a service starting to save on
resources such as RAM (for services seldom used).  Is my primitive
understanding about right?

> Also, taking a step back, systemd undoubtedly changed user expectations

> for the better in terms of integration, monitoring, and logging. Having

> the same level of integration in the Shepherd would be a step in that

> direction.

At a heavy cost (complexity -- sheer amount of code).  I remember
finding out, for example, that the database-backed, compressed logging
of systemd would consume more disk space than an uncompressed text log
file.  That's because each message has multiple keys associated with
it that need to be written to disk.  It's surprisingly inefficient.

>>> (Basically, it’s a choice we could make right away: do we move all
>>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>>> inetd services, or do we instead extend the Shepherd to support socket
>>> activation?  I’m rather in favor of the latter, but if in Guix System we
>>> build an abstraction that can equally well target inetd or a future
>>> Shepherd version, that’s even better.)
>>
>> We could start with just targeting inetd, and build the abstraction
>> later, if the need arises, perhaps?  We may never need it.
>
> Yes, so what I had in mind is, in Guix System, something like
> <socket-activated-service>, which would kinda look like
> <shepherd-service> but be lowered (for now) to an inetd service.

This sounds good to me, if you are confident it can fix the problem at

hand.

Thank you,

Maxim

Ludovic Courtès wrote on 18 Jan 2022 12:27

Recipients:(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)

Message-ID:87iluhjnbm.fsf@gnu.org

Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
>>> other init systems, is that it is lean and clean.  Leveraging what's
>>> already out there (and part of GNU) seems an obvious path to me, as it:
>>>
>>> 1. Means less code to write, document and maintain.
>>> 2. Creates more cohesion between various components of the GNU project.
>>
>> Heheh, Guix was started to address #2 actually.  Today, I think #2 is
>> okay but should not be an obstacle.
>
> I personally still think the idea is more than "okay"; I see value in
> it; one of the obvious benefits is documentation; most GNU packages come
> with Texinfo documentation, which makes for a nice, integrated
> experience.  I also think that as the system becomes more established
> and integrate more of GNU, more GNU packages maintainers may be
> interested in joining and contributing (reaching some critical mass).

Heheh. :-)

>> As for #1, sure, but Shepherd will need to grow a proper event loop

>> anyway, so socket activation won’t make much of a difference.

> If we keep it dumb and use inetd, it wouldn't, right?

It will get that, independent of socket activation.

> From what I understand, systemd uses socket activation as a means to

> chain events, while inetd is typically used to delay a service

> starting to save on resources such as RAM (for services seldom used).

> Is my primitive understanding about right?

Yes. In most cases, it’s about starting services lazily (much like the

Hurd’s passive translators, too.)

Thanks,

Ludo’.
