[shepherd] Altering system time renders herd unresponsive

  • Open
Details
3 participants
  • Ludovic Courtès
  • Sergey Trofimov
  • Vladilen Kozin
Owner
unassigned
Submitted by
Vladilen Kozin
Severity
important
Merged with
68476
Vladilen Kozin wrote on 22 Oct 2023 15:43
(address . bug-guix@gnu.org)
CACw=CXN8dbRb8RmiHimqTs6J_QtSz5HuXaxf0mkRJeEEX1Wy7w@mail.gmail.com
Hello guix.

My server consistently ran with its system time 1 hour ahead of the
actual time. Both `date` and `hwclock` showed the same time, off by
1 hour, while the BIOS showed the correct time. I'm not sure why, but
some services won't run if the time difference is over e.g. 15 minutes
or so, so I set the clock back:

$ sudo date -s '-1 hour'

This fixes the time but causes `herd` to become unresponsive: you type a
command, any command, and stare at a stuck tty. SSH'ing into the
system also becomes impossible. Every attempt gets logged in
/var/log/messages (I can see that), but again you just stare at an
unresponsive terminal. Initially I thought it had fried shepherd
completely, so I power-cycled the system to get it back. `sudo reboot`,
being an alias for a `herd` command, will of course not work, so you have
to do it physically. That is annoying but feasible on a desktop system,
and a complete nightmare on a physical server, which may take up to
20 minutes to reboot due to inventory lifecycle and such.

By chance, I got distracted this time and just left it hanging. Lo and
behold, it unfroze some 15-20 minutes later. I have no clue why.

I hope I won't see this particular issue again, because I followed
the system clock change with:
$ sudo hwclock -w
and the system shows the correct time after a reboot.

In general my experience with shepherd has been less than stellar.
IMO, this just shouldn't ever happen with PID 1, because there isn't
anything you can do at that point. This is not the first time it has
become unresponsive. On occasion, after a pull that changes some user
service code followed by a system reconfigure, those services would
start failing to find their binaries. My best guess is that those
specific services depend on the user-home service or some such, and
something happens that prevents discovery of said binaries in PATH;
binaries in those services aren't referenced by absolute path in the
GNU store. That is a separate issue.

Generation 8 Oct 14 2023 00:22:53 (current)
  file name: /var/guix/profiles/system-8-link
  canonical file name: /gnu/store/j9i2w1zacw7sl8vlb7k1g7p0vnd58ns7-system
  label: GNU with Linux 6.4.16
  bootloader: grub
  root device: label: "r720-guix-0"
  kernel: /gnu/store/cbc7x9in2dnjrnh840c21ivgygnndp1c-linux-6.4.16/bzImage
  channels:
    guix:
      branch: master
      commit: 3963fa1a465708690cd1554d911613f1c92f5eef

Thank you

--
Best regards
Vlad Kozin
Ludovic Courtès wrote on 23 Oct 2023 21:58
(name . Vladilen Kozin)(address . vladilen.kozin@gmail.com)(address . 66684@debbugs.gnu.org)
874jihf6vq.fsf@gnu.org
Hi Vladilen,

Vladilen Kozin <vladilen.kozin@gmail.com> skribis:

> My server would consistently run with system time 1h ahead of actual.
> Both `date` and `hwclock` would show the same time off by 1hr, while
> BIOS showed me correct time. I'm not sure why, but some services won't
> run if time difference is e.g. over 15min or smth, so.
>
> $ sudo date -s '-1 hour'
>
> fixes time but causes `herd` to become unresponsive as in you type a
> command, any command and stare at tty stuck. Also ssh'ing into the
> system becomes impossible.

Thanks for your report. This issue comes from Fibers 1.3.1:


There’s currently no bug-fix in sight though.

Ludo’.
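
(Illustration, not part of the thread.) The messages above do not say what goes wrong inside Fibers, so the following is only a generic sketch of how a clock change can stall timer-driven code in general. It is not the Fibers or shepherd implementation. It assumes a hypothetical program that stores a deadline as a wall-clock timestamp; the `FakeClocks` class and the `demo` function are made up for the example, and the numbers are arbitrary and not meant to reproduce the 15-20 minutes reported above. In Python, the corresponding real clocks would be `time.time()` (wall clock, changed by `date -s`) and `time.monotonic()` (not affected by system clock updates).

```python
class FakeClocks:
    """Hypothetical stand-in for a system's wall clock and monotonic clock."""

    def __init__(self, wall, monotonic=0.0):
        self.wall = wall            # plays the role of time.time()
        self.monotonic = monotonic  # plays the role of time.monotonic()

    def advance(self, seconds):
        # Real time passing moves both clocks forward together.
        self.wall += seconds
        self.monotonic += seconds

    def set_wall(self, delta):
        # An administrator adjusting the clock (as with `date -s`)
        # moves only the wall clock; the monotonic clock is untouched.
        self.wall += delta


def demo():
    clocks = FakeClocks(wall=1_000_000.0)

    # Schedule "wake up in 5 seconds" in two different ways.
    wall_deadline = clocks.wall + 5.0
    mono_deadline = clocks.monotonic + 5.0

    # The wall clock is set back by one hour before the timer fires.
    clocks.set_wall(-3600.0)

    # Remaining wait as seen by each approach.
    wall_left = wall_deadline - clocks.wall
    mono_left = mono_deadline - clocks.monotonic
    return wall_left, mono_left


if __name__ == "__main__":
    wall_left, mono_left = demo()
    print(f"wall-clock deadline: {wall_left:.0f} s left")
    print(f"monotonic deadline:  {mono_left:.0f} s left")
```

In this sketch the timer based on the wall clock now has to wait about an hour longer than intended, while the one based on the monotonic clock is unaffected, which is why timeouts are usually computed from a monotonic clock.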
Ludovic Courtès wrote on 23 Nov 2023 12:44
control message for bug #66684
(address . control@debbugs.gnu.org)
875y1sd78a.fsf@gnu.org
severity 66684 important
quit
Sergey Trofimov wrote on 16 Jan 07:55 +0100
control message for bug #68476
(address . control@debbugs.gnu.org)
09ab2021d319afa614850a7443fd0f10@sarg.org.ru
merge 68476 66684
quit