Installation tests are failing

  • Done
Details
5 participants
  • bokr
  • Ludovic Courtès
  • Mathieu Othacehe
  • Maxim Cournoyer
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Mathieu Othacehe
Severity
important
Mathieu Othacehe wrote on 8 Apr 2022 11:51
(address . bug-guix@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
87r167rjhv.fsf@gnu.org
Hello,

The installation tests are failing this way:

conversation expecting pattern ((quote pause))
Apr 7 17:41:58 localhost installer[227]: guix system: error: failed to connect to `/var/guix/daemon-socket/socket': Connection refused

This happens right after the 'guix-daemon' service is restarted. It looks
like this regression was introduced by the switch to the new Shepherd
release.

See:

Thanks,

Mathieu
Mathieu Othacehe wrote on 8 Apr 2022 17:10
(address . bug-guix@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
87v8vjwqzk.fsf@gnu.org
The following tests are also failing since the Shepherd upgrade:


Thanks,

Mathieu
Mathieu Othacehe wrote on 26 Apr 2022 10:27
control message for bug #54786
(address . control@debbugs.gnu.org)
87bkwow8oo.fsf@meije.i-did-not-set--mail-host-address--so-tickle-me
severity 54786 important
quit
Mathieu Othacehe wrote on 28 Apr 2022 09:22
Re: bug#54786: Installation tests are failing
(address . 54786@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
87zgk5brkd.fsf@gnu.org
Hello,

Those tests are still failing. It looks like most of the failures are
caused by daemons started multiple times.

The nginx daemon seems to be started multiple times:

nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)



This is the GNU system. Welcome.
komputilo login: nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] still could not bind()
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: nginx
action: start
key: %exception
args: ("#<&invoke-error program: \"/gnu/store/815abphg8vr8qkl8ykd8pyxp1v62c9gk-nginx-1.21.6/sbin/nginx\" arguments: (\"-c\" \"/gnu/store/rbjgg41p22lgkjwrc8inrhbmqah54cgq-nginx.conf\" \"-p\" \"/var/run/nginx\") exit-status: 1 term-signal: #f stop-signal: #f>")

Tests failed, dumping log file '/gnu/store/p72g83l9nag6c830pzwgcgpnvnyr53p1-cgit-test/cgit.log'.

The nginx daemon seems to be started multiple times:

nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)



This is the GNU system. Welcome.
komputilo login: nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] still could not bind()
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: nginx
action: start
key: %exception
args: ("#<&invoke-error program: \"/gnu/store/815abphg8vr8qkl8ykd8pyxp1v62c9gk-nginx-1.21.6/sbin/nginx\" arguments: (\"-c\" \"/gnu/store/ayafihmfwg3yw4hp8nw622g2rr9mw7vn-nginx.conf\" \"-p\" \"/var/run/nginx\") exit-status: 1 term-signal: #f stop-signal: #f>")

Tests failed, dumping log file '/gnu/store/ix0hpwpr7b6zh20arig9bpg2lqzysxi7-gitile-test/gitile.log'.

> * jami-test (https://ci.guix.gnu.org/build/646811/details)

Looks like those tests are failing because the daemon is started
multiple times:

This is the GNU system. Welcome.
jami login: Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

12:21:08.165 os_core_unix.c !pjlib 2.11 for POSIX initialized
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: jami
action: start
key: match-error
args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: jami
action: start
key: match-error
args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: jami
action: start
key: match-error
args: ("match" "no matching pattern" #f)

Thanks,

Mathieu
Ludovic Courtès wrote on 28 Apr 2022 21:19
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 54786@debbugs.gnu.org)
87y1zpaud6.fsf@gnu.org
Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>
> The nginx daemon seems to be started multiple times:

I believe this is caused by a change of semantics (really: a bug) in the
shepherd ‘start’ method in 0.9.0.

Previously, ‘start’ would wait until the daemon was started. If the
service was being started, shepherd wouldn’t reply until it was done
starting it.

In 0.9.0, shepherd replies even while it’s waiting for the service to be
started. But as a consequence, it lets you start a service that is
already being started, leading to this mess you reported.


The proper fix is to better track the status of each service in
shepherd, and to prevent double-starts.
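
As a rough illustration of what tracking service status could look like, here is a minimal sketch in plain Guile. It is not the Shepherd's code: 'statuses', 'spawn-count', 'service-status', 'request-start!' and 'mark-running!' are hypothetical names, and spawning a daemon is reduced to incrementing a counter. The point is only that a second 'start' request arriving while the service is still in the 'starting' state is ignored instead of spawning another process.

```scheme
;; Hypothetical sketch: none of these names come from the Shepherd.
(define statuses (make-hash-table))   ;service name -> stopped | starting | running
(define spawn-count 0)                ;how many times a daemon was "spawned"

(define (service-status name)
  ;; Services that were never started are considered stopped.
  (hashq-ref statuses name 'stopped))

(define (request-start! name)
  "Begin starting NAME unless it is already starting or running.  Return
#t when this call spawned the daemon, #f when the request was ignored."
  (case (service-status name)
    ((stopped)
     (hashq-set! statuses name 'starting)
     (set! spawn-count (+ 1 spawn-count))   ;stands in for the real spawn
     #t)
    (else #f)))

(define (mark-running! name)
  ;; Called once the daemon is known to be up.
  (hashq-set! statuses name 'running))

(request-start! 'nginx)   ;spawns the daemon
(request-start! 'nginx)   ;ignored: a start is already in progress
(mark-running! 'nginx)
(request-start! 'nginx)   ;ignored: already running
```

With this kind of bookkeeping, the three 'start' requests above result in a single spawn.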

In the interim, perhaps we can work around that by using a different
check to determine whether the service is running. For instance,
instead of:

(test-assert "nginx running"
(marionette-eval
'(begin
(use-modules (gnu services herd))
(start-service 'nginx))
marionette))

… we’d write something like:

(test-assert "nginx running"
(wait-for-file "/var/run/nginx/pid"))

Thoughts? I’ll give that a try.
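
The idea behind this workaround is to poll for an observable side effect of the daemon (here a PID file) rather than asking shepherd to start the service again. A minimal sketch of such a polling loop in plain Guile follows; 'wait-for-file*' is a hypothetical helper written for illustration and is not the test suite's own 'wait-for-file'.

```scheme
;; Hypothetical helper, for illustration only.
(define* (wait-for-file* file #:key (timeout 10) (interval 1))
  "Poll until FILE exists, checking every INTERVAL seconds.  Return #t if it
appeared within about TIMEOUT seconds, #f otherwise."
  (let loop ((remaining timeout))
    (cond ((file-exists? file) #t)      ;the daemon wrote its file
          ((<= remaining 0) #f)         ;gave up
          (else
           (sleep interval)             ;blocking sleep; whole seconds
           (loop (- remaining interval))))))
```

A test could then call something like (wait-for-file* "/var/run/nginx/pid" #:timeout 30) and assert on the result, without ever issuing a second 'start'.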

Thanks for the heads-up!

Ludo’.
Ludovic Courtès wrote on 29 Apr 2022 21:50
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 54786@debbugs.gnu.org)
87mtg3655a.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> In the interim, perhaps we can work around that by using a different
> check to determine whether the service is running. For instance,
> instead of:
>
> (test-assert "nginx running"
> (marionette-eval
> '(begin
> (use-modules (gnu services herd))
> (start-service 'nginx))
> marionette))
>
> … we’d write something like:
>
> (test-assert "nginx running"
> (wait-for-file "/var/run/nginx/pid"))

I pushed something along these lines as
73eeeeafbb0765f76834b53c9fe6cf3c8f740840.

I wasn’t able to fix the tailon test because the ‘tailon’ package
doesn’t build and I failed to address that in a timely fashion.

Ludo’.
Mathieu Othacehe wrote on 30 Apr 2022 15:02
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 54786@debbugs.gnu.org)
875ymqwwq8.fsf@gnu.org
Hey Ludo,

> I pushed something along these lines as
> 73eeeeafbb0765f76834b53c9fe6cf3c8f740840.

Thanks for the fix! The jami and jami-provisioning tests are also broken
because of what looks to be the same issue:

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: jami
action: start
key: match-error
args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

I think we don't have the right approach here: we should check that the
system tests are passing before pushing series and not adapt the tests
afterwards.

Historically this was difficult because the system tests were often in a
semi-broken state. Before the Shepherd update the tests were however all
passing (modulo rare intermittent failures).

As it's not always obvious what's going to break the system tests and
what's not (a simple package update can easily break them), it would be
really nice to have mandatory commit verification.

The mumi/cuirass gateway that has already been discussed could really
help us here. If some people are motivated, we could split the work and
introduce such a mechanism.

Thanks,

Mathieu
Ludovic Courtès wrote on 1 May 2022 15:26
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 54786@debbugs.gnu.org)
875ymp4c5f.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> Thanks for the fix! The jami and jami-provisioning tests are also broken
> because of what looks to be the same issue:
>
> One does not simply initialize the client: Another daemon is detected
> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
> 1. &action-exception-error:
> service: jami
> action: start
> key: match-error
> args: ("match" "no matching pattern" #f)
> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
> https://jami.net/
> [Video support enabled]
> [Plugins support enabled]

Yes, I noticed that, but I’m not sure how to apply a similar workaround.

> I think we don't have the right approach here: we should check that the
> system tests are passing before pushing series and not adapt the tests
> afterwards.

Yes, apologies for that.

> Historically this was difficult because the system tests were often in a
> semi-broken state. Before the Shepherd update the tests were however all
> passing (modulo rare intermittent failures).
>
> As it's not always obvious what's going to break the system tests and
> what's not (a simple package update can easily break them), it would be
> really nice to have mandatory commit verification.
>
> The mumi/cuirass gateway that has already been discussed could really
> help us here. If some people are motivated, we could split the work and
> introduce such a mechanism.

Yes, I agree; an “always green” ‘master’ branch would be great.

Do you have milestones in mind for “commit verification”?

As I see it, the difficulty is that we’ve been looking at a horizon of
features à la GitLab-CI without being quite sure how to get there (apart
from deploying GitLab or a similar tool, that is).

A first step that comes to mind would be an easier way to set up
transient jobsets for a branch (or, ideally, for an issue: the thing
would apply patches and create the branch).

Thoughts?

(Maybe worth moving to guix-devel.)

Ludo’.
Maxim Cournoyer wrote on 25 May 2022 05:43
(name . Ludovic Courtès)(address . ludo@gnu.org)
87a6b646qs.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> Thanks for the fix! The jami and jami-provisioning tests are also broken
>> because of what looks to be the same issue:
>>
>> One does not simply initialize the client: Another daemon is detected
>> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
>> 1. &action-exception-error:
>> service: jami
>> action: start
>> key: match-error
>> args: ("match" "no matching pattern" #f)
>> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
>> https://jami.net/
>> [Video support enabled]
>> [Plugins support enabled]
>
> Yes, I noticed that, but I’m not sure how to apply a similar workaround.

I tried fixing that today, but so far I've only managed to understand
what seems to be going wrong, with this (not so great) workflow:

1. Add pk uses in the code.

2. $(./pre-inst-env guix system vm --no-graphic -e '(@@ (gnu tests
telephony) %jami-os)' --no-offload --no-substitutes) -m 512 -nic
user,model=virtio-net-pci,hostfwd=tcp::10022-:22

3. ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p
10022 root@localhost

and poke around with 'herd status', read /var/log/messages, experiment
with dbus-send, etc.

This allowed me to find out that (dbus-available-services) appears to be
broken. I'm not sure why the exceptions are reported so badly by
Shepherd (are exceptions raised with 'error' not handled by Shepherd or
something?). The with-retries loop should end up printing the caught
exception arguments, and I would also have expected to see the backtrace
somewhere.

Anyway, connecting to another machine that is still running the
jami-service-type (it hasn't been reconfigured in a while), I could
see:

scheme@(guix-user)> ,use (gnu build jami-service)
scheme@(guix-user)> (dbus-available-services)
;;; Failed to autoload fork+exec-command in (shepherd service):
;;; no code for module (fibers)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
error: fork+exec-command: unbound variable

Oh yes, so it now requires guile-fibers. After installing it:

scheme@(guix-user)> ,use (gnu build jami-service)
scheme@(guix-user)> (dbus-available-services)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
No scheduler current; call within run-fibers instead

So the users of fork+exec-command (a public API) need to be adjusted.
I suspect that's the crux of the issue here. The rest (the jami tests
using Shepherd's start-service to check the service status and causing
multiple starts) should be easy to work around.

To be continued...

Maxim
Ludovic Courtès wrote on 28 May 2022 23:29
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
878rql9wh9.fsf@gnu.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Mathieu Othacehe <othacehe@gnu.org> skribis:
>>
>>> Thanks for the fix! The jami and jami-provisioning tests are also broken
>>> because of what looks to be the same issue:
>>>
>>> One does not simply initialize the client: Another daemon is detected
>>> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: ERROR:
>>> 1. &action-exception-error:
>>> service: jami
>>> action: start
>>> key: match-error
>>> args: ("match" "no matching pattern" #f)
>>> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
>>> https://jami.net/
>>> [Video support enabled]
>>> [Plugins support enabled]
>>
>> Yes, I noticed that, but I’m not sure how to apply a similar workaround.
>
> I tried fixing that today, but so far I've only managed to understand
> what seems to be going wrong, with this (not so great) workflow:

While working on https://issues.guix.gnu.org/55444, I figured
‘wait-for-service’ could be useful for system tests that were previously
using ‘start-service’ as a way to wait for a service to be up and
running.

I tried the following change, which should be semantically equivalent to
what was happening with the Shepherd 0.8. However, it doesn’t seem to
work, for reasons that escape me.

Thoughts?

Ludo’.
diff --git a/gnu/tests/telephony.scm b/gnu/tests/telephony.scm
index bc464a431a..c219868859 100644
--- a/gnu/tests/telephony.scm
+++ b/gnu/tests/telephony.scm
@@ -145,11 +145,7 @@ (define marionette
(marionette-eval
'(begin
(use-modules (gnu services herd))
- (match (start-service 'jami)
- (#f #f)
- (('service response-parts ...)
- (match (assq-ref response-parts 'running)
- ((pid) (number? pid))))))
+ (wait-for-service 'jami #:timeout 60))
marionette))
(test-assert "service can be stopped"
@@ -158,12 +154,7 @@ (define marionette
(use-modules (gnu services herd)
(rnrs base))
(setenv "PATH" "/run/current-system/profile/bin")
- (let ((pid (match (start-service 'jami)
- (#f #f)
- (('service response-parts ...)
- (match (assq-ref response-parts 'running)
- ((pid) pid))))))
-
+ (let ((pid (wait-for-service 'jami)))
(assert (number? pid))
(match (stop-service 'jami)
@@ -193,14 +184,10 @@ (define pid (match (start-service 'jami)
;; Restart the service.
(restart-service 'jami)
- (define new-pid (match (start-service 'jami)
- (#f #f)
- (('service response-parts ...)
- (match (assq-ref response-parts 'running)
- ((pid) pid)))))
+ (define new-pid (wait-for-service 'jami))
(assert (number? new-pid))
- (not (eq? pid new-pid)))
+ (not (= pid new-pid)))
marionette))
(unless #$provisioning? (test-skip 1))
Maxim Cournoyer wrote on 31 May 2022 18:44
[PATCH] services: jami: Modernize to adjust to Shepherd 0.9+ changes.
(address . 54786@debbugs.gnu.org)
20220531164407.13914-1-maxim.cournoyer@gmail.com
This partially fixes https://issues.guix.gnu.org/54786, allowing the 'jami'
and 'jami-provisioning' system tests to pass again.

In version 0.9.0, Shepherd constructors are now run concurrently, via
cooperative scheduling (Guile Fibers). The Jami service previously relied on
blocking sleeps while polling for D-Bus services to become ready after forking
a process; this wouldn't work anymore since, while blocking, the service process
wouldn't be given the chance to finish starting. The new reliance on Fibers
in Shepherd's fork+exec-command in the helper 'send-dbus' procedure also meant
that it wouldn't work outside of Shepherd anymore. Finally, the
'start-service' Shepherd procedure used in the test suite would cause the Jami
daemon to be spawned multiple times (a bug introduced in Shepherd 0.9.0).

To fix/simplify these problems, this change does the following:

1. Use the Guile AC/D-Bus library for D-Bus communication, which simplifies
things, such as avoiding the need to fork 'dbus-send' processes.

2. The non-blocking 'sleep' version of Fibers is used for the 'with-retries'
waiting syntax.

3. A 'dbus' package variant is used to adjust the session bus configuration,
tailoring it for the use case at hand.

4. Avoid start-service in the tests, preferring 'jami-service-available?' for
now.
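
To make item 2 more concrete, here is a minimal sketch in plain Guile of a retry loop whose pause procedure is passed in by the caller. It is not the patch's 'with-retries' macro: 'call-with-retries' and its '#:attempts', '#:seconds' and '#:pause' parameters are hypothetical names chosen for illustration. The default pause is Guile's blocking 'sleep'; under a cooperative scheduler one would presumably pass a non-blocking sleep instead, so that other tasks (such as the service being started) get a chance to run between attempts.

```scheme
;; Hypothetical procedural retry loop; all names here are illustrative.
(define* (call-with-retries thunk
                            #:key (attempts 3) (seconds 1) (pause sleep))
  "Call THUNK until it returns a true value, retrying at most ATTEMPTS
times after the first call and invoking (PAUSE SECONDS) before each retry.
Exceptions raised by THUNK count as failed attempts.  Return the true
value, or #f if every attempt failed."
  (let loop ((n 0))
    (or (catch #t thunk (lambda _ #f))   ;an exception counts as failure
        (and (< n attempts)
             (begin
               (pause seconds)           ;blocking or cooperative, caller's choice
               (loop (+ n 1)))))))

;; Example: a fake pause procedure stands in for a real sleep.
(define tries 0)
(call-with-retries
 (lambda ()
   (set! tries (+ tries 1))
   (and (>= tries 3) 'ready))            ;succeeds on the third call
 #:pause (lambda (seconds) #t))          ;no real waiting in this demo
```

Keeping the pause procedure a parameter means the retry logic itself does not need to know which scheduler, if any, it is running under.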

* gnu/build/jami-service.scm (parse-dbus-reply, strip-quotes)
(deserialize-item, serialize-boolean, dbus-dict->alist)
(dbus-array->list, parse-account-ids, parse-account-details)
(parse-contacts): Delete procedures.
(%send-dbus-binary, %send-dbus-bus, %send-dbus-user, %send-dbus-group)
(%send-dbus-debug): Delete parameters.
(jami-service-running?): New procedure.
(send-dbus/configuration-manager): Rename to...
(call-configuration-manager-method): ... this. Turn METHOD into a positional
argument. Turn ARGUMENTS into an optional argument. Invoke
`call-dbus-method' instead of `send-dbus', adjusting callers accordingly.
(get-account-ids, id->account-details, id->account-details)
(id->volatile-account-details, username->id, add-account remove-account)
(username->contacts, remove-contact, add-contact, set-account-details)
(set-all-moderators, username->all-moderators?, username->moderators)
(set-moderator): Adjust accordingly.
(with-retries, send-dbus, dbus-available-services)
(dbus-service-available?): Move to ...
* gnu/build/dbus-service.scm: ... this new module.
(send-dbus): Rewrite to use the Guile AC/D-Bus library.
(%dbus-query-timeout, sleep*): New variables.
(%current-dbus-connection): New parameter.
(initialize-dbus-connection!, argument->signature-type)
(call-dbus-method): New procedures.
(dbus-available-services): Adjust accordingly.
* gnu/local.mk (GNU_SYSTEM_MODULES): Register new module.
* gnu/packages/glib.scm (dbus-for-jami): New variable.
* gnu/services/telephony.scm: (jami-configuration)[dbus]: Default to
dbus-for-jami.
(jami-dbus-session-activation): Write a D-Bus daemon configuration file at
'/var/run/jami/session-local.conf'.
(jami-shepherd-services): Add the closure of guile-ac-d-bus and guile-fibers
as extensions. Adjust imported modules. Remove no longer used parameters.
<jami-dbus-session>: Use a PID file, avoiding the need for the manual
synchronization.
<jami>: Set DBUS_SESSION_BUS_ADDRESS environment variable. Poll using
'jami-service-available?' instead of 'dbus-service-available?'.
* gnu/tests/telephony.scm (run-jami-test): Add needed Guile extensions. Set
DBUS_SESSION_BUS_ADDRESS environment variable. Adjust all tests to use
'jami-service-available?' to determine if the service is started rather than
the now problematic Shepherd's 'start-service'.
---
gnu/build/dbus-service.scm | 212 ++++++++++++++++
gnu/build/jami-service.scm | 390 +++++------------------------
gnu/local.mk | 1 +
gnu/packages/glib.scm | 19 +-
gnu/services/telephony.scm | 500 +++++++++++++++++--------------------
gnu/tests/telephony.scm | 412 +++++++++++++++---------------
6 files changed, 726 insertions(+), 808 deletions(-)
create mode 100644 gnu/build/dbus-service.scm

diff --git a/gnu/build/dbus-service.scm b/gnu/build/dbus-service.scm
new file mode 100644
index 0000000000..d3d8c9f716
--- /dev/null
+++ b/gnu/build/dbus-service.scm
@@ -0,0 +1,212 @@
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;;
+;;; This module contains procedures to interact with D-Bus via the 'dbus-send'
+;;; command line utility. Before using any public procedure
+;;;
+;;; Code:
+
+(define-module (gnu build dbus-service)
+ #:use-module (ice-9 match)
+ #:use-module (srfi srfi-1)
+ #:use-module (srfi srfi-19)
+ #:use-module (srfi srfi-26)
+ #:autoload (d-bus protocol connections) (d-bus-conn?
+ d-bus-conn-flush
+ d-bus-connect
+ d-bus-disconnect
+ d-bus-session-bus-address
+ d-bus-system-bus-address)
+ #:autoload (d-bus protocol messages) (MESSAGE_TYPE_METHOD_CALL
+ d-bus-headers-ref
+ d-bus-message-body
+ d-bus-message-headers
+ d-bus-read-message
+ d-bus-write-message
+ header-PATH
+ header-DESTINATION
+ header-INTERFACE
+ header-MEMBER
+ header-SIGNATURE
+ make-d-bus-message)
+ #:export (%dbus-query-timeout
+
+ initialize-dbus-connection!
+ %current-dbus-connection
+ send-dbus
+ call-dbus-method
+
+ dbus-available-services
+ dbus-service-available?
+
+ sleep*
+ with-retries))
+
+(define %dbus-query-timeout 2) ;in seconds
+
+;;; Use Fibers' sleep to enable cooperative scheduling in Shepherd >= 0.9.0,
+;;; which is required at least for the Jami service.
+(define sleep* (if (resolve-module '(fibers) #f #:ensure #f)
+ (module-ref (resolve-interface '(fibers)) 'sleep)
+ (begin
+ (format (current-error-port) "fibers not available -- blocking 'sleep' in use~%")
+ sleep)))
+
+;;;
+;;; Utilities.
+;;;
+
+(define-syntax-rule (with-retries n delay body ...)
+ "Retry the code in BODY up to N times until it doesn't raise an exception nor
+return #f, else raise an error. A delay of DELAY seconds is inserted before
+each retry."
+ (let loop ((attempts 0))
+ (catch #t
+ (lambda ()
+ (let ((result (begin body ...)))
+ (if (not result)
+ (error "failed attempt" attempts)
+ result)))
+ (lambda args
+ (if (< attempts n)
+ (begin
+ (sleep* delay) ;else wait and retry
+ (loop (+ 1 attempts)))
+ (error "maximum number of retry attempts reached"
+ body ... args))))))
+
+
+;;;
+;;; Low level wrappers above AC/D-Bus.
+;;;
+
+;; The active D-Bus connection (a parameter) used by the other procedures.
+(define %current-dbus-connection (make-parameter #f))
+
+(define* (initialize-dbus-connection!
+ #:key (address (or (d-bus-session-bus-address)
+ (d-bus-system-bus-address))))
+ "Initialize the D-Bus connection. ADDRESS should be the address of the D-Bus
+session, e.g. \"unix:path=/var/run/dbus/system_bus_socket\", the default value
+if ADDRESS is not provided and DBUS_SESSION_BUS_ADDRESS is not set. Return
+the initialized D-Bus connection."
+ ;; Clear current connection if already active.
+ (when (d-bus-conn? (%current-dbus-connection))
+ (d-bus-disconnect (%current-dbus-connection)))
+
+ (let ((connection (d-bus-connect address)))
+ (%current-dbus-connection connection) ;update connection parameter
+ (call-dbus-method "Hello")) ;initial handshake
+
+ (%current-dbus-connection))
+
+(define* (send-dbus message #:key
+ (connection (%current-dbus-connection))
+ timeout)
+ "Send a D-Bus MESSAGE to CONNECTION and return the body of its reply. Up to
+READ-RETRIES replies are read until a matching reply is found, else an error
+is raised. MESSAGE is to be constructed with `make-d-bus-message'. When the
+body contains a single element, it is returned directly, else the body
+elements are returned as a list. TIMEOUT is a timeout value in seconds."
+ (let ((serial (d-bus-write-message connection message))
+ (start-time (current-time time-monotonic))
+ (timeout* (or timeout %dbus-query-timeout)))
+ (d-bus-conn-flush connection)
+ (let retry ()
+ (when (> (time-second (time-difference (current-time time-monotonic)
+ start-time))
+ timeout*)
+ (error 'dbus "fail to get reply in timeout" timeout*))
+ (let* ((reply (d-bus-read-message connection))
+ (reply-headers (d-bus-message-headers reply))
+ (reply-serial (d-bus-headers-ref reply-headers 'REPLY_SERIAL))
+ (error-name (d-bus-headers-ref reply-headers 'ERROR_NAME))
+ (body (d-bus-message-body reply)))
+ ;; Validate the reply matches the message.
+ (when error-name
+ (error 'dbus "method failed with error" error-name body))
+ ;; Some replies do not include a serial header, such as the
+ ;; org.freedesktop.DBus NameAcquired one.
+ (if (and reply-serial (= serial reply-serial))
+ (match body
+ ((x x* ..1) ;contains 2 or more elements
+ body)
+ ((x)
+ x) ;single element; return it directly
+ (#f #f))
+ (retry))))))
+
+(define (argument->signature-type argument)
+ "Infer the D-Bus signature type from ARGUMENT."
+ ;; XXX: avoid ..1 when using vectors due to a bug (?) in (ice-9 match).
+ (match argument
+ ((? boolean?) "b")
+ ((? string?) "s")
+ (#((? string?) (? string?) ...) "as")
+ (#(((? string?) . (? string?))
+ ((? string?) . (? string?)) ...) "a{ss}")
+ (_ (error 'dbus "no rule to infer type from argument" argument))))
+
+(define* (call-dbus-method method
+ #:key
+ (path "/org/freedesktop/DBus")
+ (destination "org.freedesktop.DBus")
+ (interface "org.freedesktop.DBus")
+ (connection (%current-dbus-connection))
+ arguments
+ timeout)
+ "Call the D-Bus method specified by METHOD, PATH, DESTINATION and INTERFACE.
+The currently active D-Bus CONNECTION is used unless explicitly provided.
+Method arguments may be provided via ARGUMENTS sent as the message body.
+TIMEOUT limits the maximum time to allow for the reply. Return the body of the
+reply."
+ (let ((message (make-d-bus-message
+ MESSAGE_TYPE_METHOD_CALL 0 #f '()
+ `#(,(header-PATH path)
+ ,(header-DESTINATION destination)
+ ,(header-INTERFACE interface)
+ ,(header-MEMBER method)
+ ,@(if arguments
+ (list (header-SIGNATURE
+ (string-join
+ (map argument->signature-type arguments)
+ "")))
+ '()))
+ arguments)))
+ (send-dbus message #:connection connection #:timeout timeout)))
+
+
+;;;
+;;; Higher-level, D-Bus procedures.
+;;;
+
+(define (dbus-available-services)
+ "Return the list of available (acquired) D-Bus services."
+ (let ((names (vector->list (call-dbus-method "ListNames"))))
+ ;; Remove entries such as ":1.7".
+ (remove (cut string-prefix? ":" <>) names)))
+
+(define (dbus-service-available? service)
+ "Predicate to check for the D-Bus SERVICE availability."
+ (member service (dbus-available-services)))
+
+;; Local Variables:
+;; eval: (put 'with-retries 'scheme-indent-function 2)
+;; End:
diff --git a/gnu/build/jami-service.scm b/gnu/build/jami-service.scm
index ddfc8cf937..0ceb03eb02 100644
--- a/gnu/build/jami-service.scm
+++ b/gnu/build/jami-service.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -24,16 +24,16 @@
;;; Code:
(define-module (gnu build jami-service)
+ #:use-module (gnu build dbus-service)
#:use-module (ice-9 format)
#:use-module (ice-9 match)
- #:use-module (ice-9 peg)
#:use-module (ice-9 rdelim)
#:use-module (ice-9 regex)
- #:use-module (rnrs io ports)
- #:autoload (shepherd service) (fork+exec-command)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-26)
- #:export (account-fingerprint?
+ #:export (jami-service-available?
+
+ account-fingerprint?
account-details->recutil
get-accounts
get-usernames
@@ -51,43 +51,12 @@ (define-module (gnu build jami-service)
set-all-moderators
set-moderator
username->all-moderators?
- username->moderators
-
- dbus-available-services
- dbus-service-available?
-
- %send-dbus-binary
- %send-dbus-bus
- %send-dbus-user
- %send-dbus-group
- %send-dbus-debug
- send-dbus
-
- with-retries))
+ username->moderators))
;;;
;;; Utilities.
;;;
-(define-syntax-rule (with-retries n delay body ...)
- "Retry the code in BODY up to N times until it doesn't raise an exception
-nor return #f, else raise an error. A delay of DELAY seconds is inserted
-before each retry."
- (let loop ((attempts 0))
- (catch #t
- (lambda ()
- (let ((result (begin body ...)))
- (if (not result)
- (error "failed attempt" attempts)
- result)))
- (lambda args
- (if (< attempts n)
- (begin
- (sleep delay) ;else wait and retry
- (loop (+ 1 attempts)))
- (error "maximum number of retry attempts reached"
- body ... args))))))
-
(define (alist->list alist)
"Flatten ALIST into a list."
(append-map (match-lambda
@@ -104,212 +73,34 @@ (define (account-fingerprint? val)
(and (string? val)
(regexp-exec account-fingerprint-rx val)))
-
-;;;
-;;; D-Bus reply parser.
-;;;
-
-(define (parse-dbus-reply reply)
- "Return the parse tree of REPLY, a string returned by the 'dbus-send'
-command."
- ;; Refer to 'man 1 dbus-send' for the grammar reference. Note that the
- ;; format of the replies doesn't match the format of the input, which is the
- ;; one documented, but it gives an idea. For an even better reference, see
- ;; the `print_iter' procedure of the 'dbus-print-message.c' file from the
- ;; 'dbus' package sources.
- (define-peg-string-patterns
- "contents <- header (item / container (item / container*)?)
- item <-- WS type WS value NL
- container <- array / dict / variant
- array <-- array-start (item / container)* array-end
- dict <-- array-start dict-entry* array-end
- dict-entry <-- dict-entry-start item item dict-entry-end
- variant <-- variant-start item
- type <-- 'string' / 'int16' / 'uint16' / 'int32' / 'uint32' / 'int64' /
- 'uint64' / 'double' / 'byte' / 'boolean' / 'objpath'
- value <-- (!NL .)* NL
- header < (!NL .)* NL
- variant-start < WS 'variant'
- array-start < WS 'array [' NL
- array-end < WS ']' NL
- dict-entry-start < WS 'dict entry(' NL
- dict-entry-end < WS ')' NL
- DQ < '\"'
- WS < ' '*
- NL < '\n'*")
-
- (peg:tree (match-pattern contents reply)))
-
-(define (strip-quotes text)
- "Strip the leading and trailing double quotes (\") characters from TEXT."
- (let* ((text* (if (string-prefix? "\"" text)
- (string-drop text 1)
- text))
- (text** (if (string-suffix? "\"" text*)
- (string-drop-right text* 1)
- text*)))
- text**))
-
-(define (deserialize-item item)
- "Return the value described by the ITEM parse tree as a Guile object."
- ;; Strings are printed wrapped in double quotes (see the print_iter
- ;; procedure in dbus-print-message.c).
- (match item
- (('item ('type "string") ('value value))
- (strip-quotes value))
- (('item ('type "boolean") ('value value))
- (if (string=? "true" value)
- #t
- #f))
- (('item _ ('value value))
- value)))
-
-(define (serialize-boolean bool)
- "Return the serialized format expected by dbus-send for BOOL."
- (format #f "boolean:~:[false~;true~]" bool))
-
-(define (dict->alist dict-parse-tree)
- "Translate a dict parse tree to an alist."
- (define (tuples->alist tuples)
- (map (lambda (x) (apply cons x)) tuples))
-
- (match dict-parse-tree
- ('dict
- '())
- (('dict ('dict-entry keys values) ...)
- (let ((keys* (map deserialize-item keys))
- (values* (map deserialize-item values)))
- (tuples->alist (zip keys* values*))))))
-
-(define (array->list array-parse-tree)
- "Translate an array parse tree to a list."
- (match array-parse-tree
- ('array
- '())
- (('array items ...)
- (map deserialize-item items))))
-
-
-;;;
-;;; Low-level, D-Bus-related procedures.
-;;;
+(define (validate-fingerprint fingerprint)
+ "Validate that fingerprint is 40 characters long."
+ (unless (account-fingerprint? fingerprint)
+ (error "Account fingerprint is not valid:" fingerprint)))
-;;; The following parameters are used in the jami-service-type service
-;;; definition to conveniently customize the behavior of the send-dbus helper,
-;;; even when called indirectly.
-(define %send-dbus-binary (make-parameter "dbus-send"))
-(define %send-dbus-bus (make-parameter #f))
-(define %send-dbus-user (make-parameter #f))
-(define %send-dbus-group (make-parameter #f))
-(define %send-dbus-debug (make-parameter #f))
-
-(define* (send-dbus #:key service path interface method
- bus
- dbus-send
- user group
- timeout
- arguments)
- "Return the response of DBUS-SEND, else raise an error. Unless explicitly
-provided, DBUS-SEND takes the value of the %SEND-DBUS-BINARY parameter. BUS
-can be used to specify the bus address, such as 'unix:path=/var/run/jami/bus'.
-Alternatively, the %SEND-DBUS-BUS parameter can be used. ARGUMENTS can be
-used to pass input values to a D-Bus method call. TIMEOUT is the amount of
-time to
This message was truncated. Download the full message here.
Ludovic Courtès wrote on 1 Jun 2022 11:54
Re: bug#54786: Installation tests are failing
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87o7zcwvy6.fsf_-_@gnu.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> gnu/build/dbus-service.scm | 212 ++++++++++++++++
> gnu/build/jami-service.scm | 390 +++++------------------------
> gnu/local.mk | 1 +
> gnu/packages/glib.scm | 19 +-
> gnu/services/telephony.scm | 500 +++++++++++++++++--------------------
> gnu/tests/telephony.scm | 412 +++++++++++++++---------------
> 6 files changed, 726 insertions(+), 808 deletions(-)
> create mode 100644 gnu/build/dbus-service.scm

Before going further, I’d like to understand: this does more than just
fix the Jami system tests, right?

It would have been nice to have surgical changes to “just” fix the
tests, along the lines of https://issues.guix.gnu.org/54786#9,
possibly followed by a rework of the whole machinery, if that’s
possible.

Besides, I think we should talk to Jami upstream (which shouldn’t be too
hard :-)). It doesn’t seem reasonable to me to have 800+ lines of code
in the distro to start one service. Usually the ‘start’ and ‘stop’
methods are between 2 and 10 lines of code.

What do you think is missing upstream so that starting Jami is simpler?

Thanks,
Ludo’.
Maxim Cournoyer wrote on 1 Jun 2022 15:10
(name . Ludovic Courtès)(address . ludo@gnu.org)
878rqgr0l4.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> gnu/build/dbus-service.scm | 212 ++++++++++++++++
>> gnu/build/jami-service.scm | 390 +++++------------------------
>> gnu/local.mk | 1 +
>> gnu/packages/glib.scm | 19 +-
>> gnu/services/telephony.scm | 500 +++++++++++++++++--------------------
>> gnu/tests/telephony.scm | 412 +++++++++++++++---------------
>> 6 files changed, 726 insertions(+), 808 deletions(-)
>> create mode 100644 gnu/build/dbus-service.scm
>
> Before going further, I’d like to understand: this does more than just
> fix the Jami system tests, right?
>
> It would have been nice to have surgical changes to “just” fix the
> tests, along the lines of <https://issues.guix.gnu.org/54786#9>,
> possibly followed by a rework of the whole machinery, if that’s
> possible.

It's not really possible, unfortunately, because the rework from talking
to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-Bus was
needed to fix the issues, or at least made fixing them simpler. Going back
and trying to make it work the way it was would be new work that would end
up being scrapped anyway by a subsequent commit making use of the Guile
D-Bus library, so I'm not interested in pursuing it.

> Besides, I think we should talk to Jami upstream (which shouldn’t be too
> hard :-)). It doesn’t seem reasonable to me to have 800+ lines of code
> in the distro to start one service. Usually the ‘start’ and ‘stop’
> methods are between 2 and 10 lines of code.
>
> What do you think is missing upstream so that starting Jami is
> simpler?

1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
The upstream systemd service definition for the Jami daemon (jamid) is
this:

# net.jami.daemon.service
[D-BUS Service]
Name=cx.ring.Ring
Exec=@LIBDIR@/jamid

But that's not nearly where the complexity of our jami-service-type
lies. Rather, it's in the following:

2) The lack of a way to declaratively configure Jami and the need to use
D-Bus API to issue commands to Jami non-interactively. For example, to
have Jami import an account it's necessary to go via either

a) the GUI or
b) the D-Bus API

The Jami service in Guix makes use of b), which introduces the need for
some Scheme bindings wrapping the low-level D-Bus interface. Perhaps
such bindings could live in Jami itself.

The second point (2) could be addressed upstream, but since it's a
rather niche use case (the common use case is simply running the client
GUI), is already achievable via D-Bus, and would probably require a
considerable amount of work in Jami itself, I think we can keep it as is
for now, as a Guix System exclusive feature ;-). Note that even if Jami
could be configured via configuration files, we'd still want to be able
to communicate with it via D-Bus to maintain the possible actions
currently available in our Shepherd service (listing/enabling/disabling
accounts, etc.), so it'd only really help to reduce the start slot, and
that's it. We'd still need most of the D-Bus bindings, so it wouldn't
help that much anyway.

I hope that clarifies how our jami-service-type is both complex and
unique.

Happy video-conferencing!

Maxim
Ludovic Courtès wrote on 2 Jun 2022 15:13
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
8735gnqkcp.fsf@gnu.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> Before going further, I’d like to understand: this does more than just
>> fix the Jami system tests, right?
>>
>> It would have been nice to have surgical changes to “just” fix the
>> tests, along the lines of <https://issues.guix.gnu.org/54786#9>,
>> possibly followed by a rework of the whole machinery, if that’s
>> possible.
>
> It's not really possible, unfortunately, because the rework from talking
> to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-Bus was
> needed to fix the issues, or at least made fixing them simpler.

So am I right that the “issues” were not specifically related to the
Shepherd 0.9.0 switch, or at least not just to that? (Just to make sure
I understand the context.)

>> Besides, I think we should talk to Jami upstream (which shouldn’t be too
>> hard :-)). It doesn’t seem reasonable to me to have 800+ lines of code
>> in the distro to start one service. Usually the ‘start’ and ‘stop’
>> methods are between 2 and 10 lines of code.
>>
>> What do you think is missing upstream so that starting Jami is
>> simpler?
>
> 1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
> The upstream systemd service definition for the Jami daemon (jamid) is
> this:
>
> # net.jami.daemon.service
> [D-BUS Service]
> Name=cx.ring.Ring
> Exec=@LIBDIR@/jamid
>
> But that's not nearly where the complexity of our jami-service-type
> lies.

But that should be fine: we have dozens of D-Bus services that happily
get started by dbus-daemon.

> Rather, it's in the following:
>
> 2) The lack of a way to declaratively configure Jami and the need to use
> D-Bus API to issue commands to Jami non-interactively. For example, to
> have Jami import an account it's necessary to go via either
>
> a) the GUI or
> b) the D-Bus API
>
> The Jami service in Guix makes use of b), which introduces the need for
> some Scheme bindings wrapping the low-level D-Bus interface. Perhaps
> such bindings could live in Jami itself.
>
> The second point (2) could be addressed upstream, but since it's a
> rather niche use case (the common use case is simply running the client
> GUI), is already achievable via D-Bus, and would probably require a
> considerable amount of work in Jami itself, I think we can keep it as is
> for now, as a Guix System exclusive feature ;-). Note that even if Jami
> could be configured via configuration files, we'd still want to be able
> to communicate with it via D-Bus to maintain the possible actions
> currently available in our Shepherd service (listing/enabling/disabling
> accounts, etc.), so it'd only really help to reduce the start slot, and
> that's it. We'd still need most of the D-Bus bindings, so it wouldn't
> help that much anyway.

Ah I see.

> I hope that clarifies how our jami-service-type is both complex and
> unique.

Sure, the ability to configure Jami in a declarative and stateless
fashion is a plus, that’s really cool.

Longer-term I think this should go in Jami proper though. It’s great
that Guix has an edge over the competition :-), but having to maintain
it is less nice.

Also, in more concrete terms: one goal of the least-authority work at
<https://issues.guix.gnu.org/54997> is to remove
‘make-forkexec-constructor/container’ and the whole (gnu build shepherd)
module. Jami is one of its last remaining users (adjusting it felt like
beyond my abilities, precisely because it’s much more complex than the
other services I adjusted).

Could you take a look at that eventually, once this patch has been
reviewed?

Thanks,
Ludo’.
Maxim Cournoyer wrote on 2 Jun 2022 19:24
(name . Ludovic Courtès)(address . ludo@gnu.org)
877d5zx9jt.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> Before going further, I’d like to understand: this does more than just
>>> fix the Jami system tests, right?
>>>
>>> It would have been nice to have surgical changes to “just” fix the
>>> tests, along the lines of <https://issues.guix.gnu.org/54786#9>,
>>> possibly followed by a rework of the whole machinery, if that’s
>>> possible.
>>
>> It's not really possible, unfortunately, because the rework from talking
>> to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-Bus was
>> needed to fix the issues, or at least made fixing them simpler.
>
> So am I right that the “issues” were not specifically related to the
> Shepherd 0.9.0 switch, or at least not just to that? (Just to make sure
> I understand the context.)

I tried capturing the issue in the commit message, but I'll provide
another, more hands-on view: the Jami service was broken by changes in
Shepherd 0.9.0 that caused the combination it used (blocking sleeps plus
concurrent make-forkexec-constructor/container and fork+exec-command
calls) to stop working.

This problem can be manually observed by spawning a VM with the Jami
service:

$(guix system vm --no-graphic -e '(@@ (gnu tests telephony) %jami-os)') -m 512

Then you'll see the service doesn't even start:

root@jami ~# herd status
[...]
Stopped:
- jami
[...]

root@jami ~# pgrep jamid
192

root@jami ~# killall jamid

root@jami ~# herd start jami
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

12:53:47.144 os_core_unix.c !pjlib 2.11 for POSIX initialized

herd: exception caught while executing 'start' on service 'jami':
Throw to key `match-error' with args `("match" "no matching pattern" #f)'.

I ran this: herd start jami& strace -p1 -f -s800 -o strace.out

Attached is the last 10% of the gzip'd file. I couldn't explain this
failure very clearly, but when I tried investigating it was failing on
the '(dbus-service-available? "cx.ring.Ring")' call, if I recall
correctly.
Attachment: strace.out.gz
[...]

>>> What do you think is missing upstream so that starting Jami is
>>> simpler?
>>
>> 1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
>> The upstream systemd service definition for the Jami daemon (jamid) is
>> this:
>>
>> # net.jami.daemon.service
>> [D-BUS Service]
>> Name=cx.ring.Ring
>> Exec=@LIBDIR@/jamid
>>
>> But that's not nearly where the complexity of our jami-service-type
>> lies.
>
> But that should be fine: we have dozens of D-Bus services that happily
> get started by dbus-daemon.

I guess that works (minus races like we've had with elogind) if the
other services are also D-Bus services sharing the same bus. But here
the code talking with Jami is in the Shepherd service actions and, more
critically, in the start slot itself, so it's important that the D-Bus
service has been acquired and is ready to serve D-Bus calls; otherwise
they'd fail (that's what the loop polling for (jami-service-available?)
ensures).
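
The polling idea can be sketched in plain Guile, outside of Shepherd. This is only an illustration, not the code used by jami-service-type: 'wait-until', its #:attempts and #:interval keywords, and the fake availability check are all made-up names, and the real check would be a D-Bus query. Note also that the blocking 'sleep' used here is the kind of call this thread reports as problematic inside Shepherd 0.9.0.

```scheme
;; Hypothetical helper, not part of Guix or Shepherd: poll READY? until it
;; returns a true value or ATTEMPTS tries have been made.
(define* (wait-until ready? #:key (attempts 20) (interval 1))
  "Call READY? up to ATTEMPTS times, sleeping INTERVAL seconds between tries.
Return the first true value returned by READY?, or #f if every try failed."
  (let loop ((n 1))
    (or (ready?)
        (and (< n attempts)
             (begin
               (sleep interval)         ;blocking wait before the next probe
               (loop (+ n 1)))))))

;; Stand-in for a real availability check such as a D-Bus name lookup:
;; this fake service only becomes "available" on the third probe.
(define probes 0)

(define (fake-service-available?)
  (set! probes (+ probes 1))
  (>= probes 3))

;; Only proceed with further calls once the service answers.
(define service-ready?
  (wait-until fake-service-available? #:attempts 5 #:interval 0))
```

A start procedure following this pattern would issue its D-Bus calls only when the result is true, and report a failure otherwise.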

>> Rather, it's in the following:
>>
>> 2) The lack of a way to declaratively configure Jami and the need to use
>> D-Bus API to issue commands to Jami non-interactively. For example, to
>> have Jami import an account it's necessary to go via either
>>
>> a) the GUI or
>> b) the D-Bus API
>>
>> The Jami service in Guix makes use of b), which introduces the need for
>> some Scheme bindings wrapping the low-level D-Bus interface. Perhaps
>> such bindings could live in Jami itself.
>>
>> The second point (2) could be addressed upstream, but since it's a
>> rather niche use case (the common use case is simply running the client
>> GUI), is already achievable via D-Bus, and would probably require a
>> considerable amount of work in Jami itself, I think we can keep it as is
>> for now, as a Guix System exclusive feature ;-). Note that even if Jami
>> could be configured via configuration files, we'd still want to be able
>> to communicate with it via D-Bus to maintain the possible actions
>> currently available in our Shepherd service (listing/enabling/disabling
>> accounts, etc.), so it'd only really help to reduce the start slot, and
>> that's it. We'd still need most of the D-Bus bindings, so it wouldn't
>> help that much anyway.
>
> Ah I see.
>
>> I hope that clarifies how our jami-service-type is both complex and
>> unique.
>
> Sure, the ability to configure Jami in a declarative and stateless
> fashion is a plus, that’s really cool.
>
> Longer-term I think this should go in Jami proper though. It’s great
> that Guix has an edge over the competition :-), but having to maintain
> it is less nice.

Perhaps with the Scheme bindings introduced by Olivier for the Jami
tests (that work via an embedded libguile), it could be possible to add
the ability to pass an init script to 'jamid' at launch time, which
would automate importing the account. Proper Scheme bindings would be
nice though, and I'd like to look into the feasibility of adding these via
SWIG. Food for thought.

> Also, in more concrete terms: one goal of the least-authority work at
> <https://issues.guix.gnu.org/54997> is to remove
> ‘make-forkexec-constructor/container’ and the whole (gnu build shepherd)
> module. Jami is one of its last remaining users (adjusting it felt like
> beyond my abilities, precisely because it’s much more complex than the
> other services I adjusted).
>
> Could you take a look at that eventually, once this patch has been
> reviewed?

I reviewed how that works, and it'd be easy; I just didn't see the
incentive yet (there's no composition needed for the service, and it'd
make the definition slightly less readable). If you tell me
make-forkexec-constructor/container is going the way of the Dodo though,
that's a good enough incentive :-).

Thanks for having a look!

Maxim
Ludovic Courtès wrote on 2 Jun 2022 22:43
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87v8tilrsh.fsf@gnu.org
Howdy!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I tried capturing the issue in the commit message, but I'll provide
> another, more hands-on view: the Jami service was broken by changes in
> Shepherd 0.9.0 that caused the combination it used (blocking sleeps plus
> concurrent make-forkexec-constructor/container and fork+exec-command
> calls) to stop working.

OK. Thanks for sharing the strace log; at first sight I don’t see any
clear clue, but hey, maybe it’s fine to leave that as a mystery since
there’s another solution.

[...]

>> Longer-term I think this should go in Jami proper though. It’s great
>> that Guix has an edge over the competition :-), but having to maintain
>> it is less nice.
>
> Perhaps with the Scheme bindings introduced by Olivier for the Jami
> tests (that work via an embedded libguile), it could be possible to add
> the ability to pass an init script to 'jamid' at launch time, which
> would automate importing the account. Proper Scheme bindings would be
> nice though, and I'd like to look into the feasibility of adding these via
> SWIG. Food for thought.

Sounds fun. (BTW, I’d recommend against SWIG: it’s not “pretty”, leaves
a lot of work to do, including wrapping the generated wrappers and
debugging memory management issues. Using the FFI provides more
flexibility and is much more fun IMO.)

>> Also, in more concrete terms: one goal of the least-authority work at
>> <https://issues.guix.gnu.org/54997> is to remove
>> ‘make-forkexec-constructor/container’ and the whole (gnu build shepherd)
>> module. Jami is one of its last remaining users (adjusting it felt like
>> beyond my abilities, precisely because it’s much more complex than the
>> other services I adjusted).
>>
>> Could you take a look at that eventually, once this patch has been
>> reviewed?
>
> I reviewed how that works, and it'd be easy; I just didn't see the
> incentive yet (there's no composition needed for the service, and it'd
> make the definition slightly less readable). If you tell me
> make-forkexec-constructor/container is going the way of the Dodo though,
> that's a good enough incentive :-).

Awesome. :-)

Thanks for explaining!

Ludo’.
Maxim Cournoyer wrote on 4 Jun 2022 06:37
(name . Ludovic Courtès)(address . ludo@gnu.org)
874k11ujqq.fsf@gmail.com
Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

> Howdy!
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

[...]

>> I reviewed how that works, and it'd be easy; I just didn't see the
>> incentive yet (there's no composition needed for the service, and it'd
>> make the definition slightly less readable). If you tell me
>> make-forkexec-constructor/container is going the way of the Dodo though,
>> that's a good enough incentive :-).

That turns out to be a bit problematic; dbus-daemon must not run in its
own user namespace (CLONE_NEWUSER) as it wants to validate user/group
IDs. That's probably the reason it was working with
'make-forkexec-constructor/container', as this was dropping the user and
net namespaces, contrary to least-authority, which uses them all.

The problem then seems to be that we need CAP_SYS_ADMIN when
dropping the user namespace, since CLONE_NEWUSER is what gives us
superpowers. Per 'man user_namespaces':

The child process created by clone(2) with the CLONE_NEWUSER flag starts
out with a complete set of capabilities in the new user namespace.

Which means that if we combine something like (untested):

(make-forkexec-constructor
(least-authority
(list (file-append coreutils "/bin/true"))
(mappings (delq 'user %namespaces))
#:user "nobody"
#:group "nobody"))

the make-forkexec-constructor will switch to the non-privileged user
before the clone call is made, and it will fail with EPERM.

When using 'make-forkexec-constructor/container', the clone(2) call
happens before switching user, thus as 'root' in Shepherd, which
explains why it works.

I'm not sure how it could be fixed; it seems the user changing business
would need to be handled by the least-authority-wrapper code? And the
make-forkexec-constructor would probably need to detect that command is
a pola wrapper and then avoid changing the user/group itself to not
interfere.

To be continued!

Maxim
Ludovic Courtès wrote on 7 Jun 2022 16:00
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
877d5sbmjt.fsf@gnu.org
Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>>> I reviewed how that works, and it'd be easy; I just didn't see the
>>> incentive yet (there's no composition needed for the service, and it'd
>>> make the definition slightly less readable). If you tell me
>>> make-forkexec-constructor/container is going the way of the Dodo though,
>>> that's a good enough incentive :-).
>
> That turns out to be a bit problematic; dbus-daemon must not run in its
> own user namespace (CLONE_NEWUSER) as it wants to validate user/group
> IDs. That's probably the reason it was working with
> 'make-forkexec-constructor/container', as this was dropping the user and
> net namespaces, contrary to least-authority, which uses them all.
>
> The problem then seems to be that we need CAP_SYS_ADMIN when
> dropping the user namespace, since CLONE_NEWUSER is what gives us
> superpowers. Per 'man user_namespaces':
>
> The child process created by clone(2) with the CLONE_NEWUSER flag starts
> out with a complete set of capabilities in the new user namespace.
>
> Which means that if we combine something like (untested):
>
> (make-forkexec-constructor
> (least-authority
> (list (file-append coreutils "/bin/true"))
> (mappings (delq 'user %namespaces))
> #:user "nobody"
> #:group "nobody"))
>
> the make-forkexec-constructor will switch to the non-privileged user
> before the clone call is made, and it will fail with EPERM.
>
> When using 'make-forkexec-constructor/container', the clone(2) call
> happens before switching user, thus as 'root' in Shepherd, which
> explains why it works.

Damnit, that’s right. For example the result of:

(lower-object (least-authority-wrapper (file-append coreutils "/bin/uname")
#:namespaces (delq 'user %namespaces)))

won’t run as an unprivileged user:

$ $(guix build /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv)
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://guix.bordeaux.inria.fr'... 100.0%
The following derivations will be built:
/gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
/gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
building /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv...
successfully built /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
building /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv...
successfully built /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
Backtrace:
5 (primitive-load "/gnu/store/ifsh87aifh2k8pqzhkjxncq3vskpwx3l-pola-wrapper")
In ice-9/eval.scm:
191:35 4 (_ #f)
In gnu/build/linux-container.scm:
300:8 3 (call-with-temporary-directory #<procedure 7f9aa3a674b0 at gnu/build/linux-container.scm:396:3 (root)>)
397:16 2 (_ "/tmp/guix-directory.K9gBNH")
239:7 1 (run-container "/tmp/guix-directory.K9gBNH" (#<<file-system> device: "/gnu/store/jkjs0inmzhj4vsvclbf08nmh0shm7lrf-attr-2.5…> …) …)
In guix/build/syscalls.scm:
1099:12 0 (_ 1845624849)

guix/build/syscalls.scm:1099:12: In procedure clone: 1845624849: Operation not permitted

> I'm not sure how it could be fixed; it seems the user changing business
> would need to be handled by the least-authority-wrapper code? And the
> make-forkexec-constructor would probably need to detect that command is
> a pola wrapper and then avoid changing the user/group itself to not
> interfere.

I think we would add #:user and #:group to ‘least-authority-wrapper’ and
have it call setuid/setgid. ‘make-forkexec-constructor’ doesn’t need to
be modified, but the user simply won’t pass #:user and #:group to it.
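
The ordering argument can be illustrated with a toy model in plain Guile. This is a sketch under loud assumptions and not Guix code: a process is reduced to its UID, nothing calls into the kernel, CAP_SYS_ADMIN is equated with UID 0, and 'model-clone', 'model-setuid', 'run-steps' and the UID 65534 for "nobody" are names and values made up for the illustration.

```scheme
(use-modules (srfi srfi-1))               ;for 'fold'

;; Toy model, not Guix code: a process is represented only by its UID, and
;; nothing here calls into the kernel.

(define (model-clone uid)
  "Model clone(2) without CLONE_NEWUSER: creating the other namespaces needs
CAP_SYS_ADMIN, which this model equates with UID 0."
  (if (zero? uid)
      uid
      (throw 'model-eperm "clone: Operation not permitted")))

(define (model-setuid uid target)
  "Model setuid(2): only UID 0 may switch to a different user."
  (if (or (zero? uid) (= uid target))
      target
      (throw 'model-eperm "setuid: Operation not permitted")))

(define (run-steps steps)
  "Apply STEPS, a list of UID -> UID procedures, starting as root (UID 0).
Return the final UID, or the symbol 'eperm if a step is refused."
  (catch 'model-eperm
    (lambda ()
      (fold (lambda (step uid) (step uid)) 0 steps))
    (lambda (key . args) 'eperm)))

(define nobody 65534)                     ;assumed UID for "nobody"

;; Order discussed above as failing: switch user first, then clone.
(define setuid-then-clone
  (run-steps (list (lambda (uid) (model-setuid uid nobody))
                   model-clone)))

;; Order proposed for the wrapper: clone as root, then switch user.
(define clone-then-setuid
  (run-steps (list model-clone
                   (lambda (uid) (model-setuid uid nobody)))))
```

In this model the first ordering is refused at the clone step, while the second ends up running as the unprivileged user inside the new namespaces, which is the behaviour the proposal aims for.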

Thanks,
Ludo’.
B
(name . Ludovic Courtès)(address . ludo@gnu.org)
20220608005809.GA2794@LionPure
Attachment: file
Maxim Cournoyer wrote on 11 Jun 2022 06:18
(name . Ludovic Courtès)(address . ludo@gnu.org)
87h74rg7ef.fsf@gmail.com
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> When using 'make-forkexec-constructor/container', the clone(2) call
>> happens before switching user, thus as 'root' in Shepherd, which
>> explains why it works.
>
> Damnit, that’s right. For example the result of:
>
> (lower-object (least-authority-wrapper (file-append coreutils "/bin/uname")
> #:namespaces (delq 'user %namespaces)))
>
> won’t run as an unprivileged user:

[...]

> I think we would add #:user and #:group to ‘least-authority-wrapper’ and
> have it call setuid/setgid. ‘make-forkexec-constructor’ doesn’t need to
> be modified, but the user simply won’t pass #:user and #:group to it.

OK! I'll adjust the jami-service-type when we get around to implementing
the above; for now I've pushed my proposed fix, which still uses
'make-forkexec-constructor/container', as commit
85b4dabd94d53f8179f31a42046cd83fc3a352fc.

Thanks,

Maxim
Mathieu Othacehe wrote on 9 Aug 2022 16:20
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
875yj15wib.fsf@gnu.org
Closing as all the installation tests are now fixed.

Thanks to everyone involved :)

Mathieu
Closed