[PATCH 0/7] Add 'generic-html' updater

  • Done
  • quality assurance status badge
Details
2 participants
  • Léo Le Bouter
  • Ludovic Courtès
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:43
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214326.28052-1-ludo@gnu.org
Hi!

These patches allow ‘guix refresh’ coverage to go from 78% to 88%
as reported by ‘guix refresh --list-updaters’ (both are probably
slightly overestimated) by adding a new ‘generic-html’ updater.

The updater crawls the web page where the package’s source tarball
is stored, using Guile-Lib’s (htmlprag), which we depend on since
commit 02e2e093e858e8a0ca7bd66c1f1f6fd0a1705edb. Among other things,
it handles freedesktop.org packages.

Feedback welcome!

Thanks,
Ludo’.

Ludovic Courtès (7):
gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
gnu-maintenance: 'latest-html-release' considers non-relative URLs.
gnu-maintenance: 'release-file?' rejects checksum files.
gnu-maintenance: 'latest-html-release' can determine signature file
name.
gnu-maintenance: 'latest-html-release' better computes version number.
gnu-maintenance: Add 'generic-html' updater.
gnu: hwloc: Add 'release-monitoring-url' property.

doc/guix.texi | 6 +-
gnu/packages/mpi.scm | 6 ++
guix/gnu-maintenance.scm | 136 ++++++++++++++++++++++++++++-----------
3 files changed, 108 insertions(+), 40 deletions(-)

--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 1/7] gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-1-ludo@gnu.org
* guix/gnu-maintenance.scm (html->sxml): Remove. Autoload (htmlprag)
instead.
* doc/guix.texi (Requirements): Mention 'guix refresh' for the Guile-Lib
dependency.
---
doc/guix.texi | 3 ++-
guix/gnu-maintenance.scm | 13 +------------
2 files changed, 3 insertions(+), 13 deletions(-)

Toggle diff (47 lines)
diff --git a/doc/guix.texi b/doc/guix.texi
index 4cf241c56a..97094a7d0a 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -865,7 +865,8 @@ the @code{crate} importer (@pxref{Invoking guix import}).
@item
@uref{https://www.nongnu.org/guile-lib/doc/ref/htmlprag/, Guile-Lib} for
-the @code{go} importer (@pxref{Invoking guix import}).
+the @code{go} importer (@pxref{Invoking guix import}) and for some of
+the ``updaters'' (@pxref{Invoking guix refresh}).
@item
When @url{http://www.bzip.org, libbz2} is available,
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 9e393d18cd..febed57c3a 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -38,6 +38,7 @@
#:use-module (guix upstream)
#:use-module (guix packages)
#:autoload (zlib) (call-with-gzip-input-port)
+ #:autoload (htmlprag) (html->sxml) ;from Guile-Lib
#:export (gnu-package-name
gnu-package-mundane-name
gnu-package-copyright-holder
@@ -447,18 +448,6 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
;;; Latest HTTP release.
;;;
-(define (html->sxml port)
- "Read HTML from PORT and return the corresponding SXML tree."
- (let ((str (get-string-all port)))
- (catch #t
- (lambda ()
- ;; XXX: This is the poor developer's HTML-to-XML converter. It's good
- ;; enough for directory listings at <https://kernel.org/pub> but if
- ;; needed we could resort to (htmlprag) from Guile-Lib.
- (call-with-input-string (string-replace-substring str "<hr>" "<hr />")
- xml->sxml))
- (const '(html))))) ;parse error
-
(define (html-links sxml)
"Return the list of links found in SXML, the SXML tree of an HTML page."
(let loop ((sxml sxml)
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 2/7] gnu-maintenance: 'latest-html-release' considers non-relative URLs.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-2-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Allow for URL to be an
arbitrary URL rather than a relative URL reference.
---
guix/gnu-maintenance.scm | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)

Toggle diff (48 lines)
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index febed57c3a..98d326e500 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2012, 2013 Nikita Karetnikov <nikita@karetnikov.org>
;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
;;;
@@ -479,19 +479,21 @@ return the corresponding signature URL, or #f it signatures are unavailable."
(port (http-fetch/cached uri #:ttl 3600))
(sxml (html->sxml port)))
(define (url->release url)
- (and (string=? url (basename url)) ;relative reference?
- (release-file? package url)
- (let-values (((name version)
- (package-name->name+version
- (tarball-sans-extension url)
- #\-)))
- (upstream-source
- (package name)
- (version version)
- (urls (list (string-append base-url directory "/" url)))
- (signature-urls
- (list (file->signature
- (string-append base-url directory "/" url))))))))
+ (let* ((base (basename url))
+ (url (if (string=? base url)
+ (string-append base-url directory "/" url)
+ url)))
+ (and (release-file? package base)
+ (let-values (((name version)
+ (package-name->name+version
+ (tarball-sans-extension base)
+ #\-)))
+ (upstream-source
+ (package name)
+ (version version)
+ (urls (list url))
+ (signature-urls
+ (list (file->signature url))))))))
(define candidates
(filter-map url->release (html-links sxml)))
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 3/7] gnu-maintenance: 'release-file?' rejects checksum files.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-3-ludo@gnu.org
* guix/gnu-maintenance.scm (release-file?): Reject ".md5sum",
".sha1sum", and ".sha256sum".
---
guix/gnu-maintenance.scm | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Toggle diff (17 lines)
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 98d326e500..a8b24fa336 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -247,7 +247,9 @@ network to check in GNU's database."
(define (release-file? project file)
"Return #f if FILE is not a release tarball of PROJECT, otherwise return
true."
- (and (not (member (file-extension file) '("sig" "sign" "asc")))
+ (and (not (member (file-extension file)
+ '("sig" "sign" "asc"
+ "md5sum" "sha1sum" "sha256sum")))
(and=> (regexp-exec %tarball-rx file)
(lambda (match)
;; Filter out unrelated files, like `guile-www-1.1.1'.
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 4/7] gnu-maintenance: 'latest-html-release' can determine signature file name.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-4-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): #:file->signature
defaults to #f.
[file->signature/guess]: New procedure.
[url->release]: Use it when FILE->SIGNATURE is #f.
Introduce 'links' variable.
(url-prefix-rewrite): Check whether URL is true before calling
'string-prefix?'.
(latest-savannah-release): Adjust comment about detached signatures.
---
guix/gnu-maintenance.scm | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

Toggle diff (76 lines)
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index a8b24fa336..3bffa4d11e 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -470,16 +470,29 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
#:key
(base-url "https://kernel.org/pub")
(directory (string-append "/" package))
- (file->signature (cut string-append <> ".sig")))
+ file->signature)
"Return an <upstream-source> for the latest release of PACKAGE (a string) on
SERVER under DIRECTORY, or #f. BASE-URL should be the URL of an HTML page,
typically a directory listing as found on 'https://kernel.org/pub'.
-FILE->SIGNATURE must be a procedure; it is passed a source file URL and must
-return the corresponding signature URL, or #f it signatures are unavailable."
- (let* ((uri (string->uri (string-append base-url directory "/")))
- (port (http-fetch/cached uri #:ttl 3600))
- (sxml (html->sxml port)))
+When FILE->SIGNATURE is omitted or #f, guess the detached signature file name,
+if any. Otherwise, FILE->SIGNATURE must be a procedure; it is passed a source
+file URL and must return the corresponding signature URL, or #f it signatures
+are unavailable."
+ (let* ((uri (string->uri (string-append base-url directory "/")))
+ (port (http-fetch/cached uri #:ttl 3600))
+ (sxml (html->sxml port))
+ (links (delete-duplicates (html-links sxml))))
+ (define (file->signature/guess url)
+ (let ((base (basename url)))
+ (any (lambda (link)
+ (any (lambda (extension)
+ (and (string=? (string-append base extension)
+ (basename link))
+ (string-append url extension)))
+ '(".asc" ".sig" ".sign")))
+ links)))
+
(define (url->release url)
(let* ((base (basename url))
(url (if (string=? base url)
@@ -495,10 +508,10 @@ return the corresponding signature URL, or #f it signatures are unavailable."
(version version)
(urls (list url))
(signature-urls
- (list (file->signature url))))))))
+ (list ((or file->signature file->signature/guess) url))))))))
(define candidates
- (filter-map url->release (html-links sxml)))
+ (filter-map url->release links))
(close-port port)
(match candidates
@@ -614,7 +627,7 @@ releases are on gnu.org."
(define (url-prefix-rewrite old new)
"Return a one-argument procedure that rewrites URL prefix OLD to NEW."
(lambda (url)
- (if (string-prefix? old url)
+ (if (and url (string-prefix? old url))
(string-append new (string-drop url (string-length old)))
url)))
@@ -646,9 +659,8 @@ releases are on gnu.org."
(directory (dirname (uri-path uri)))
(rewrite (url-prefix-rewrite %savannah-base
"mirror://savannah")))
- ;; Note: We use the default 'file->signature', which adds ".sig", but not
- ;; all projects on Savannah follow that convention: some use ".asc" and
- ;; perhaps some lack signatures altogether.
+ ;; Note: We use the default 'file->signature', which adds ".sig", ".asc",
+ ;; or whichever detached signature naming scheme PACKAGE uses.
(and=> (latest-html-release package
#:base-url %savannah-base
#:directory directory)
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 5/7] gnu-maintenance: 'latest-html-release' better computes version number.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-5-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Use 'tarball->version'
rather than 'package-name->name+version' to extract the version number.
This fixes problems with packages like 'netsurf' and 'libdom' that have
"-src" in their tarball name, where "src" would be taken as the new
version number.
---
guix/gnu-maintenance.scm | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

Toggle diff (21 lines)
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 3bffa4d11e..5aa16acfde 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -499,12 +499,9 @@ are unavailable."
(string-append base-url directory "/" url)
url)))
(and (release-file? package base)
- (let-values (((name version)
- (package-name->name+version
- (tarball-sans-extension base)
- #\-)))
+ (let ((version (tarball->version base)))
(upstream-source
- (package name)
+ (package package)
(version version)
(urls (list url))
(signature-urls
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 7/7] gnu: hwloc: Add 'release-monitoring-url' property.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-7-ludo@gnu.org
* gnu/packages/mpi.scm (hwloc-1)[properties]: New field.
---
gnu/packages/mpi.scm | 6 ++++++
1 file changed, 6 insertions(+)

Toggle diff (19 lines)
diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 53ee6ef1cd..a8ebd8aeb8 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -66,6 +66,12 @@
(sha256
(base32
"0za1b9lvrm3rhn0lrxja5f64r0aq1qs4m0pxn1ji2mbi8ndppyyx"))))
+
+ (properties
+ ;; Tell the 'generic-html' updater to monitor this URL for updates.
+ `((release-monitoring-url
+ . "https://www-lb.open-mpi.org/software/hwloc/current")))
+
(build-system gnu-build-system)
(outputs '("out" ;'lstopo' & co., depends on Cairo, libx11, etc.
"lib" ;small closure
--
2.30.1
L
L
Ludovic Courtès wrote on 13 Mar 2021 22:46
[PATCH 6/7] gnu-maintenance: Add 'generic-html' updater.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-6-ludo@gnu.org
This brings total updater coverage, as reported by 'guix refresh
--list-updaters', from 78% to 88.3%. Among many other things, it covers
freedesktop.org packages.

* guix/gnu-maintenance.scm (html-updatable-package?)
(latest-html-updatable-release): New procedures.
(%generic-html-updater): New variable.
* doc/guix.texi (Invoking guix refresh): Document it.
---
doc/guix.texi | 3 +++
guix/gnu-maintenance.scm | 58 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 1 deletion(-)

Toggle diff (104 lines)
diff --git a/doc/guix.texi b/doc/guix.texi
index 97094a7d0a..89c8c58295 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -11693,6 +11693,9 @@ the updater for @uref{https://www.stackage.org, Stackage} packages.
the updater for @uref{https://crates.io, Crates} packages.
@item launchpad
the updater for @uref{https://launchpad.net, Launchpad} packages.
+@item generic-html
+a generic updater that crawls the HTML page where the source tarball of
+the package is hosted, when applicable.
@end table
For instance, the following command only checks for updates of Emacs
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 5aa16acfde..ced5497b37 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -28,6 +28,7 @@
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
#:use-module (srfi srfi-26)
+ #:use-module (srfi srfi-34)
#:use-module (rnrs io ports)
#:use-module (system foreign)
#:use-module (guix http-client)
@@ -66,7 +67,8 @@
%gnu-ftp-updater
%savannah-updater
%xorg-updater
- %kernel.org-updater))
+ %kernel.org-updater
+ %generic-html-updater))
;;; Commentary:
;;;
@@ -697,6 +699,53 @@ releases are on gnu.org."
#:file->signature file->signature)
(cut adjusted-upstream-source <> rewrite))))
+(define html-updatable-package?
+ ;; Return true if the given package may be handled by the generic HTML
+ ;; updater.
+ (let ((hosting-sites '("github.com" "github.io" "gitlab.com"
+ "notabug.org" "sr.ht"
+ "gforge.inria.fr" "gitlab.inria.fr"
+ "ftp.gnu.org" "download.savannah.gnu.org"
+ "pypi.org" "crates.io" "rubygems.org"
+ "bioconductor.org")))
+ (url-predicate (lambda (url)
+ (match (string->uri url)
+ (#f #f)
+ (uri
+ (let ((scheme (uri-scheme uri))
+ (host (uri-host uri)))
+ (and (memq scheme '(http https))
+ (not (member host hosting-sites))))))))))
+
+(define (latest-html-updatable-release package)
+ "Return the latest release of PACKAGE. Do that by crawling the HTML page of
+the directory containing its source tarball."
+ (let* ((uri (string->uri
+ (match (origin-uri (package-source package))
+ ((? string? url) url)
+ ((url _ ...) url))))
+ (custom (assoc-ref (package-properties package)
+ 'release-monitoring-url))
+ (base (or custom
+ (string-append (symbol->string (uri-scheme uri))
+ "://" (uri-host uri))))
+ (directory (if custom
+ ""
+ (dirname (uri-path uri))))
+ (package (package-upstream-name package)))
+ (catch #t
+ (lambda ()
+ (guard (c ((http-get-error? c) #f))
+ (latest-html-release package
+ #:base-url base
+ #:directory directory)))
+ (lambda (key . args)
+ ;; Return false and move on upon connection failures.
+ (unless (memq key '(gnutls-error tls-certificate-error
+ system-error))
+ (apply throw key args))
+ #f))))
+
(define %gnu-updater
;; This is for everything at ftp.gnu.org.
(upstream-updater
@@ -737,4 +786,11 @@ releases are on gnu.org."
(pred (url-prefix-predicate "mirror://kernel.org/"))
(latest latest-kernel.org-release)))
+(define %generic-html-updater
+ (upstream-updater
+ (name 'generic-html)
+ (description "Updater that crawls HTML pages.")
+ (pred html-updatable-package?)
+ (latest latest-html-updatable-release)))
+
;;; gnu-maintenance.scm ends here
--
2.30.1
L
L
Léo Le Bouter wrote on 17 Mar 2021 11:18
[PATCH 0/7] Add 'generic-html' updater
(address . 47126@debbugs.gnu.org)
5a2391930ed890f8cf61da88ccda6df9bb874630.camel@zaclys.net
Hello!

That's awesome thanks a lot Ludo!!

I am wondering, does this handle cases where there's a subfolder with
version and then another tarball with version as well?

Like GNOME for example:

I see this is a generic solution, I see you made available some options
to customize per-package as needed but can we get as precise/reliable
as Debian's watch/uscan with that?

Or if I understand correctly, we should always point it to a page where
the link for the latest release is always published? That last thing
really sounds nice!

Léo
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEFIvLi9gL+xax3g6RRaix6GvNEKYFAmBR15EACgkQRaix6GvN
EKYmFxAAsgGHKw+x7IWA4NJMYWvLo0WhY1Ss3wt0CbnOeNDW3aaDV1S39WCs5zPO
5GpYp1Y/jo1PGDgilJL5V/Aof8McoRQi1PttVTmfUDAg2n3q4Uv9eKJlKfEFPopZ
JgxYE/1Y9b+6qjNagBT921AgNpuL0JVdsftdhcOEmmr2ROLTSMLBvHiKc9uCXPDv
e/hqIVYfXtf6p+EJIwAbEruqhPuLFuPcmOKoEm8hd6T/0cBYoRt36g+uQ8Dkyil5
YEPR61XbGo+A6n88uoTmUrCmGvuiQ+uhvI513PlvTrOPtPYvRQf23r5vWii8Dazj
az1BVBH3EnKkZRXhPjE9IG+mwxPq/MWzaZsuyKP5qBwDGZmSrzpzarbaYso4y4La
NdbpCmUMdLYVvK9F/o2UB6P9wyMXH6HCtvZUg86WiylhGnceMj65bZdrnmqWlQAo
f9O6WTwdXU6vD4RgjBpINqsPH2iqGEFkKDP68ZznNTqjAlEbC5vOoWVVIkO32VmI
ZtEf2sQggUFDO9ez+78VgY7he4YXeqfzenDwfyFhSRIe+LKGy0oNbrYuizSkbviy
vOy1YblmEQnE+vG1d7Uwy7CWANBFPSEUJIHZdKrutV2NPH6CexgpktEjFcoeSTu2
PfR/BfIEoIbZMN/4GVwhwcakc08njQV0c7L4JTqQcm2swr/GdAU=
=faOe
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 17 Mar 2021 14:52
(name . Léo Le Bouter)(address . lle-bout@zaclys.net)(address . 47126@debbugs.gnu.org)
87lfal51lf.fsf_-_@gnu.org
Hi Léo,

Léo Le Bouter <lle-bout@zaclys.net> skribis:

Toggle quote (2 lines)
> That's awesome thanks a lot Ludo!!

Just pushed this series as fe96f64110676f28b948f0d31a1726501abdae0e.
Unleash your update powers, comrades! :-)

Toggle quote (10 lines)
> I am wondering, does this handle cases where there's a subfolder with
> version and then another tarball with version as well?
>
> Like GNOME for example:
> https://download.gnome.org/sources/NetworkManager/1.31/NetworkManager-1.31.1.tar.xz
>
> I see this is a generic solution, I see you made available some options
> to customize per-package as needed but can we get as precise/reliable
> as Debian's watch/uscan with that?

There’s a ‘gnome’ updater for GNOME:


And yes, it actually works. :-)

In the case of NetworkManager, there’s a bug right now:

Toggle snippet (6 lines)
$ guix refresh network-manager
ni sekvas la redirektigon al 'https://download.gnome.org/sources/NetworkManager/cache.json'...
ni sekvas la redirektigon al 'https://fr2.rpmfind.net/linux/gnome.org/sources/NetworkManager/cache.json'...
gnu/packages/gnome.scm:7648:13: network-manager would be upgraded from 1.24.0 to rc2

I’ll see what’s up. But otherwise ‘guix refresh -t gnome’ produces
sensible results.

At any rate, updaters sometimes bitrot, produce buggy results as in the
example above. Please do use ‘guix refresh’ and report any issues!

Also, there are still ~12% of packages for which none of the updaters
apply. We should investigate and see how we can bring that down to
zero.

Thanks for your feedback!

Ludo’.
L
L
Ludovic Courtès wrote on 17 Mar 2021 14:53
control message for bug #47126
(address . control@debbugs.gnu.org)
87k0q551ky.fsf@gnu.org
tags 47126 fixed
close 47126
quit
?