Downloading substitutes is too slow upon nginx cache misses

Done

Details

7 participants

dian_cecht
Ludovic Courtès
Maxim Cournoyer
Tobias Geerinckx-Rice
Mark H Weaver
Florian Pelz
Ricardo Wurmus

Owner: unassigned

Submitted by: dian_cecht

Severity: important

dian_cecht wrote on 21 Mar 2017 02:44

No notification of cache misses when downloading substitutes

Recipients:(name . GuixSD)(address . bug-guix@gnu.org)

Message-ID:20170320184449.5ac06051@khaalida

Just ran guix pull and guix package -u, and found some of the programs
download VERY slowly (<100kb/s, usually around 95). I asked on #guix
and lfam mentioned it was probably a cache miss.

It would be nice if there was some notification that a cache miss
happened and the download will likely be slow, otherwise a user might
wonder what problem there is with their connection.

Tobias Geerinckx-Rice wrote on 21 Mar 2017 03:46

Recipients:(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)

Message-ID:144e9ba8-af93-fb18-d2b9-f198ae7c11e9@tobias.gr

Hullo,

On 21/03/17 02:44, dian_cecht@zoho.com wrote:

> Just ran guix pull and guix package -u, and found some of the programs

> download VERY slowly (<100kb/s, usually around 95). I asked on #guix

> and lfam mentioned it was probably a cache miss.

Do you mean that *substitutes* existed, but were not yet on

mirror.hydra.gnu.org and so were silently proxied from the much slower

hydra.gnu.org?

Or did Guix fall back to downloading *source* tarballs from some slow

upstream to build locally?

(I've no access to IRC at the mo'.)

Kind regards,

T G-R


dian_cecht wrote on 21 Mar 2017 03:52

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:20170320195247.05f72fc9@khaalida

On Tue, 21 Mar 2017 03:46:29 +0100

Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> Hullo,
> 
> On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> > Just ran guix pull and guix package -u, and found some of the
> > programs download VERY slowly (<100kb/s, usually around 95). I
> > asked on #guix and lfam mentioned it was probably a cache miss.  
> 
> Do you mean that *substitutes* existed, but were not yet on
> mirror.hydra.gnu.org and so were silently proxied from the much slower
> hydra.gnu.org?

The URL displayed during the download was mirror.hydra.gnu.org.

> Or did Guix fall back to downloading *source* tarballs from some slow

> upstream to build locally?

It was a binary download, not source. At least, I don't recall anything
about compiles at any point (and I'm sure it didn't take long enough to
do that; one package was icecat which I'm sure wouldn't have downloaded
at 90k/s then compiled in less than 15 minutes (fwiw, according to my
build logs firefox takes about 2 hours to build, so unless icecat is
magically orders of magnitude faster to build, then I'm sure it was
just a download + install, and not download + compile + install)).

Tobias Geerinckx-Rice wrote on 21 Mar 2017 04:57

Recipients:(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)

Message-ID:8e7e07d1-563f-666f-2c32-2a772757c86f@tobias.gr

Ahoy,

On 21/03/17 03:52, dian_cecht@zoho.com wrote:

> The URL displayed during the download was mirror.hydra.gnu.org.

> [...] It was a binary download, not source.

Oh, OK. I'm not an expert on how Hydra's set up these days, but will

assume it's not too different from my own (a fast nginx proxy_cache,

mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

Whenever you're the first to request a substitute, mirror.hydra.gnu.org

transparently forwards the request to hydra.gnu.org.

The latter has to compress the response on the fly, leading to much

slower transfer speeds. It slowly sends it back to the mirror, which

slowly sends it on to you while also saving it on disc so all subsequent

downloads will be fast — by Hydra standards – and not involve hydra.gnu.org.
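
The miss-then-hit behaviour described above can be sketched in a few lines of Python. This is only an illustration of a read-through cache, not how nginx or Hydra is implemented; the class names `Origin` and `CachingMirror` and the path `/nar/example` are made up for the example.

```python
class Origin:
    """Hypothetical stand-in for the slow build farm behind the mirror."""

    def __init__(self, store):
        self.store = store
        self.requests = 0  # how many times the origin was actually asked

    def fetch(self, path):
        self.requests += 1
        return self.store[path]


class CachingMirror:
    """Hypothetical stand-in for the caching proxy in front of the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return self.cache[path], "HIT"
        body = self.origin.fetch(path)  # slow path: forward to the origin
        self.cache[path] = body         # keep a copy for later requests
        return body, "MISS"


origin = Origin({"/nar/example": b"nar bytes"})
mirror = CachingMirror(origin)
first = mirror.get("/nar/example")   # first requester pays for the miss
second = mirror.get("/nar/example")  # later requesters are served locally
print(first[1], second[1], origin.requests)
```

Only the first request reaches the origin; every later request for the same path is answered from the mirror's own copy.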

Maybe you knew all this, but it's also the reason that...

> On 21/03/17 02:44, dian_cecht@zoho.com wrote:

> It would be nice if there was some notification that a cache miss

> happened and the download will likely be slow, otherwise a user might

> wonder what problem there is with their connection.

...I'm afraid this makes no sense from guix's point of view.

The term ‘cache miss’ here is an implementation detail of our current

Hydra set-up, not something guix can or IMO should care about. There are

hundreds of reasons why your connection might be slow at any given time.

Guix should just tell you so (it does), not guess why. Or worse: know.

(But if others disagree, we'll have to extend the Hydra API to somehow

relay this information to the client, in the spirit of the modern Web.)

HTTP 200½: OK, fine, but it's Going to Suck.

T G-R


dian_cecht wrote on 21 Mar 2017 05:48

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:20170320214809.466dc5fe@khaalida

On Tue, 21 Mar 2017 04:57:09 +0100

Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> Ahoy,
> 
> On 21/03/17 03:52, dian_cecht@zoho.com wrote:
> > The URL displayed during the download was mirror.hydra.gnu.org.
> > [...] It was a binary download, not source.  
> 
> Oh, OK. I'm not an expert on how Hydra's set up these days, but will
> assume it's not too different from my own (a fast nginx proxy_cache,
> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).
> 
> Whenever you're the first to request a substitute,
> mirror.hydra.gnu.org transparently forwards the request to
> hydra.gnu.org.
> 
> The latter has to compress the response on the fly, leading to much
> slower transfer speeds. It slowly sends it back to the mirror, which
> slowly sends it on to you while also saving it on disc so all
> subsequent downloads will be fast — by Hydra standards – and not
> involve hydra.gnu.org.
> 
> Maybe you knew all this, but it's also the reason that...

I'm not familiar with the implementation details, nor how hydra is

currently set up.

> > On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> > It would be nice if there was some notification that a cache miss
> > happened and the download will likely be slow, otherwise a user
> > might wonder what problem there is with their connection.  
> 
> ...I'm afraid this makes no sense from guix's point of view.
> 
> The term ‘cache miss’ here is an implementation detail of our current
> Hydra set-up, not something guix can or IMO should care about. There
> are hundreds of reasons why your connection might be slow at any
> given time. Guix should just tell you so (it does), not guess why. Or
> worse: know.

I'm not suggesting having Guix tell me why my network is slow, only if
the download might be slow because it's having to pull from
hydra.gnu.org. Having Guix automagically troubleshoot networking
problems is well beyond the scope of a package manager, even one that
goes as far beyond simple package management as Guix does.

> (But if others disagree, we'll have to extend the Hydra API to somehow

> relay this information to the client, in the spirit of the modern

> Web.)

AFAIK, Guix devs are working on a replacement for the current build
system, so the sane option wouldn't be extending the current hydra
system to handle a new API call, but to try and work this type of
feature into the next system. Unless, of course, something like this
could be done in hydra reasonably easily, in which case why not.

Another option would be to have the mirrors automatically cache the
files as soon as they are available to try. I'd hope this would be how
things are handled already, but one never knows.

Tobias Geerinckx-Rice wrote on 21 Mar 2017 07:21

Recipients:(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)

Message-ID:d8962205-0e0f-59ef-c957-923ba9bc01d4@tobias.gr

Mornin',

On 21/03/17 05:48, dian_cecht@zoho.com wrote:

> I'm not suggesting having Guix tell me why my network is slow,

I never mentioned your network. Your proxied connection to a substitute

server, yes. And, well, this very bug report is for Guix to tell you why

that's slow...

> only if the download might be slow because it's having to pull from

> hydra.gnu.org.

(Side note: ‘it’ here is mirror.hydra.gnu.org, never a well-configured

Guix client.)

So to implement this, the client would need to display a ‘warning’

message or flag sent by the substitute server, to notify the user that

their download might be slower... sometimes... by an unknown amount...

possibly?

But see, that wouldn't be true at all on my system (and surely others),

despite being set up nearly identically to Hydra. On the other hand, my

home download speed fluctuates wildly, even between simultaneous

connections to the same server. Whether or not a file is cached makes no

difference. To be told would be noise at best, misleading at worst.

I'd be against this only for those reasons, but I promise I'm not.

It's just all a bit vague, 's all, and my personal opinion is that once

the vagueness is resolved, not much will remain. But who knows.

> AFAIK, Guix devs are working on a replacement for the current build

> system, so the sane option wouldn't be extending the current hydra

> system to handle a new API call, but to try and work this type of

> feature into the next system.

My point is that it wouldn't be sane, and would be an ugly hack in

either system. Cuirass isn't really different from Hydra in this regard.

Me shut up now :-) I'm more interested in what others have to say.

Kind regards,

T G-R


dian_cecht wrote on 21 Mar 2017 07:49

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:20170320234912.46680062@khaalida

On Tue, 21 Mar 2017 07:21:54 +0100

Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> > only if the download might be slow because [mirror.hydra] is having

> > to pull from hydra.gnu.org.

> So to implement this, the client would need to display a ‘warning’

> message or flag sent by the substitute server, to notify the user that

> their download might be slower... sometimes... by an unknown amount...

> possibly?

Simply a notification that mirror.hydra doesn't currently have a cached
version of the file and the download might be slower than normal would
be fine. As-is, looking up and seeing download speeds that amount to
less than 10% of one's normal bandwidth is a bit concerning since it
would seem like there is a problem. In this case, Guix would be giving
the user some notification that something /is/ out of the ordinary, and
possibly save the user some effort trying to determine the cause of the
slowdown.

> But see, that wouldn't be true at all on my system (and surely

> others), despite being set up nearly identically to Hydra. On the

> other hand, my home download speed fluctuates wildly, even between

> simultaneous connections to the same server.

I'm not sure how any of this matters. If you are running a local Hydra

instance or whatever, then I'd assume you'd be aware of what, if any,

problems that could arise. In this case, I'd hope hydra would allow you

to disable this feature.

> Whether or not a file is cached makes no difference. To be told would

> be noise at best, misleading at worst.

Had I been notified that mirror.hydra was currently pulling from hydra,
it would have saved me the time of jumping on IRC and asking what was
up, which only worked because someone was in #guix and had an idea of
what was going on; had that not been the case, I would have started
looking for the cause for the slowdown and wasted several minutes (at
least) trying to figure out what was wrong, and since it was on
mirror.hydra's end, I'd have no way to know the slowdown was on their
end and not mine, nor my ISP's problem.

> > AFAIK, Guix devs are working on a replacement for the current build

> > system, so the sane option wouldn't be extending the current hydra

> > system to handle a new API call, but to try and work this type of

> > feature into the next system.

> My point is that it wouldn't be sane, and would be an ugly hack in

> either system.

I don't see how this would have to be "an ugly hack". It's simply a
query and response. The simplest way I can see for this to work would
be for mirror.hydra to either just send the requested file, or a
response that the file isn't cached then start to trickle the file on to
the client.

Florian Pelz wrote on 21 Mar 2017 13:59

Recipients:(address . 26201@debbugs.gnu.org)

Message-ID:be6b7b69-5ab9-3d4e-68fe-4d582699b2cc@pelzflorian.de

On Mon, 2017-03-20 at 21:48 -0700, dian_cecht@zoho.com wrote:

> Another option would be to have the mirrors automatically cache the

> files as soon as they are available to try. I'd hope this would be how

> things are handled already, but one never knows.

If it cached everything, it wouldn’t be a cache?

Tobias Geerinckx-Rice wrote on 21 Mar 2017 15:55

Recipients:(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)

Message-ID:1bbd8ee3-1745-3642-27ed-f095c732dc11@tobias.gr

Hullo!

On 21/03/17 07:49, dian_cecht@zoho.com wrote:

> I'm not sure how any of this matters. If you are running a local

> Hydra instance or whatever, then I'd assume you'd be aware of what,

> if any, problems that could arise.

It matters for the reasons mentioned. It's not a ‘local Hydra’ & I have

no idea what problems you're talking about.

My problem is that every invocation of Guix already fills several

screens with Guile cache misses. Adding another warning (‘warning! the

system is working exactly as designed!’) will only serve to make those

other warnings look less silly, and I think that would be a shame.

To clarify:

- Warnings should be scary because warnings should be actionable.
  There's nothing the user can or needs to do about a cache miss.
- It would be randomly shown to everyone, since this happens constantly.
- The behaviour warned about is not incorrect or abnormal.
- As already noted, it's how caching works.

> I don't see how this would have to be "an ugly hack". It's simply a

> query and response. The simplest way I can see for this to work would

> be for mirror.hydra to either just send the requested file, or a

> response that the file isn't cached then start to trickle the file on

> to the client.

Well, yeah... That's the ugly hack. :-)

It's not that your suggestion's hard to implement. In fact, it's

just one line for nginx (which it turns out I already had):

add_header X-Cache-Status $upstream_cache_status;

and 6 lines of lightly-tested Guile (attached)¹. And presto. This thing.

Doesn't mean we should.
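
For readers who want to see what a client-side check of that header amounts to, here is a minimal Python sketch. It is not the attached Guile patch; the function names `cache_status` and `describe` are hypothetical, and the wording of the message is an assumption. The values nginx puts in `$upstream_cache_status` include MISS and HIT.

```python
def cache_status(headers):
    """Return the X-Cache-Status value in upper case, or None if absent.

    `headers` is any mapping of header names to values; header names are
    compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "x-cache-status":
            return value.strip().upper()
    return None


def describe(headers):
    """Return a note for the user on a cache miss, or None otherwise."""
    if cache_status(headers) == "MISS":
        # Hypothetical wording; the thread does not settle on any message.
        return "substitute not yet cached by the mirror; download may be slow"
    return None


print(describe({"X-Cache-Status": "MISS"}))
print(describe({"x-cache-status": "HIT"}))
print(describe({"Content-Type": "application/x-nix-archive"}))
```

A server that does not send the header produces no note at all, so a check like this only ever works against proxies configured to expose their cache state.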

Kind regards,

T G-R

¹: Why? Practice. Irony. Light masochism.

From 6d459a442d73628a0628385283c7cf04dff1b797 Mon Sep 17 00:00:00 2001

Still not a good idea.

* guix/http-client.scm (http-fetch): Add #:peek-behind-cache? parameter
to expose caching proxy implementation details as a scary warning.
* guix/scripts/substitute.scm (fetch): Use it.
---
 guix/http-client.scm        | 10 +++++++++-
 guix/scripts/substitute.scm |  3 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/guix/http-client.scm b/guix/http-client.scm
index 6874c51..2366f5e 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -2,6 +2,7 @@
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
 ;;; Copyright © 2012, 2015 Free Software Foundation, Inc.
+;;; Copyright © 2017 Tobias Geerinckx-Rice <me@tobias.gr>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -222,7 +223,8 @@ or if EOF is reached."
 
 (define* (http-fetch uri #:key port (text? #f) (buffered? #t)
                      keep-alive? (verify-certificate? #t)
-                     (headers '((user-agent . "GNU Guile"))))
+                     (headers '((user-agent . "GNU Guile")))
+                     (peek-behind-cache? #f))
   "Return an input port containing the data at URI, and the expected number of
 bytes available or #f.  If TEXT? is true, the data at URI is considered to be
 textual.  Follow any HTTP redirection.  When BUFFERED? is #f, return an
@@ -253,8 +255,14 @@ Raise an '&http-get-error' condition if downloading fails."
                      (http-get uri #:streaming? #t #:port port
                                #:keep-alive? #t
                                #:headers headers))
+                    ((headers)
+                     (response-headers resp))
                     ((code)
                      (response-code resp)))
+        (when (and peek-behind-cache?
+                   (equal? (assoc-ref headers 'x-cache-status) "MISS"))
+              (warning (_ "the caching proxy is working properly!~%"))
+              (warning (_ "and there's nothing you can do about it.~%")))
         (case code
           ((200)
            (values data (response-content-length resp)))
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index faeb019..4a4f115 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -216,7 +216,8 @@ provide."
                (unless (or buffered? (not (file-port? port)))
                  (setvbuf port _IONBF)))
              (http-fetch uri #:text? #f #:port port
-                         #:verify-certificate? #f))))))
+                         #:verify-certificate? #f
+                         #:peek-behind-cache? #t))))))
     (else
      (leave (_ "unsupported substitute URI scheme: ~a~%")
             (uri->string uri)))))
-- 
2.9.3


dian_cecht wrote on 21 Mar 2017 16:32

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:20170321083239.3cbf1e8d@khaalida

On Tue, 21 Mar 2017 15:55:05 +0100

Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> To clarify:

> - Warnings should be scary because warnings should be actionable.

There are warnings and there are errors. Warnings don't have to be
scary; I get them every time I update emacs because of duplicate icons
stored in two different directories in the store. Is that actionable?
Not as far as I am concerned, unless I want to hand delete something
from the store, which, as far as I understand it, shouldn't be done.

> There's nothing the user can or needs to do about a cache miss.

Please reread the 2nd part of my response in Message #23 in this

bugreport for why this is needed.

> - It would be randomly shown to everyone, since this happens

> constantly.

Unless mirror.hydra randomly loses data in its cache from hydra, it

won't be random in the least.

> - The behaviour warned about is not incorrect or abnormal.

No, but the behavior would inform the user that the unusual and random

slowdown isn't another problem and is because mirror.hydra is having to

update its cache, which, as I explained before, is useful information.

> [...]

Quite frankly I'd like someone else to take a look at this bug, if
for no other reason than I'm not sure if we're communicating clearly
with each other here. Most of what you are saying makes no sense
whatsoever and seems to miss the point I have attempted to make.

While I will thank you for actually writing a patch, saying "the
caching proxy is working properly! and there's nothing you can do about
it." seems rather cynical and clearly misses the point of what I'm
requesting here.

dian_cecht wrote on 21 Mar 2017 16:35

Recipients:(name . Florian Pelz)(address . pelzflorian@pelzflorian.de)(address . 26201@debbugs.gnu.org)

Message-ID:20170321083536.639716a9@khaalida

On Tue, 21 Mar 2017 13:59:27 +0100

Florian Pelz <pelzflorian@pelzflorian.de> wrote:

> On Mon, 2017-03-20 at 21:48 -0700, dian_cecht@zoho.com wrote:

> > Another option would be to have the mirrors automatically cache the

> > files as soon as they are available to try. I'd hope this would be

> > how things are handled already, but one never knows.

> >

> If it cached everything, it wouldn’t be a cache?

If the point is to reduce the load on hydra, then at some point it
could have everything. If it doesn't, then why have a mirror when it's
just pulling right from the source all the time anyways?

Tobias Geerinckx-Rice wrote on 21 Mar 2017 17:07

Recipients:(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)

Message-ID:553699c2-fb50-5cf4-a80d-8ee0a70c039d@tobias.gr

On 21/03/17 16:32, dian_cecht@zoho.com wrote:

> Unless mirror.hydra randomly loses data in its cache from hydra, it

> won't be random in the least.

It will. Whether one is first to download from the cache after the

substitute is built is essentially random.

> Quite frankly I'd like someone else to take a look at this bug,

Glad you agree.

> if for no other reason than I'm not sure if we're communicating clearly

> with each other here. Most of what you are saying makes no sense

> whatsoever and seems to miss the point I have attempted to make.

I assure you it does not.

Kind regards,

T G-R


Ludovic Courtès wrote on 21 Mar 2017 17:43

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:8760j2wpfy.fsf@gnu.org

Hello!

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> Oh, OK. I'm not an expert on how Hydra's set up these days, but will

> assume it's not too different from my own (a fast nginx proxy_cache,

> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

I think there’s room for improvement in our nginx config at

https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf.

For instance, I just discovered ‘proxy_cache_lock’ while looking at

http://nginx.org/en/docs/http/ngx_http_proxy_module.html; looks useful

in reducing load on hydra.gnu.org. Surely there are other ways to tweak

caching.

Besides, I’d like to use ‘guix publish’ on hydra.gnu.org. I suspect

it’s going to be faster than Starman (the HTTP server behind Hydra), and

also it uses an in-process gzip by default, as opposed to bzip2 which is

what Hydra uses (better compression ratio, but super CPU-intensive).

At any rate, clients should not paper over server-side performance

issues IMO.

Thanks,

Ludo’.

Tobias Geerinckx-Rice wrote on 21 Mar 2017 18:08

Recipients:(address . ludo@gnu.org)(address . 26201@debbugs.gnu.org)

Message-ID:9889a4b5-c300-cd03-1095-1115428067fb@tobias.gr

Ludo',

On 21/03/17 17:43, Ludovic Courtès wrote:

> I think there’s room for improvement in our nginx config at

> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.

> For instance, I just discovered ‘proxy_cache_lock’ while looking at

> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful

> in reducing load on hydra.gnu.org. Surely there are other ways to tweak

> caching.

Indeed! For reference, here's my cache configuration.

That's right. Now you can all¹ steal some criminally overpriced Belgian

bandwidth!

  server {
    server_name substitutes.tobias.gr;

    listen [::]:443 ssl http2;
    listen 443 ssl http2;

    # FIXME move to main LE cert
    ssl_certificate substitutes.pem;
    ssl_certificate_key substitutes.key;

    # "" means ‘inherit from upstream’ here.
    add_header Cache-Control "";
    # So does ‘off’. This is all a bit hacky.
    expires off;

    proxy_hide_header Set-Cookie;
    proxy_ignore_headers Set-Cookie;

    # Almost all traffic is already compressed.
    gzip off;

    ...

    location / {
      limit_except GET { deny all; }
      proxy_pass SUPER_SEKRIT_BACKEND;

      # https://www.nginx.com/blog/nginx-caching-guide
      add_header X-Cache-Status $upstream_cache_status;

      proxy_cache default;
      # We allow only GET requests, so don't waste key space:
      proxy_cache_key "$request_uri";
      proxy_cache_lock on;
      proxy_cache_lock_timeout 3h; #yolo
      proxy_cache_use_stale error timeout
                            http_500 http_502 http_503 http_504;
    }

    ...
  }

I'm sure it's hardly optimal (or, erm, ‘good’) either but it works.

> Besides, I’d like to use ‘guix publish’ on hydra.gnu.org. I suspect

> it’s going to be faster than Starman (the HTTP server behind Hydra), and

> also it uses an in-process gzip by default, as opposed to bzip2 which is

> what Hydra uses (better compression ratio, but super CPU-intensive).

Back when I used Hydra-the-software I did so briefly and I think it

worked. But no hard tests.

> At any rate, clients should not paper over server-side performance

> issues IMO.

Entirely off-topic, but this 'tude is a part of what drew me to Guix in

the first place. So, like, thanks, in general :-)

Kind regards,

T G-R

¹: Just put it *after* mirror.hydra.gnu.org, OK?


Ludovic Courtès wrote on 22 Mar 2017 23:06

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:87fui50xws.fsf@gnu.org

Hey Tobias,

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> On 21/03/17 17:43, Ludovic Courtès wrote:
>> I think there’s room for improvement in our nginx config at
>> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.
>> 
>> For instance, I just discovered ‘proxy_cache_lock’ while looking at
>> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
>> in reducing load on hydra.gnu.org.  Surely there are other ways to tweak
>> caching.
>
> Indeed! For reference, here's my cache configuration.
>
> That's right. Now you can all¹ steal some criminally overpriced Belgian
> bandwidth!

Heheh. :-)

>       limit_except GET {        deny all; }
>       proxy_pass                SUPER_SEKRIT_BACKEND;
>
>       # https://www.nginx.com/blog/nginx-caching-guide
>       add_header                X-Cache-Status $upstream_cache_status;
>
>       proxy_cache               default;
>       # We allow only GET requests, so don't waste key space:
>       proxy_cache_key           "$request_uri";
>       proxy_cache_lock          on;
>       proxy_cache_lock_timeout  3h; #yolo
>       proxy_cache_use_stale     error timeout
>                                 http_500 http_502 http_503 http_504;

I didn’t fully understand the docs for the last 3 directives here. For

instance, what happens when 10 clients do GET /nar/xyz-texlive? Do the

9 unlucky clients wait for 3 hours and then get 404?

Anyway, thanks for sharing your tips. :-)

> Entirely off-topic, but this 'tude is a part of what drew me to Guix in

> the first place. So, like, thanks, in general :-)

:-)

Ludo’.

Ludovic Courtès wrote on 22 Mar 2017 23:22

hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:87r31pyms2.fsf_-_@gnu.org

Hi again!

Until now hydra.gnu.org was using Hydra (the software) to serve not only

the Web interface but also all the .narinfo and /nar URLs (substitute

meta-data and substitutes).

Starting from now, hydra.gnu.org directs all .narinfo and corresponding

nar requests to ‘guix publish’ instead of Hydra.

‘guix publish’ should be faster and less resource-hungry than Hydra. It

uses in-process gzip for nar compression instead of bzip2 (I chose level

7, which seems to provide compression ratios close to what bzip2

provides with its default compression level, while being 3 times

faster). Unlike Hydra it never forks so for instance, 404 responses for

.narinfo URLs should be quicker. Hopefully, that will improve the

worst-case (cache miss) throughput.
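
The size-versus-speed trade-off mentioned above is easy to try locally. The following Python sketch compresses the same buffer with gzip at level 7 and with bzip2 at its default level and reports size and time. The sample data is synthetic and highly repetitive, so the numbers it prints say nothing about real nars; only the method is of interest, and the helper name `measure` is made up.

```python
import bz2
import gzip
import time


def measure(compress, data):
    """Return (compressed size in bytes, elapsed seconds) for one run."""
    start = time.perf_counter()
    out = compress(data)
    return len(out), time.perf_counter() - start


# Synthetic stand-in for a nar; real archives compress very differently.
data = b"".join(b"line %d of some repetitive build output\n" % i
                for i in range(20000))

gz_size, gz_time = measure(lambda d: gzip.compress(d, compresslevel=7), data)
bz_size, bz_time = measure(bz2.compress, data)  # bz2 defaults to level 9

print("input  %8d bytes" % len(data))
print("gzip-7 %8d bytes in %.4f s" % (gz_size, gz_time))
print("bzip2  %8d bytes in %.4f s" % (bz_size, bz_time))
```

To compare on realistic input, replace `data` with the bytes of an actual nar file.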

I configured nginx in such a way that the former Hydra-provided /nar

URLs (which are cached in nginx instances, in our

/var/guix/substitute/cache directories, etc.) are still available.

‘guix publish’ uses the /guix/nar URLs while Hydra uses /nar, so the

nginx config redirects to either Hydra or ‘guix publish’ depending on

the URL:

https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/hydra.gnu.org-locations.conf#n29
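
As a rough picture of the routing rule just described, here is a small Python sketch. It is an assumption drawn from the description in this message, not a transcription of the linked nginx file; the function name `backend_for` and the example paths are hypothetical.

```python
def backend_for(path):
    """Pick the backend for a request path, following the rule described:
    .narinfo requests and /guix/nar URLs go to 'guix publish', and
    everything else (including the older /nar URLs) stays with Hydra.
    """
    if path.endswith(".narinfo") or path.startswith("/guix/nar/"):
        return "guix publish"
    return "hydra"


for example in ("/abcdef.narinfo", "/guix/nar/gzip/abcdef-hello", "/nar/abcdef-hello", "/jobset/gnu/master"):
    print(example, "->", backend_for(example))
```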

Hydra-provided .narinfos are still cached here and there; they’ll
progressively expire and be replaced by ‘guix publish’-provided
.narinfos.

Let me know if you notice anything fishy!

Ludo’.

Ricardo Wurmus wrote on 23 Mar 2017 11:29

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87mvccs2uu.fsf@elephly.net

Ludovic Courtès <ludo@gnu.org> writes:

> Until now hydra.gnu.org was using Hydra (the software) to serve not only

> the Web interface but also all the .narinfo and /nar URLs (substitute

> meta-data and substitutes).

> Starting from now, hydra.gnu.org directs all .narinfo and corresponding

> nar requests to ‘guix publish’ instead of Hydra.

That’s very cool! I’m happy to see more of Hydra replaced.

Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC

https://elephly.net

Mark H Weaver wrote on 23 Mar 2017 19:36

Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(name . Ludovic Courtès)(address . ludo@gnu.org)

Message-ID:87inmzrgbf.fsf@netris.org

ludo@gnu.org (Ludovic Courtès) writes:

> Hi again!
>
> Until now hydra.gnu.org was using Hydra (the software) to serve not only
> the Web interface but also all the .narinfo and /nar URLs (substitute
> meta-data and substitutes).
>
> Starting from now, hydra.gnu.org directs all .narinfo and corresponding
> nar requests to ‘guix publish’ instead of Hydra.
>
> ‘guix publish’ should be faster and less resource-hungry than Hydra.  It
> uses in-process gzip for nar compression instead of bzip2 (I chose level
> 7, which seems to provide compression ratios close to what bzip2
> provides with its default compression level, while being 3 times
> faster).  Unlike Hydra it never forks so for instance, 404 responses for
> .narinfo URLs should be quicker.  Hopefully, that will improve the
> worst-case (cache miss) throughput.

Excellent!  Any improvement in 404 response time will be very helpful.
I've noticed that spikes of narinfo requests resulting in 404 have been a
major source of overloading on Hydra, because these requests cannot be
cached for very long.  The reason: if we cache those failures for N
minutes, this effectively delays the appearance of new nars by N minutes
(if it was requested before that).  This forces us to choose a small N
for negative cache entries, which means the cache is not much help here.
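
The delay described above can be made concrete with a small simulation. This Python sketch models a proxy that remembers 404s for a fixed time; the class `NegativeCache`, the 300-second TTL and the build finishing at t = 60 s are all assumptions chosen for illustration.

```python
class NegativeCache:
    """Toy proxy that caches 'not found' answers for `ttl_seconds`."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.missing_until = {}  # path -> time until which we answer 404

    def lookup(self, path, now, origin_has):
        """Return the status code a client would see at time `now`."""
        if self.missing_until.get(path, 0) > now:
            return 404  # cached failure; the origin is not asked again
        if origin_has(path, now):
            return 200
        self.missing_until[path] = now + self.ttl
        return 404


def origin_has(path, now):
    # Hypothetical: the build finishes 60 seconds into the simulation.
    return now >= 60


cache = NegativeCache(ttl_seconds=300)
statuses = [cache.lookup("/abc.narinfo", t, origin_has) for t in (0, 120, 300)]
print(statuses)
```

The request at t = 120 s still sees a 404 even though the narinfo has existed since t = 60 s, because the failure recorded at t = 0 is remembered until t = 300 s. A smaller TTL shortens that window at the cost of more requests reaching the origin.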

One question: what will happen in the case of multiple concurrent
requests for the same nar?  Will multiple nar-pack-and-bzip2 processes
be run on-demand?  Recall that the nginx proxy will pass all of those
requests through, and not create the cache entry until it has received a
complete response.  This has caused us severe problems with huge nars
such as texinfo-texmf, to the point that we had to crudely block those
nar requests.  Unfortunately, it is not obvious how to block the
associated narinfo requests due to the lack of job name in the URL, so
this results in failures on the client side that must be manually worked
around.

     Thanks,
       Mark

Tobias Geerinckx-Rice wrote on 23 Mar 2017 19:52

Recipients:(address . mhw@netris.org)

Message-ID:25b2472a-c705-53fe-f94f-04de9a2d484e@tobias.gr

Mark,

On 23/03/17 19:36, Mark H Weaver wrote:

> One question: what will happen in the case of multiple concurrent

> requests for the same nar? Will multiple nar-pack-and-bzip2 processes

> be run on-demand?

I think this used to be the case with the previous nginx configuration,

but the recent changes pushed by Ludo' were aimed in part at preventing

that.

> Recall that the nginx proxy will pass all of those requests through,

Are you sure? I was under the impression¹ that this is exactly what

‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please

— anyone! — correct me if I'm misguided.

Kind regards,

T G-R

¹:

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Attachment: signature.asc

Tobias Geerinckx-Rice wrote on 23 Mar 2017 20:25

Recipients:(address . ludo@gnu.org)(address . 26201@debbugs.gnu.org)

Message-ID:a1f7cae6-0d37-6d6b-8ed9-8fd124fc037c@tobias.gr

Ludo',

On 22/03/17 23:06, Ludovic Courtès wrote:

> Tobias Geerinckx-Rice <me@tobias.gr> skribis:

>> proxy_cache_lock on;

>> proxy_cache_lock_timeout 3h; #yolo

>> proxy_cache_use_stale error timeout

>> http_500 http_502 http_503 http_504;

> I didn’t fully understand the docs for the last 3 directives here. For

> instance, what happens when 10 clients do GET /nar/xyz-texlive? Do the

> 9 unlucky clients wait for 3 hours and then get 404?

From ‘proxy_cache_lock’ [1]:

“When enabled, only one request at a time will be allowed to populate

a new cache element identified according to the proxy_cache_key

directive by passing a request to a proxied server. Other requests

of the same cache element will either wait for a response to appear

in the cache or the cache lock for this element to be released, up

to the time set by the proxy_cache_lock_timeout directive.”

Hmm. Good point: ‘to appear in the cache’, when we don't cache 404s or

even 410s.

I don't actually know.

Kind regards,

T G-R

[1]:

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Attachment: signature.asc

Maxim Cournoyer wrote on 24 Mar 2017 03:15

Re: bug#26201: No notification of cache misses when downloading substitutes

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:87efxnzagb.fsf@gmail.com

Hi!

Tobias Geerinckx-Rice <me@tobias.gr> writes:

> On 21/03/17 16:32, dian_cecht@zoho.com wrote:
>> Unless mirror.hydra randomly loses data in it's cache from hydra, it
>> won't be random in the least.
>
> It will. Whether one is first to download from the cache after the
> substitute is built is essentially random.
>
>> Quite frankly I'd like someone else to take a look at this bug,
>
> Glad you agree.
>
>> if for no other reason than I'm not sure if we're communicating clearly
>> with each other here. Most of what you are saying makes no sense
>> whatsoever and seems to miss the point I have attempted to make.
>
> I assure you it does not.
>
> Kind regards,
>
> T G-R

Please allow me to jump in and voice my opinion here. To me it doesn't

make sense to concern the Guix client with implementation details of how

the caching of substitutes happens and its impacts.

This situation is bound to change in the future or become irrelevant

(say, if a new build farm would be able to sustain higher transfer

speeds to the cache mirror), or if the caching implementation changes.

If the current cache building implementation is slow to the point of

being a problem it should be fixed (or documented).

Cheers,

Maxim

Mark H Weaver wrote on 24 Mar 2017 09:12

Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:87y3vvozy5.fsf@netris.org

Hi,

Tobias Geerinckx-Rice <me@tobias.gr> writes:

> On 23/03/17 19:36, Mark H Weaver wrote:
>> One question: what will happen in the case of multiple concurrent
>> requests for the same nar?  Will multiple nar-pack-and-bzip2 processes
>> be run on-demand?
>
> I think this used to be the case with the previous nginx configuration,
> but the recent changes pushed by Ludo' were aimed in part at preventing
> that.
>
>> Recall that the nginx proxy will pass all of those requests through,
>
> Are you sure? I was under the impression¹ that this is exactly what
> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
> — anyone! — correct me if I'm misguided.

I agree that "proxy_cache_lock on" should prevent multiple concurrent

requests for the same URL, but unfortunately its behavior is quite

undesirable, and arguably worse than leaving it off in our case. See:

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Specifically:

  Other requests of the same cache element will either wait for a
  response to appear in the cache or the cache lock for this element to
  be released, up to the time set by the proxy_cache_lock_timeout
  directive.

In our problem case, it takes more than an hour for Hydra to finish

sending a response for the 'texlive-texmf' nar. During that time, the

nar will be slowly sent to the first client while it's being packed and

bzipped on-demand.

IIUC, with "proxy_cache_lock on", we have two choices of how other

client requests will be treated:

(1) If we increase "proxy_cache_lock_timeout" to a huge value, then
    there will be *no* data sent to the other clients until the first
    client has received the entire nar, which means they wait over an
    hour before receiving the first byte.  I guess this will result in
    timeouts on the client side.

(2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
    will get failure responses until the first client has received the
    entire nar.

Either way, this would cause users to see the same download failures

(requiring user work-arounds like --fallback) that this fix is intended

to prevent for 'texlive-texmf', but instead of happening only for that

one nar, it will now happen for *all* large nars.

Or at least that's what I'd expect based on my reading of the nginx docs

linked above. I haven't tried it.

IMO, the best solution is to *never* generate nars on Hydra in response

to client requests, but rather to have the build slaves pack and

compress the nars, copy them to Hydra, and then serve them as static

files using nginx.

A far inferior solution, but possibly acceptable and closer to the

current approach, would be to arrange for all concurrent responses for

the same nar to be sent incrementally from a single nar-packing process.

More concretely, while packing and sending a nar response to the first

client, the data would also be written to a file. Subsequent requests

for the same nar would be serviced using the equivalent of:

tail --bytes=+0 --follow FILENAME

This way, no one would have to wait an hour to receive the first byte.
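
A minimal sketch of that "write once, follow many" idea, assuming a single packer thread and a shared temporary file. The function names `produce`, `follow`, and `demo` are hypothetical, and a real server would stream to sockets rather than collect bytes in a list.

```python
import os
import tempfile
import threading
import time


def produce(path, chunks, done):
    """Single packer: append chunks to the file, then signal completion."""
    with open(path, "ab") as out:
        for chunk in chunks:
            out.write(chunk)
            out.flush()
    done.set()


def follow(path, done, poll=0.01):
    """Yield bytes from the start of the file, following it until the packer is done."""
    with open(path, "rb") as src:
        while True:
            data = src.read(65536)
            if data:
                yield data
            elif done.is_set():
                # Packer finished: drain anything written since the last read.
                rest = src.read()
                if rest:
                    yield rest
                return
            else:
                time.sleep(poll)


def demo():
    chunks = [b"nar-", b"chunk-", b"data"]
    done = threading.Event()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    results = []

    def reader():
        results.append(b"".join(follow(path, done)))

    readers = [threading.Thread(target=reader) for _ in range(3)]
    packer = threading.Thread(target=produce, args=(path, chunks, done))
    for thread in readers:
        thread.start()
    packer.start()
    for thread in readers + [packer]:
        thread.join()
    os.remove(path)
    return results
```

Each follower starts receiving data as soon as the packer has written any, and all of them end up with the same byte stream even though the packing work is done only once.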

What do you think?

Mark

Ludovic Courtès wrote on 24 Mar 2017 10:25

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87d1d710xc.fsf@gnu.org

Hi!

Mark H Weaver <mhw@netris.org> skribis:

> Tobias Geerinckx-Rice <me@tobias.gr> writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
>> — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case.  See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element to
>   be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar.  During that time, the
> nar will be slowly sent to the first client while it's being packed and
> bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte.  I guess this will result in
>     timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
>     will get failure responses until the first client has received the
>     entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is intended
> to prevent for 'texlive-texmf', but instead of happening only for that
> one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning

concurrent compression threads of the same item at the same time, while

also avoiding starvation (proxy_cache_lock_timeout should ensure that

nobody ends up waiting until the nar-compression process is done.)

IOW, it should help reduce load in most cases, while introducing small

delays in some cases (if you’re downloading a nar that’s already being

downloaded.)

> IMO, the best solution is to *never* generate nars on Hydra in response

> to client requests, but rather to have the build slaves pack and

> compress the nars, copy them to Hydra, and then serve them as static

> files using nginx.

The problem is that we want nars to be signed by the master node. Or,

if we don’t require that, we need a PKI that allows us to express the

fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file.  Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes. I would think that NGINX does something like that for its caching,

but I don’t know exactly when/how.

Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the HTTP
           response (you may get 404 even if the thing is actually in
           store), but maybe that’s not a problem

Thoughts?

Ludo’.

Tobias Geerinckx-Rice wrote on 26 Mar 2017 19:35

Recipients:(address . mhw@netris.org)

Message-ID:1988d01c-1e67-bf47-2b43-cf3551d0651b@tobias.gr

Mark,

On 24/03/17 09:12, Mark H Weaver wrote:

> IIUC, with "proxy_cache_lock on", we have two choices of how other

> client requests will be treated:

> [badly, ed.]

Eh. You're probably (and disappointingly) right.

When configuring my little cache, I had a clear idea of how such a cache

should work (basically, your last scenario below), then looked at the

nginx documentation to find what I had in mind. ‘proxy_cache_lock’ matched.

I should have been more pessimistic and done more testing.

Shame on me, &c. Too many other things on my mind. :-/

> Or at least that's what I'd expect based on my reading of the nginx docs

> linked above. I haven't tried it.

I can try to do some simple tests tomorrow.

> IMO, the best solution is to *never* generate nars on Hydra in response

> to client requests, but rather to have the build slaves pack and

> compress the nars, copy them to Hydra, and then serve them as static

> files using nginx.

A true mirror at last! Do we have the disc space for that?

And could Hydra actually handle compressing *everything*, without an

infinitely growing back-log? I don't have access to any statistics, but

I'm guessing that a fair number of package+versions are never actually

requested, and hence never compressed. This would change that.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file.  Subsequent requests
> for the same nar would be serviced using the equivalent of:
> 
>   tail --bytes=+0 --follow FILENAME
> 
> This way, no one would have to wait an hour to receive the first byte.

^ This is so obviously the right solution, that it would be

disappointing if nginx really couldn't be made to do it. It already

buffers proxy responses to a temporary file anyway...

Kind regards,

T G-R

Attachment: signature.asc

Ludovic Courtès wrote on 27 Mar 2017 13:20

Bandwidth when retrieving substitutes

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:8760ivm0dx.fsf_-_@gnu.org

Hi there!

ludo@gnu.org (Ludovic Courtès) skribis:

> ‘guix publish’ should be faster and less resource-hungry than Hydra. It

> uses in-process gzip for nar compression instead of bzip2 (I chose level

> 7, which seems to provide compression ratios close to what bzip2

> provides with its default compression level, while being 3 times

> faster). Unlike Hydra it never forks so for instance, 404 responses for

> .narinfo URLs should be quicker. Hopefully, that will improve the

> worst-case (cache miss) throughput.

Another interesting data point on the client side this time:

$ wget -O- https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |bunzip2 >/dev/null
--2017-03-27 13:12:50-- https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 53.01M 9.29MB/s in 5.5s

2017-03-27 13:12:55 (9.57 MB/s) - written to stdout [55582050]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |gunzip >/dev/null
--2017-03-27 13:13:00-- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 59.19M 40.8MB/s in 1.4s

2017-03-27 13:13:02 (40.8 MB/s) - written to stdout [62068901]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 >/dev/null
--2017-03-27 13:15:58-- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 59.19M 42.5MB/s in 1.4s

2017-03-27 13:16:00 (42.5 MB/s) - written to stdout [62068901]

40 MB/s vs. 10 MB/s! (Both items were cached on mirror.hydra.gnu.org.)

IOW, bunzip2 was the bottleneck when retrieving substitutes (and that’s

on an i7.) With ‘perf timechart’ we see that bunzip2 is indeed busy

all the time right from the start.
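
For anyone who wants to reproduce the decompression comparison without a network in the way, here is a rough sketch using Python's standard `bz2` and `gzip` modules. It is an assumption-laden stand-in for the wget pipelines above: the helper names are made up, the payload is synthetic rather than a real nar, and absolute numbers will differ from the bunzip2 and gunzip binaries.

```python
import bz2
import gzip
import time


def decompress_rate(decompress, blob, size):
    """Return decompression throughput in MB/s of uncompressed output."""
    start = time.perf_counter()
    out = decompress(blob)
    elapsed = time.perf_counter() - start
    assert len(out) == size  # sanity check: the round trip must be lossless
    return size / max(elapsed, 1e-9) / 1e6


def compare(payload):
    """Compress with bzip2 (default level) and gzip level 7, then time decompression."""
    bz = bz2.compress(payload)                    # default compresslevel is 9
    gz = gzip.compress(payload, compresslevel=7)  # level 7, as chosen for 'guix publish'
    return {
        "bzip2": (len(bz), decompress_rate(bz2.decompress, bz, len(payload))),
        "gzip-7": (len(gz), decompress_rate(gzip.decompress, gz, len(payload))),
    }


# Synthetic payload; substitute the bytes of a real nar for meaningful numbers.
payload = b"".join(str(i).encode() for i in range(200_000))
report = compare(payload)
for name, (size, rate) in report.items():
    print(f"{name}: {size} bytes compressed, {rate:.1f} MB/s decompression")
```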

Ludo’.

Tobias Geerinckx-Rice wrote on 27 Mar 2017 20:47

Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(address . 26201@debbugs.gnu.org)(address . ludo@gnu.org)

Message-ID:bad0ed66-6c44-7147-fc3d-01622cf6c62f@tobias.gr

Guix,

On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:

> I can try to do some simple tests tomorrow.

Two observations:

- ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
  ‘proxy_cache_lock_age’ must also be set to an equally ridiculously
  long span. Otherwise, multiple requests will still be sent to ‘guix
  publish’ if they are more than 5s apart. Bleh.

  (The problem then becomes that clients will stall while the file is
   being cached, as explained by Mark. curl patiently waited.)

- Say client A requests a nar from ‘guix publish’ (no nginx involved).
  If another client requests the same nar while A's still downloading,
  ‘guix publish’ will... silently drop A's connection?

  I was not expecting this.

Kind regards,

T G-R

Attachment: signature.asc

Ludovic Courtès wrote on 28 Mar 2017 16:47

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)

Message-ID:87wpb931cd.fsf@gnu.org

Hey!

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:
>> I can try to do some simple tests tomorrow.
>
> Two observations:
>
> - ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
>   ‘proxy_cache_lock_age’ must also be set to an equally ridiculously
>   long span. Otherwise, multiple requests will still be sent to ‘guix
>   publish’ if they are more than 5s apart. Bleh.
>
>   (The problem then becomes that clients will stall while the file is
>    being cached, as explained by Mark. curl patiently waited.)

Setting ‘proxy_cache_lock_timeout’ to 5s is reasonable I think: if
you’re unlucky, you wait for 5 seconds, and then we get ‘guix publish’
threads serving the same request in parallel; in the most common case,
there’s only ever one instance of a given request being served at a
given time.
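
The behaviour described here can be modelled with a few lines of arithmetic. This is a simplified sketch based on the nginx documentation's statement that, once ‘proxy_cache_lock_timeout’ expires, the request is passed to the proxied server without being cached; it ignores ‘proxy_cache_lock_age’, and the function name and the times used are purely illustrative.

```python
def waiter_outcome(arrival, fill_done, lock_timeout):
    """Model a request that arrives while another request is filling the cache.

    Times are in seconds, with the cache-filling request starting at t=0 and
    finishing at t=fill_done.  Returns (time the response starts, its source).
    """
    if arrival >= fill_done:
        # The cache entry already exists.
        return arrival, "cache"
    if fill_done - arrival <= lock_timeout:
        # The waiter holds on until the entry appears in the cache.
        return fill_done, "cache"
    # The lock timeout expires first: the request goes to the backend.
    return arrival + lock_timeout, "upstream"


# A second client arrives 10 s into an hour-long fill.
short_timeout = waiter_outcome(10, 3600, 5)
huge_timeout = waiter_outcome(10, 3600, 3 * 3600)
```

With a 5-second timeout the second client is handed to the backend at t=15, so two backend threads serve the same item in parallel; with a 3-hour timeout it receives nothing until t=3600, which is the stall Mark described.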

> - Say client A requests a nar from ‘guix publish’ (no nginx involved).

> If another client requests the same nar while A's still downloading,

> ‘guix publish’ will... silently drop A's connection?

> I was not expecting this.

That would be a bug. Do you have an easy way to reproduce?

Thanks,

Ludo’.

Ludovic Courtès wrote on 8 Apr 2017 23:17

control message for bug #26201

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87wpauoayn.fsf@gnu.org

retitle 26201 Downloading substitutes is too slow upon nginx cache misses

Ludovic Courtès wrote on 8 Apr 2017 23:18

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87vaqeoayc.fsf@gnu.org

severity 26201 important

Ludovic Courtès wrote on 17 Apr 2017 23:36

Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87inm2ogxl.fsf@gnu.org

Hello,

ludo@gnu.org (Ludovic Courtès) skribis:

> Other solutions I’ve thought about:
>
>   1. Produce narinfos and nars periodically rather than on-demand and
>      serve them as static files.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: introduces arbitrary delays in delivering nars
>      cons: difficult/expensive to know what new store items are available
>
>   2. Produce a narinfo and corresponding nar the first time they are
>      requested.  So, the first time we receive “GET foo.narinfo”, return
>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>      200 only when both are ready.
>
>      The precomputed nar{,info}s would be kept in a cache and we could
>      make sure a narinfo and its nar have the same lifetime, which
>      addresses one of the problems we have.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      pros: helps keep narinfo/nar lifetime in sync
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: exposes inconsistency between the store contents and the HTTP
>            response (you may get 404 even if the thing is actually in
>            store), but maybe that’s not a problem

The ‘wip-publish-baking’ branch implements #2 as a new option to ‘guix

publish’. It gives some control on the upper bound on CPU usage since

we can specify how many worker threads are used.
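
As an illustration of scheme #2 with a bounded worker pool, here is a minimal sketch. It is not the ‘guix publish’ implementation (which is written in Scheme); `BakingCache`, `get`, `wait`, and the `bake` callback are hypothetical names, and the in-memory dictionary stands in for an on-disk cache.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BakingCache:
    """Return 404 on the first request for an item and bake it in the background."""

    def __init__(self, bake, workers=2):
        self._bake = bake                                    # item -> bytes, possibly slow
        self._pool = ThreadPoolExecutor(max_workers=workers)  # upper bound on CPU usage
        self._lock = threading.Lock()
        self._pending = {}                                   # item -> Future
        self._ready = {}                                     # item -> baked bytes

    def get(self, item):
        with self._lock:
            if item in self._ready:
                return 200, self._ready[item]
            if item not in self._pending:
                # First request: schedule exactly one baking job for this item.
                self._pending[item] = self._pool.submit(self._run, item)
            return 404, b""

    def _run(self, item):
        data = self._bake(item)
        with self._lock:
            self._ready[item] = data
            del self._pending[item]

    def wait(self, item):
        """Block until a pending bake for `item` has finished (useful in tests)."""
        with self._lock:
            future = self._pending.get(item)
        if future is not None:
            future.result()


baked = []


def fake_bake(item):
    """Stand-in for packing and compressing a store item."""
    baked.append(item)
    return item.encode()


cache = BakingCache(fake_bake, workers=2)
first = cache.get("foo.narinfo")   # triggers baking, answers 404
cache.wait("foo.narinfo")
second = cache.get("foo.narinfo")  # now served from the cache
```

The `workers` argument plays the role of the worker-thread count mentioned above: however many distinct items are requested, at most that many baking jobs run at once.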

I’ll finish it soon so we can experiment with it.

Thanks,

Ludo’.

Ludovic Courtès wrote on 18 Apr 2017 23:27

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87o9vts8xb.fsf@gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

>   2. Produce a narinfo and corresponding nar the first time they are
>      requested.  So, the first time we receive “GET foo.narinfo”, return
>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>      200 only when both are ready.
>
>      The precomputed nar{,info}s would be kept in a cache and we could
>      make sure a narinfo and its nar have the same lifetime, which
>      addresses one of the problems we have.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      pros: helps keep narinfo/nar lifetime in sync
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: exposes inconsistency between the store contents and the HTTP
>            response (you may get 404 even if the thing is actually in
>            store), but maybe that’s not a problem

Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.

I’ll set up a test instance shortly.

Ludo’.

Ludovic Courtès wrote on 19 Apr 2017 16:24

Heads-up: hydra.gnu.org uses ‘guix publish --cache’

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87vaq0o4pd.fsf_-_@gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

> ludo@gnu.org (Ludovic Courtès) skribis:
>
>>   2. Produce a narinfo and corresponding nar the first time they are
>>      requested.  So, the first time we receive “GET foo.narinfo”, return
>>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>>      200 only when both are ready.
>>
>>      The precomputed nar{,info}s would be kept in a cache and we could
>>      make sure a narinfo and its nar have the same lifetime, which
>>      addresses one of the problems we have.
>>
>>      pros: better HTTP latency and bandwidth
>>      pros: allows us to add a Content-Length for nars
>>      pros: helps keep narinfo/nar lifetime in sync
>>      cons: doesn’t reduce load on hydra.gnu.org
>>      cons: exposes inconsistency between the store contents and the HTTP
>>            response (you may get 404 even if the thing is actually in
>>            store), but maybe that’s not a problem
>
> Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.
>
> I’ll set up a test instance shortly.

I ended up deploying it on hydra.gnu.org directly. :-)

Progressively the cached nar/narinfo at {,mirror.}hydra.gnu.org will be

replaced with the new ones. Now that the /guix/nar URLs have a

‘Content-Length’ header, you should see a progress bar when downloading

one of these:

$ ./pre-inst-env guix build vim
The following file will be downloaded:
/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
@ substituter-started /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 /gnu/store/rnpz1svz4aw75kibb5qb02hhccy2m4y0-guix-0.12.0-7.aabe/libexec/guix/substitute
Downloading https://mirror.hydra.gnu.org/guix/nar/gzip/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 (23.4MiB installed)...
vim-8.0.0566 7.8MiB 385KiB/s 00:21 [####################] 100.0%
@ substituter-succeeded /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566

This new caching scheme should put an end to caching of truncated nars

in nginx, which has been too frequent lately.

It should also mostly avoid the problem where we have a narinfo for

something but not the corresponding nar, which leads to user frustration

(‘guix’ reports that the thing will be downloaded and eventually fails

with 410 “Gone” while trying to download it), because ‘guix publish’

caches narinfo/nar pairs together. I say “mostly” because nginx caching

in front of ‘guix publish’ makes things more complicated.

The bandwidth issue reported at the beginning of this thread should be

mostly fixed: serving a narinfo or nar URL is now just sendfile(2),

which is the best we can do; 404s on narinfo should be immediate.
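
To show what "just sendfile(2)" buys, here is a small sketch of serving a pre-baked file with a known ‘Content-Length’. It assumes Python's `socket.sendfile`, which uses the sendfile system call where available and falls back to plain sends otherwise; `serve_cached` and `demo` are hypothetical names, and the socket pair stands in for a real HTTP connection.

```python
import os
import socket
import tempfile


def serve_cached(conn, path):
    """Send a cached file over `conn` with a Content-Length header."""
    size = os.path.getsize(path)  # known up front because the file is already baked
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/x-nix-archive\r\n"
        f"Content-Length: {size}\r\n"
        "\r\n"
    ).encode("ascii")
    conn.sendall(head)
    with open(path, "rb") as nar:
        conn.sendfile(nar)  # the kernel copies file data to the socket where supported
    return size


def demo():
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * 1000)  # stand-in for a compressed nar
    os.close(fd)
    server, client = socket.socketpair()
    try:
        serve_cached(server, path)
        server.close()
        received = b""
        while chunk := client.recv(4096):
            received += chunk
    finally:
        client.close()
        os.remove(path)
    return received
```

Because the file exists in full before the response starts, the server does no compression work per request and the client can display a progress bar from the advertised length.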

Of course, when the machine is overloaded, we’ll still experience

increased latency and lower bandwidth, but that should be less acute

than with the previous setting.

Please report any problems you may have!

Ludo’.

Ludovic Courtès wrote on 25 Apr 2017 12:11

control message for bug #26201

Recipients:(address . control@debbugs.gnu.org)

Message-ID:87h91cdcfv.fsf@gnu.org

tags 26201 fixed

close 26201

Mark H Weaver wrote on 3 May 2017 10:11

Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos

Recipients:(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)

Message-ID:877f1yjr64.fsf@netris.org

Reviving an old thread...

Tobias Geerinckx-Rice <me@tobias.gr> writes:

>> IMO, the best solution is to *never* generate nars on Hydra in response
>> to client requests, but rather to have the build slaves pack and
>> compress the nars, copy them to Hydra, and then serve them as static
>> files using nginx.
>
> A true mirror at last! Do we have the disc space for that?
>
> And could Hydra actually handle compressing *everything*, without an
> infinitely growing back-log? I don't have access to any statistics, but
> I'm guessing that a fair number of package+versions are never actually
> requested, and hence never compressed. This would change that.

Actually, IIUC, the build slaves are _already_ compressing everything,
and they always have.  They compress the build outputs for transmission
back to the master machine.  In the current framework, the master
machine immediately decompresses them upon receipt, and this compression
and decompression is considered an internal detail of the network
transport.

Currently, the master machine stores all build outputs uncompressed in
/gnu/store, and then later recompresses them for transmission to users
and other build slaves.  The needless decompression and recompression is
a tremendous amount of wasted work on our master machine.  That it's all
stored uncompressed is also a significant waste of disk space, which
leads to significant additional costs during garbage collection.

Essentially, my proposal is for the build slaves to be modified to
prepare the compressed NARs in a form suitable for delivery to end users
(and other build slaves) with minimal processing by our master node.
The master node would be significantly modified to receive, store, and
forward NARs explicitly, without ever decompressing them.  As far as I
can tell, this would mean strictly less work to do and less data to
store for every machine and in every case.

Ludovic has pointed out that we cannot do this because Hydra must add
its digital signature, and that this digital signature is stored within
the compressed NAR.  Therefore, we cannot avoid having the master
machine decompress and recompress every NAR that is delivered to users.

In my opinion, we should change the way we sign NARs.  Signatures should
be external to the NARs, not internal.  Not only would this allow us to
decentralize production of our NARs, but more importantly, it would
enable a community of independent builders to add their signatures to a
common pool of NARs.  Having a common pool of NARs enables us to store
these NARs in a shared distribution network without duplication.  We
cannot even have a common pool of NARs if they contain
build-farm-specific data such as signatures.

Thoughts?

      Mark

Ludovic Courtès wrote on 3 May 2017 11:25

Recipients:(name . Mark H Weaver)(address . mhw@netris.org)

Message-ID:87k25ywaul.fsf@gnu.org

Hello,

Mark H Weaver <mhw@netris.org> skribis:

> Actually, IIUC, the build slaves are _already_ compressing everything,
> and they always have.  They compress the build outputs for transmission
> back to the master machine.  In the current framework, the master
> machine immediately decompresses them upon receipt, and this compression
> and decompression is considered an internal detail of the network
> transport.
>
> Currently, the master machine stores all build outputs uncompressed in
> /gnu/store, and then later recompresses them for transmission to users
> and other build slaves.  The needless decompression and recompression is
> a tremendous amount of wasted work on our master machine.  That it's all
> stored uncompressed is also a significant waste of disk space, which
> leads to significant additional costs during garbage collection.
>
> Essentially, my proposal is for the build slaves to be modified to
> prepare the compressed NARs in a form suitable for delivery to end users
> (and other build slaves) with minimal processing by our master node.
> The master node would be significantly modified to receive, store, and
> forward NARs explicitly, without ever decompressing them.  As far as I
> can tell, this would mean strictly less work to do and less data to
> store for every machine and in every case.

I agree that the redundant compression/decompression is terrible. Yet

I’m not sure how to architect a solution where compression is performed

by build machines. The main issue is that offloading and publication

are two independent mechanisms, as things are.

Maybe for a build-farm use case we could have a
“semi-offloading” mechanism whereby the master spawns a remote build
without retrieving its result, something akin to:

  GUIX_DAEMON_SOCKET=ssh://build-machine.example.org \
    guix build /gnu/store/…-foo.drv

In addition, the build machine would publish its result via ‘guix

publish’, which the master could then simply mirror and cache with

nginx.

There’s the issue of signatures, but perhaps we could have a more

sophisticated PKI and have the master delegate to build machines…

Then there are other issues such as that of synchronizing the TTL of a

narinfo and its corresponding nar, which --cache addresses.

Tricky!

> Ludovic has pointed out that we cannot do this because Hydra must add
> its digital signature, and that this digital signature is stored within
> the compressed NAR.  Therefore, we cannot avoid having the master
> machine decompress and recompress every NAR that is delivered to users.
>
> In my opinion, we should change the way we sign NARs.  Signatures should
> be external to the NARs, not internal.  Not only would this allow us to
> decentralize production of our NARs, but more importantly, it would
> enable a community of independent builders to add their signatures to a
> common pool of NARs.  Having a common pool of NARs enables us to store
> these NARs in a shared distribution network without duplication.  We
> cannot even have a common pool of NARs if they contain
> build-farm-specific data such as signatures.

Currently the signature is in the narinfos, not in nars proper¹. So we

can already add signatures on an externally provided nar, for instance.

There’s a silly limitation currently, which is that the signature is

computed over all the fields of the narinfo. That’s silly because it

means that if you change, say, the compression format or the URL of the

nar, then the signature becomes invalid. We should fix that at some

point.
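
The limitation can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: an HMAC stands in for the real public-key signature, the narinfo is a plain dictionary with placeholder values, and the subset in `SIGNED_FIELDS` is only one hypothetical choice of which fields a fixed scheme might cover.

```python
import hashlib
import hmac

KEY = b"demo-key"  # stand-in secret; real narinfos carry a public-key signature

# Hypothetical subset that describes the content rather than how it is delivered.
SIGNED_FIELDS = ("StorePath", "NarHash", "NarSize", "References")


def sign(narinfo, fields=None):
    """Sign all fields (current behaviour in this model) or only `fields`."""
    names = sorted(narinfo) if fields is None else fields
    body = "".join(f"{name}: {narinfo[name]}\n" for name in names)
    return hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()


narinfo = {
    "StorePath": "/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566",
    "NarHash": "sha256:placeholder",  # placeholder, not a real hash
    "NarSize": "0",                   # placeholder
    "References": "",                 # placeholder
    "URL": "nar/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566",
    "Compression": "bzip2",
}
# Same content, served from a different URL with a different compression format.
moved = dict(
    narinfo,
    URL="guix/nar/gzip/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566",
    Compression="gzip",
)

all_fields_differ = sign(narinfo) != sign(moved)
subset_matches = sign(narinfo, SIGNED_FIELDS) == sign(moved, SIGNED_FIELDS)
```

Signing every field ties the signature to the URL and compression format, so re-serving the same nar elsewhere invalidates it; signing only content-describing fields would let the signature survive such changes.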

Ludo’.

¹ For ‘guix publish’. ‘guix archive --export’ appends a signature to

the nar set.
