From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Cc: Ricardo Wurmus, 24937@debbugs.gnu.org, Roel Janssen
Subject: Re: bug#24937: "deleting unused links" GC phase is too slow
Date: Tue, 13 Dec 2016 18:02:18 +0100
Message-ID: <87fulrsqxx.fsf@gnu.org>
In-Reply-To: <87wpf4yoz0.fsf@netris.org> (Mark H. Weaver's message of "Tue, 13 Dec 2016 07:48:19 -0500")
References: <87wpg7ffbm.fsf@gnu.org> <87lgvm4lzu.fsf@gnu.org>
 <87twaaa6j9.fsf@netris.org> <87twaa2vjx.fsf@gnu.org>
 <87lgvm9sgq.fsf@netris.org> <87d1gwvgu0.fsf@gnu.org>
 <87wpf4yoz0.fsf@netris.org>

Hello Mark,

Mark H Weaver skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.

Right. That’s the only spinning-disk machine I could access without
problem. :-/

Ricardo, Roel: would you be able to run that links-traversal.c from <…>
on a machine with a big store, as described at <…>?
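In case the attachment is not at hand: all it does, essentially, is walk
/gnu/store/.links and stat every entry so that each inode gets touched.
A rough, self-contained sketch of that kind of traversal (not the exact
links-traversal.c I posted, just something close enough to give
comparable timings) would be:

/* Rough sketch: walk /gnu/store/.links and lstat each entry, timing the
   whole traversal.  Approximates the scan done by the "deleting unused
   links" phase; not the exact links-traversal.c from this thread.  */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>
#include <time.h>

int
main (int argc, char *argv[])
{
  const char *dir = argc > 1 ? argv[1] : "/gnu/store/.links";
  DIR *d = opendir (dir);
  if (d == NULL)
    {
      perror ("opendir");
      return EXIT_FAILURE;
    }

  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);

  size_t entries = 0, single_link = 0;
  struct dirent *entry;
  while ((entry = readdir (d)) != NULL)
    {
      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;

      char path[4096];
      snprintf (path, sizeof path, "%s/%s", dir, entry->d_name);

      struct stat st;
      if (lstat (path, &st) == 0)
        {
          entries++;
          if (st.st_nlink <= 1)     /* roughly: candidate for deletion */
            single_link++;
        }
    }

  clock_gettime (CLOCK_MONOTONIC, &end);
  closedir (d);

  double seconds = (end.tv_sec - start.tv_sec)
    + (end.tv_nsec - start.tv_nsec) / 1e9;
  printf ("%zu entries (%zu with st_nlink <= 1) in %.2f seconds\n",
          entries, single_link, seconds);
  return EXIT_SUCCESS;
}

Comparing a run with a cold page cache against a warm one should show
how much of the time is pure disk I/O.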
>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full. Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.

Good point. I’m trying my best to get performance figures; there’s no
doubt we could do better!

> In my opinion, the reason Hydra performs so poorly is because efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it. Unfortunately, I feel that my advice in this
> area is discarded more often than not.

Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that, far from
discarding your advice, I very much value it.

I’m a maintainer though, so I’m trying to understand the problem
better. It’s not just about finding the “optimal” solution, but also
about finding a tradeoff between the benefits and the maintainability
costs.

>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that. Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes? Can you explain why?

I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you? :-)

It remains that, if invoking ‘sort’ appears to be preferable *both*
from performance and maintenance viewpoints, then it’s a good choice.
That may be the case, but again, I prefer to have figures to back that.

>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use. A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,

OK, good to know. Thanks!

Ludo’.
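PS: To make the "no shell involved" point concrete, piping data through
the external ‘sort’ would look roughly like this: a pipe on each side,
fork, and a direct exec of the ‘sort’ binary, with /bin/sh nowhere in
the picture. This is only an illustration of the approach, not code
from the daemon, and the sample input lines are made up:

/* Sketch: pipe "INODE NAME" lines through the external 'sort' program,
   without involving /bin/sh at any point -- just pipe/fork/exec/wait.
   Illustration only; not code from guix-daemon.  */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int
main (void)
{
  int to_sort[2], from_sort[2];
  if (pipe (to_sort) < 0 || pipe (from_sort) < 0)
    {
      perror ("pipe");
      return EXIT_FAILURE;
    }

  pid_t pid = fork ();
  if (pid < 0)
    {
      perror ("fork");
      return EXIT_FAILURE;
    }

  if (pid == 0)
    {
      /* Child: plug the pipes into stdin/stdout and exec 'sort' directly.  */
      dup2 (to_sort[0], STDIN_FILENO);
      dup2 (from_sort[1], STDOUT_FILENO);
      close (to_sort[0]); close (to_sort[1]);
      close (from_sort[0]); close (from_sort[1]);
      execlp ("sort", "sort", "-n", (char *) NULL);
      perror ("execlp");
      _exit (127);
    }

  /* Parent: feed unsorted lines to the child...  */
  close (to_sort[0]);
  close (from_sort[1]);

  FILE *out = fdopen (to_sort[1], "w");
  fprintf (out, "42 bar\n7 foo\n1000 baz\n");   /* made-up sample input */
  fclose (out);                  /* EOF tells 'sort' to start emitting */

  /* ... and read them back in sorted order.  'sort' cannot produce any
     output before it has seen all of its input, so writing everything
     first and reading afterwards cannot deadlock.  */
  FILE *in = fdopen (from_sort[0], "r");
  char line[256];
  while (fgets (line, sizeof line, in) != NULL)
    fputs (line, stdout);
  fclose (in);

  int status;
  waitpid (pid, &status, 0);
  return WIFEXITED (status) ? WEXITSTATUS (status) : EXIT_FAILURE;
}

Sorting the (inode, link name) pairs numerically by inode is what would
give the disk-order traversal discussed earlier; for store-sized inputs,
‘sort’ spills to temporary files on its own.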