From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 07:10:19 2021 Received: (at 51787) by debbugs.gnu.org; 26 Nov 2021 12:10:19 +0000 Received: from localhost ([127.0.0.1]:58858 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mqa3N-0007iN-SM for submit@debbugs.gnu.org; Fri, 26 Nov 2021 07:10:19 -0500 Received: from mira.cbaines.net ([212.71.252.8]:47050) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mqa3L-0007iB-Vb for 51787@debbugs.gnu.org; Fri, 26 Nov 2021 07:10:08 -0500 Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa]) by mira.cbaines.net (Postfix) with ESMTPSA id 3386927BBE9; Fri, 26 Nov 2021 12:10:07 +0000 (GMT) Received: from capella (localhost [127.0.0.1]) by localhost (OpenSMTPD) with ESMTP id 03517c3f; Fri, 26 Nov 2021 12:10:06 +0000 (UTC) References: <87o86pegr3.fsf@gnu.org> <87zgpuhig7.fsf@gnu.org> User-agent: mu4e 1.6.6; emacs 27.2 From: Christopher Baines To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#51787: GC takes more than 9 hours on berlin Date: Thu, 25 Nov 2021 13:24:17 +0000 In-reply-to: <87zgpuhig7.fsf@gnu.org> Message-ID: <87wnkv15k3.fsf@cbaines.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 51787 Cc: Mathieu Othacehe , 51787@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.2 (/) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Ludovic Court=C3=A8s writes: > =E2=80=A2 When discussing together with Chris Baines yesterday on IRC, = we > found that the sqlite WAL file was 8=C2=A0GiB. I later ran: > > PRAGMA wal_checkpoint(TRUNCATE); > > which emptied it immediately. However, GC time wasn=E2=80=99t partic= ularly > different today. So, as I understand it, the WAL is made up of pages, and checking for this db, I think they're the current default size of 4096 bytes. sqlite> PRAGMA page_size; 4096 From=20looking at the code, the wal_autocheckpoint value is set to 40000: /* Increase the auto-checkpoint interval to 40000 pages. This seems enough to ensure that instantiating the NixOS system derivation is done in a single fsync(). */ if (mode =3D=3D "wal" && sqlite3_exec(db, "pragma wal_autocheckpoint = =3D 40000;", 0, 0, 0) !=3D SQLITE_OK) throwSQLiteError(db, "setting autocheckpoint interval"); https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/local-store.cc= #n253 This means you'd expect the WAL to be in the region of 40000*4096 bytes, or ~160MB. Assuming the autocheckpointing is keeping up... it doesn't look to be, since the file is now much larger than this. As described here [1], the automatic checkpoints are PASSIVE ones, which has the advantage of not interrupting any readers or writers, but can also do nothing if it's being blocked by readers or writers. 1: https://www.sqlite.org/wal.html#application_initiated_checkpoints What I've found while writing the Guix Build Coordinator, is that when the service is busy (usually new builds being submitted, plus lots of builds happening), the PASSIVE checkpoints aren't sufficient. To supplement them, there's a regular check that looks at the size of the WAL file, and runs a TRUNCATE checkpoint, which is a FULL checkpoint (which blocks new writers), plus truncating the WAL file once it's finished checkpointing the entire WAL. The truncating is mostly so that it's easier to monitor the size of the WAL, by checking the size of the file. I feel like I need to defend SQLite at this point. Tuning the configuration of a database to get acceptable performance is the norm, I had to tune the PostgreSQL configuration for data.guix.gnu.org to improve the performance. It's easier to get in to trouble with SQLite because it's a lower level too, and requires you to actually initiate things like checkpoints or periodic optimisation if you want them to happen. Unfortunately, I don't know enough about the internals of the daemon to say anything specific though. > =E2=80=A2 =E2=80=98db.sqlite=E2=80=99 weighs in at 19=C2=A0GiB (!) so p= erhaps there=E2=80=99s something to > do, like the =E2=80=9CVACUUM=E2=80=9D thing maybe. Chris? Doing a VACCUM might address some fragmentation and improve performance, it's probably worth trying. > =E2=80=A2 Stracing the session=E2=80=99s guix-daemon process during GC = suggests that > most of the time goes into I/O from =E2=80=98db.sqlite=E2=80=99. It= =E2=80=99s not > surprising because that GC phase is basically about browsing the > database, but it does seem to take a little too long for each store > item. At least the way I've approached finding and fixing the poor performance issues in the Guix Build Coordinator is through adding instrumentation, so just recording the time that calling particular procedures takes, and then logging if it's longer than some threshold. Since this issue is about Cuirass, there's also the possibility of avoiding the problems of a large store, by avoiding having a large store. That's what bordeaux.guix.gnu.org does, and I thought it was part of the plan for ci.guix.gnu.org (at least around a year ago)? --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmGgzpxfFIAAAAAALgAo aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh aW5lcy5uZXQACgkQXiijOwuE9XesMg//X7T5tjRMpZwOF/VJ94u+m8rUYwZ10qkc sttZnXJnXOLrw9kK1VgB2Dp5foYisDhRoXJ9LEb57tQbjwTWMcR3whD1Cs64/Urw QZa2dYRzUDuJyVTenTNU+pOedakpKhEDG2dQbAIXxCIoT50cwTxg7JZrpgT7mQa4 Cx3vxBlpucM3SeQvQlBcg36tExLqsN633kMhye8GLdpeGj23vE3tc8doi87sFa/z LuFyWXczLsor9xNuAPh9CHX+k1hRNS+7DXxV9Ie//ZQkRr3NCUIiPPswq6DyiDn5 dSWU2MgcgrksZZdNodUMLn+ohONnExpxgowVWC9jtTqFRtaym1Mh3CXF8gdoNPsq j7w3x5zpGzSnUk7hyq1uXfUWikX+C3ps4gMcZRbk5a1kHBczDIn3VOY6ooISJjhr eg6/SAujAdT9D/6Hb649EuMOUO+BgYXK1Jlmprh1IIxAifaQRp/6qaBigjnDgyJ4 EfYGgAzLhnLdQZzD0lAJP4pwnZJoRby7Ww+S4aCo2ohpe9P39lp9h3pD4dqXFROf fB8/ev9wkBxx88R7sAyzDynjC7l20Fr/3tvnkC8Uqopq06uwvmQpPzmE75z8Mf4P yRj4U5ZhJc7K6AgIqjxAzYbRHJiS8zOUCkfmj5fNLZeYh1CNnlYn2yiNyV4x6jw8 C7msGLYcll8= =rYUT -----END PGP SIGNATURE----- --=-=-=--