From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 23 06:12:49 2019 Received: (at 35350) by debbugs.gnu.org; 23 Apr 2019 10:12:49 +0000 Received: from localhost ([127.0.0.1]:52470 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hIsPx-0000L7-53 for submit@debbugs.gnu.org; Tue, 23 Apr 2019 06:12:49 -0400 Received: from eggs.gnu.org ([209.51.188.92]:56649) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hIsPu-0000Ks-Mf for 35350@debbugs.gnu.org; Tue, 23 Apr 2019 06:12:48 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:58403) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hIsPp-00006A-8v; Tue, 23 Apr 2019 06:12:41 -0400 Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=50668 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1hIsPm-0001Ed-6m; Tue, 23 Apr 2019 06:12:39 -0400 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mark H Weaver Subject: Re: bug#35350: Some compile output still leaks through with --verbosity=1 References: <87mukkfd2j.fsf@netris.org> <87r29v2jz2.fsf@gnu.org> <87ftq9silk.fsf@netris.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 4 =?utf-8?Q?Flor=C3=A9al?= an 227 de la =?utf-8?Q?R?= =?utf-8?Q?=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Tue, 23 Apr 2019 12:12:34 +0200 In-Reply-To: <87ftq9silk.fsf@netris.org> (Mark H. Weaver's message of "Mon, 22 Apr 2019 19:52:28 -0400") Message-ID: <87imv5jai5.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 35350 Cc: 35350@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi Mark, Mark H Weaver skribis: > Ludovic Court=C3=A8s writes: > >> Mark H Weaver skribis: >> >>> Sometimes when compiling a package with --verbosity=3D1, some parts of = the >>> compile output leak through. For example, see the transcript below. >> >> Weird. > > FWIW, a few observations, possibly relevant: > > (1) Each chunk of leaked output begins with 1 or 2 Unicode Replacement > characters (U+FFFD). In the transcript I provided, the first leak > began with 1 replacement char, and all later leaks began with 2. > > (2) The replacement characters are immediately followed by > "@ build-log 30033 4096\n", and that string is also sprinkled > throughout the leaked output, with approximately ~4060-4070 > characters of leaked output between each occurrence of > "@ build-log 30033 4096\n". Indeed. I managed to reproduce it while building modem-manager. I strace=E2=80=99d =E2=80=98guix build=E2=80=99 with an additional =E2=80=98p= k=E2=80=99=C2=B9 to see what happens, and here=E2=80=99s what leads to the wrong =E2=80=9Cwrite(2, "=EF=BF=BD@ build-= log=E2=80=A6")=E2=80=9D call: --8<---------------cut here---------------start------------->8--- read(13, "gmlo\0\0\0\0", 8) =3D 8 read(13, "\27\20\0\0\0\0\0\0", 8) =3D 8 read(13, "@ build-log 22090 4096\n [=E2=80=A6] warning: \342\200", 4119) = =3D 4119 read(13, "\0", 1) =3D 1 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) =3D 25 write(1, "\n", 1) =3D 1 write(1, ";;; (write 985 <> 22090 4096)\n", 30) =3D 30 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 3111)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 2103)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 1095)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 88 <> 22090 87)\n", 27) =3D 27 write(2, "\r\33[K\\ 'build' phase", 19) =3D 19 [=E2=80=A6] write(2, "\r\33[K\\ 'build' phase", 19) =3D 19 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1 <> #f 0)\n", 22) =3D 22 read(13, "gmlo\0\0\0\0", 8) =3D 8 read(13, "\27\20\0\0\0\0\0\0", 8) =3D 8 read(13, "@ build-log 22090 4096\n\230g_simple_async_result_take_error\342\= 200\231 is deprecated[=E2=80=A6]", 4119) =3D 4119 read(13, "\0", 1) =3D 1 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) =3D 25 write(2, "\357\277\275@ build-log 22090 4096\n", 26) =3D 26 --8<---------------cut here---------------end--------------->8--- The third read(2) call here ends on a partial UTF-8 sequence for LEFT SINGLE QUOTATION MARK (we get the first two bytes of a three byte sequence.) What happens is that =E2=80=98process-stderr=E2=80=99 in (guix store) gets = that byte string from the daemon, passes it through =E2=80=98read-maybe-utf8-string= =E2=80=99, which replaces the last two bytes with REPLACEMENT CHARACTER, which is itself a 3-byte sequence. Thus, we have this extra byte that=E2=80=99s being inserted. That confuses= the whole machinery since the build log was announced as being 4096-byte long, and it=E2=80=99s now 4097-byte long. Internally, =E2=80=98build-event-output-port=E2=80=99 keeps the last byte o= f the REPLACEMENT CHARACTER sequence in the =E2=80=98%fragments=E2=80=99 buffer. Consequently, the =E2=80=9C@ build-log=E2=80=9D string that comes next does= n=E2=80=99t start on a newline, and thus it is considered build output. Since the first byte does not constitute a valid UTF-8 sequence, another REPLACEMENT CHARACTER is inserted there when it gets printed. So =E2=80=98build-event-output-port=E2=80=99 is working as expected. The p= roblem is the first layer of UTF-8 decoding that happens in =E2=80=98process-stderr=E2=80= =99, in the =E2=80=98%stderr-next=E2=80=99 case. We would need to disable it, but only= if the build output port is =E2=80=98build-event-output-port=E2=80=99 (i.e., it=E2=80=99= s capable of interpreting =E2=80=9Cmultiplexed build output=E2=80=9D correctly.) Thanks, Ludo=E2=80=99. =C2=B9 pk: --=-=-= Content-Type: text/x-patch Content-Disposition: inline diff --git a/guix/status.scm b/guix/status.scm index cbea4151f2..4dcbcb0c1f 100644 --- a/guix/status.scm +++ b/guix/status.scm @@ -717,6 +717,7 @@ The second return value is a thunk to retrieve the current state." (pointer->bytevector ptr count))) (define (write! bv offset count) + (pk 'write count '<> %build-output-pid %build-output-left) (if %build-output-pid (let ((keep (min count %build-output-left))) (set! %build-output --=-=-=--