[PATCH] Fix guile-fibers resource leak

  • Done
  • quality assurance status badge
Details
2 participants
  • Caleb Ristvedt
  • Christopher Baines
Owner
unassigned
Submitted by
Caleb Ristvedt
Severity
normal
C
C
Caleb Ristvedt wrote on 22 May 2020 03:52
(address . guix-patches@gnu.org)
87zha0zqft.fsf@cune.org
This adds a patch to guile-fibers to fix a resource leak that caused
file descriptors to be opened and never closed with each invocation of
`run-fibers'. This is presumably what was causing the tests to fail, as
guile will abort when it gets EMFILE while attempting to create a new
thread. I've verified that it builds on my system, but it's only a
4-core machine, and the rate at which file descriptors leak scales with
the number of cores, so it's possible it would have built successfully
here regardless. Could someone with access to a system with more cores
verify that it now builds properly there?

Hopefully a bug fix release will show up soon enough and we can get rid
of this.

- reepca
From 659fa6b70cb8364187753e240076cdb107320070 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Thu, 21 May 2020 20:30:58 -0500
Subject: [PATCH] gnu: guile-fibers: Add patch to fix resource leak.

guile-fibers@1.0.0 has a resource leak where run-fibers will only destroy one
scheduler, but it creates as many as there are cpu cores by default (see
https://github.com/wingo/fibers/issues/36). This causes the tests to fail on
systems with many cores, and can cause guile to crash under certain
circumstances. This fixes that resource leak. At present neither git master
nor the latest release has fixed this yet.

* gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch: new patch.
* gnu/local.mk: add it to the list of patches.
* gnu/packages/guile-xyz.scm (guile-fibers): use it.
---
gnu/local.mk | 1 +
gnu/packages/guile-xyz.scm | 5 +++-
...guile-fibers-destroy-peer-schedulers.patch | 24 +++++++++++++++++++
3 files changed, 29 insertions(+), 1 deletion(-)
create mode 100644 gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch

Toggle diff (60 lines)
diff --git a/gnu/local.mk b/gnu/local.mk
index 1d9de9a57e..2f24f892b1 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -1053,6 +1053,7 @@ dist_patch_DATA = \
%D%/packages/patches/guile-3.0-relocatable.patch \
%D%/packages/patches/guile-linux-syscalls.patch \
%D%/packages/patches/guile-3.0-linux-syscalls.patch \
+ %D%/packages/patches/guile-fibers-destroy-peer-schedulers.patch \
%D%/packages/patches/guile-gdbm-ffi-support-gdbm-1.14.patch \
%D%/packages/patches/guile-present-coding.patch \
%D%/packages/patches/guile-rsvg-pkgconfig.patch \
diff --git a/gnu/packages/guile-xyz.scm b/gnu/packages/guile-xyz.scm
index 674b1f922b..a1deee32d1 100644
--- a/gnu/packages/guile-xyz.scm
+++ b/gnu/packages/guile-xyz.scm
@@ -523,7 +523,10 @@ Unix-style DSV format and RFC 4180 format.")
(("#:use-module \\(fibers\\)")
(string-append "#:use-module (fibers)\n"
"#:use-module (ice-9 threads)\n")))
- #t))))
+ #t))
+ (patches
+ ;; fixes a resource leak that causes crashes in the tests
+ (search-patches "guile-fibers-destroy-peer-schedulers.patch"))))
(build-system gnu-build-system)
(arguments
'(;; The code uses 'scm_t_uint64' et al., which are deprecated in 3.0.
diff --git a/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch b/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch
new file mode 100644
index 0000000000..8bb7153153
--- /dev/null
+++ b/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch
@@ -0,0 +1,24 @@
+Fibers 1.0.0 has a bug in run-fibers in which peer schedulers aren't destroyed -
+so if you had 4 cores, 1 would be destroyed when run-fibers returned, but the
+other 3 would stay around. Each scheduler uses 3 file descriptors, so for
+machines with many cores, this resource leak adds up quickly - quickly enough
+that the test suite can even fail because of it.
+
+See https://github.com/wingo/fibers/issues/36.
+
+This fixes that. It should be safe to destroy the peer schedulers at the given
+point because the threads that could be running them are all either dead or the
+current thread.
+
+As of May 21, 2020, this bug still existed in the 1.0.0 (latest) release and in
+git master.
+--- a/fibers.scm 2020-05-21 18:38:06.890690154 -0500
++++ b/fibers.scm 2020-05-21 18:38:56.395686693 -0500
+@@ -137,5 +137,6 @@
+ (%run-fibers scheduler hz finished? affinity))
+ (lambda ()
+ (stop-auxiliary-threads scheduler)))))
++ (for-each destroy-scheduler (scheduler-remote-peers scheduler))
+ (destroy-scheduler scheduler)
+ (apply values (atomic-box-ref ret))))))
+
--
2.26.2
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEdNapMPRLm4SepVYGwWaqSV9/GJwFAl7HMGYACgkQwWaqSV9/
GJyeoAf/d/jVHgzjj4V/5g2JAfGhzz02VaFoI8xi5hTR/QzGuV1oHtQTqOW52P//
L2EXz4U0jA2fKoBn3wrmu2J8X87N/o02G+uai42MFshOVn8wjqrtYOm9TIVLYZPp
oQYusoklnR123viu/QEABfzP7GF0u3dizMqF/q1ukdTMb+fNy7sLZ/UrYl7wimS3
JyRq2pisaOFJWxSQkg8WROsPHxOQBGE8/2aPp6KdTu2QZpuyDm8pYApd4/rG8ZrY
Xrx1naejA8SA5GyEmdkAZWbphze3l2Hj0aNTa8PzHfsa7sRuod2wvYgLlVmg3E3g
1+SlYaBnTAHCsLyUqx62t29p/harLA==
=oNOA
-----END PGP SIGNATURE-----

C
C
Christopher Baines wrote on 22 May 2020 19:44
87eerbc19g.fsf@cbaines.net
Caleb Ristvedt <caleb.ristvedt@cune.org> writes:

Toggle quote (10 lines)
> This adds a patch to guile-fibers to fix a resource leak that caused
> file descriptors to be opened and never closed with each invocation of
> `run-fibers'. This is presumably what was causing the tests to fail, as
> guile will abort when it gets EMFILE while attempting to create a new
> thread. I've verified that it builds on my system, but it's only a
> 4-core machine, and the rate at which file descriptors leak scales with
> the number of cores, so it's possible it would have built successfully
> here regardless. Could someone with access to a system with more cores
> verify that it now builds properly there?

I've tried this on bayfront.guix.gnu.org which has 32 cores and I'm very
glad to say it seems to work!

Maybe tweak the capitalisation in the commit message, "New patch", "Add
it ...", "Use it", but yeah, I'm all for merging this, with it I'll be
able to reconfigure bayfront hopefully. Are you set to tweak the commit
message and push?

Thanks,

Chris
-----BEGIN PGP SIGNATURE-----

iQKTBAEBCgB9FiEEPonu50WOcg2XVOCyXiijOwuE9XcFAl7ID5tfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcACgkQXiijOwuE
9XeMjg/9FjyEOHwUerdwzDNIz8l/KikV3SWCM0W7bFdoohfeMHHhq8MI7SPzWPu3
pL750T/kgjHW6cyE/Q5yil7wqnPTItEXP+IL1vIw9IRTjmKCU4iBqvFzXG/wj3N1
TKGlxbUGfs8tkinn4JU3zaQL+zwQtfS8krXUoMdzcD7S4cGi8+tDh/JIMvFV9R1k
YzduQRr60YTO2q24QY9jfwrlHH9+kDLU5gAs2dRMLcOjdCwyKQid1CGttOge6K8N
qUB4Wh/RreGt5OojzjWRtgngN3hlIbUUX7fbT5tiFh6nl1SkT/X73XRKg8CAjzAo
XTGexUNZNbwPGd1yqkc/xh3+FOvixDB9M3kRlZOB9MfJM32hbPY2HY7G93hQDvHH
jJ97CT+OPtD3YFZeoS9q3K7Jlwg1hZluk/V++y1chDsgSgbDWvPEIryP7AOnmhxW
uLzCAly/NkjI/0h+G/7H7qfmcE0PRKg856cVaUh89VyifLQ+XJDXpgbYGSSL2cBj
WTbSyMczzVWsLM5YIpGnofJ7svCUNCZwxOoYC9yg4qCc0I3Meqk4z+k0iR58uYe+
SuQ1Cr64xad0UnuzFLO/vdlDSR/sjcf0FbXalQKtzB31kCwywjryZwJfuyvUwmm8
iS6y893Ywq9FHfw0DLVnlc2LtRdffTP+x7efQMmcHlGRpCbqnGo=
=oaFa
-----END PGP SIGNATURE-----

C
C
Caleb Ristvedt wrote on 22 May 2020 21:38
(name . Christopher Baines)(address . mail@cbaines.net)(address . 41444@debbugs.gnu.org)
87tv07zrnh.fsf@cune.org
Christopher Baines <mail@cbaines.net> writes:

Toggle quote (3 lines)
> I've tried this on bayfront.guix.gnu.org which has 32 cores and I'm very
> glad to say it seems to work!

Excellent.

Toggle quote (3 lines)
> Maybe tweak the capitalisation in the commit message, "New patch", "Add
> it ...", "Use it",

Done.

Toggle quote (2 lines)
> Are you set to tweak the commit message and push?

Indeed, tweaked and pushed as 9af90aafdfd8afd5fb7b5377ca5daf2215d38d7a.

- reepca
C
C
Caleb Ristvedt wrote on 22 May 2020 21:39
done
(address . 41444-done@debbugs.gnu.org)
87pnavzrmj.fsf@cune.org
closed.
Closed
?