Message ID | ZmCB_zqdu2cynJ1M@astro.su.se (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | rpc.idmapd runs out of file descriptors | expand |
Hi all, On Wed, Jun 05, 2024 at 05:19:27PM +0200, Sergio Gelato wrote: > Observed on Debian 12 (nfs-utils 2.6.2): > > May 28 09:40:25 HOSTNAME rpc.idmapd[3602614]: dirscancb: scandir(/run/rpc_pipefs/nfs): Too many open files > [repeated multiple times] > > Investigation with lsof on one of the affected systems shows that file desciptors are not being closed: > > [...] > rpc.idmap 675 root 126r DIR 0,40 0 10813 /run/rpc_pipefs/nfs/clnt11e6 (deleted) > rpc.idmap 675 root 127u FIFO 0,40 0t0 10817 /run/rpc_pipefs/nfs/clnt11e6/idmap (deleted) > rpc.idmap 675 root 128r DIR 0,40 0 10834 /run/rpc_pipefs/nfs/clnt11ef (deleted) > rpc.idmap 675 root 129u FIFO 0,40 0t0 10838 /run/rpc_pipefs/nfs/clnt11ef/idmap (deleted) > rpc.idmap 675 root 130r DIR 0,40 0 10855 /run/rpc_pipefs/nfs/clnt11f8 (deleted) > rpc.idmap 675 root 131u FIFO 0,40 0t0 10859 /run/rpc_pipefs/nfs/clnt11f8/idmap (deleted) > > Raising the verbosity level to 3 results in no "Stale client:" lines. > strace shows no close() calls other than for the /run/rpc_pipefs/nfs directory. > > I believe this is because in dirscancb() the loop is exited prematurely > the first time nfsopen() returns -1, preventing later entries in the queue > from being reaped. I've tested the patch below, which seems indeed to cure > the problem. The bug appears to be still unfixed in the current master branch. > From: Sergio Gelato <Sergio.Gelato@astro.su.se> > Date: Tue, 4 Jun 2024 16:02:59 +0200 > Subject: rpc.idmapd: nfsopen() failures should not be fatal > > dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. > Some of these directories contain /idmap files, others don't. nfsopen() > returns -1 for the latter; we then want to skip the directory, not abort > the entire scan. > --- > utils/idmapd/idmapd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c > index e79c124..f3c540d 100644 > --- a/utils/idmapd/idmapd.c > +++ b/utils/idmapd/idmapd.c > @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) > if (nfsopen(ic) == -1) { > close(ic->ic_dirfd); > free(ic); > - goto out; > + continue; > } > > if (verbose > 2) Did this felt trough the cracks? Does the patch from Sergio looks good to you? Regards, Salvatore
On Sat, 14 Sep 2024, Salvatore Bonaccorso wrote: > Hi all, > > On Wed, Jun 05, 2024 at 05:19:27PM +0200, Sergio Gelato wrote: > > Observed on Debian 12 (nfs-utils 2.6.2): > > > > May 28 09:40:25 HOSTNAME rpc.idmapd[3602614]: dirscancb: scandir(/run/rpc_pipefs/nfs): Too many open files > > [repeated multiple times] > > > > Investigation with lsof on one of the affected systems shows that file desciptors are not being closed: > > > > [...] > > rpc.idmap 675 root 126r DIR 0,40 0 10813 /run/rpc_pipefs/nfs/clnt11e6 (deleted) > > rpc.idmap 675 root 127u FIFO 0,40 0t0 10817 /run/rpc_pipefs/nfs/clnt11e6/idmap (deleted) > > rpc.idmap 675 root 128r DIR 0,40 0 10834 /run/rpc_pipefs/nfs/clnt11ef (deleted) > > rpc.idmap 675 root 129u FIFO 0,40 0t0 10838 /run/rpc_pipefs/nfs/clnt11ef/idmap (deleted) > > rpc.idmap 675 root 130r DIR 0,40 0 10855 /run/rpc_pipefs/nfs/clnt11f8 (deleted) > > rpc.idmap 675 root 131u FIFO 0,40 0t0 10859 /run/rpc_pipefs/nfs/clnt11f8/idmap (deleted) > > > > Raising the verbosity level to 3 results in no "Stale client:" lines. > > strace shows no close() calls other than for the /run/rpc_pipefs/nfs directory. > > > > I believe this is because in dirscancb() the loop is exited prematurely > > the first time nfsopen() returns -1, preventing later entries in the queue > > from being reaped. I've tested the patch below, which seems indeed to cure > > the problem. The bug appears to be still unfixed in the current master branch. > > > From: Sergio Gelato <Sergio.Gelato@astro.su.se> > > Date: Tue, 4 Jun 2024 16:02:59 +0200 > > Subject: rpc.idmapd: nfsopen() failures should not be fatal > > > > dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. > > Some of these directories contain /idmap files, others don't. nfsopen() > > returns -1 for the latter; we then want to skip the directory, not abort > > the entire scan. > > --- > > utils/idmapd/idmapd.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c > > index e79c124..f3c540d 100644 > > --- a/utils/idmapd/idmapd.c > > +++ b/utils/idmapd/idmapd.c > > @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) > > if (nfsopen(ic) == -1) { > > close(ic->ic_dirfd); > > free(ic); > > - goto out; > > + continue; > > } > > > > if (verbose > 2) > > Did this felt trough the cracks? Does the patch from Sergio looks good > to you? > > Regards, > Salvatore > It probably fell through the cracks. It is always good to mail nfs-utils patches to SteveD as well as the list. You've done that now so hopefully he will pick it up. I think that patch looks good and sensible. Reviewed-by: NeilBrown <neilb@suse.de> Thanks, NeilBrown
On 9/14/24 3:29 AM, Salvatore Bonaccorso wrote: > Hi all, > > On Wed, Jun 05, 2024 at 05:19:27PM +0200, Sergio Gelato wrote: >> Observed on Debian 12 (nfs-utils 2.6.2): >> >> May 28 09:40:25 HOSTNAME rpc.idmapd[3602614]: dirscancb: scandir(/run/rpc_pipefs/nfs): Too many open files >> [repeated multiple times] >> >> Investigation with lsof on one of the affected systems shows that file desciptors are not being closed: >> >> [...] >> rpc.idmap 675 root 126r DIR 0,40 0 10813 /run/rpc_pipefs/nfs/clnt11e6 (deleted) >> rpc.idmap 675 root 127u FIFO 0,40 0t0 10817 /run/rpc_pipefs/nfs/clnt11e6/idmap (deleted) >> rpc.idmap 675 root 128r DIR 0,40 0 10834 /run/rpc_pipefs/nfs/clnt11ef (deleted) >> rpc.idmap 675 root 129u FIFO 0,40 0t0 10838 /run/rpc_pipefs/nfs/clnt11ef/idmap (deleted) >> rpc.idmap 675 root 130r DIR 0,40 0 10855 /run/rpc_pipefs/nfs/clnt11f8 (deleted) >> rpc.idmap 675 root 131u FIFO 0,40 0t0 10859 /run/rpc_pipefs/nfs/clnt11f8/idmap (deleted) >> >> Raising the verbosity level to 3 results in no "Stale client:" lines. >> strace shows no close() calls other than for the /run/rpc_pipefs/nfs directory. >> >> I believe this is because in dirscancb() the loop is exited prematurely >> the first time nfsopen() returns -1, preventing later entries in the queue >> from being reaped. I've tested the patch below, which seems indeed to cure >> the problem. The bug appears to be still unfixed in the current master branch. > >> From: Sergio Gelato <Sergio.Gelato@astro.su.se> >> Date: Tue, 4 Jun 2024 16:02:59 +0200 >> Subject: rpc.idmapd: nfsopen() failures should not be fatal >> >> dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. >> Some of these directories contain /idmap files, others don't. nfsopen() >> returns -1 for the latter; we then want to skip the directory, not abort >> the entire scan. >> --- >> utils/idmapd/idmapd.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c >> index e79c124..f3c540d 100644 >> --- a/utils/idmapd/idmapd.c >> +++ b/utils/idmapd/idmapd.c >> @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) >> if (nfsopen(ic) == -1) { >> close(ic->ic_dirfd); >> free(ic); >> - goto out; >> + continue; >> } >> >> if (verbose > 2) > > Did this felt trough the cracks? Does the patch from Sergio looks good > to you? It did because it was not in the appropriate format... The patch was an attachment, not in-line, no Signed-off-by: line and the patch was not create by git format-patch command (which adds PATCH in the subject line). I have filters that look for things like that and I just didn't see it... steved.
On Mon, Sep 16, 2024 at 01:54:35PM +0200, Steve Dickson wrote: > It did because it was not in the appropriate format... The patch > was an attachment, not in-line, no Signed-off-by: line and > the patch was not create by git format-patch command (which > adds PATCH in the subject line). I see no mention of formatting requirements at https://www.linux-nfs.org/wiki/index.php/Reporting_bugs (not even by reference to the Linux kernel tree).
Hi all, On Mon, Sep 16, 2024 at 02:28:50PM +0200, Sergio.Gelato@astro.su.se wrote: > On Mon, Sep 16, 2024 at 01:54:35PM +0200, Steve Dickson wrote: > > It did because it was not in the appropriate format... The patch > > was an attachment, not in-line, no Signed-off-by: line and > > the patch was not create by git format-patch command (which > > adds PATCH in the subject line). > > I see no mention of formatting requirements at > https://www.linux-nfs.org/wiki/index.php/Reporting_bugs > (not even by reference to the Linux kernel tree). Thank you all for looking into it. Steve, do you need to have it re-submitted in a git format-patch format? At least a Signed-off-by line by Sergio would be needed in my understanding. Regards, Salvatore
Hi all, On Mon, Sep 16, 2024 at 05:42:13PM +0200, Salvatore Bonaccorso wrote: > Hi all, > > On Mon, Sep 16, 2024 at 02:28:50PM +0200, Sergio.Gelato@astro.su.se wrote: > > On Mon, Sep 16, 2024 at 01:54:35PM +0200, Steve Dickson wrote: > > > It did because it was not in the appropriate format... The patch > > > was an attachment, not in-line, no Signed-off-by: line and > > > the patch was not create by git format-patch command (which > > > adds PATCH in the subject line). > > > > I see no mention of formatting requirements at > > https://www.linux-nfs.org/wiki/index.php/Reporting_bugs > > (not even by reference to the Linux kernel tree). > > Thank you all for looking into it. Steve, do you need to have it > re-submitted in a git format-patch format? At least a Signed-off-by > line by Sergio would be needed in my understanding. I guess otherwise we can use soemthing like the following though a Signed-off-by is probably not right here, or is it enough if I say Reported-by: and Patch-originally-by: although the later is not an official kernel doc mentioned tag? Regards, Salvatore From 997981dd8d2b5977dd4cd54dab59de4ee3f57725 Mon Sep 17 00:00:00 2001 From: Sergio Gelato <Sergio.Gelato@astro.su.se> Date: Tue, 4 Jun 2024 16:02:59 +0200 Subject: [PATCH] rpc.idmapd: nfsopen() failures should not be fatal dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. Some of these directories contain /idmap files, others don't. nfsopen() returns -1 for the latter; we then want to skip the directory, not abort the entire scan. Reported-by: Sergio Gelato <sergio.gelato@astro.su.se> Patch-originally-by: Sergio Gelato <sergio.gelato@astro.su.se> Link: https://lore.kernel.org/linux-nfs/ZmCB_zqdu2cynJ1M@astro.su.se/ Link: https:/bugs.debian.org/1072573 Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> --- utils/idmapd/idmapd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c index cd9a965f8fbc..5231f56d24a8 100644 --- a/utils/idmapd/idmapd.c +++ b/utils/idmapd/idmapd.c @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) if (nfsopen(ic) == -1) { close(ic->ic_dirfd); free(ic); - goto out; + continue; } if (verbose > 2)
Hi, On Mon, Sep 16, 2024 at 07:19:58PM +0200, Salvatore Bonaccorso wrote: > Hi all, > > On Mon, Sep 16, 2024 at 05:42:13PM +0200, Salvatore Bonaccorso wrote: > > Hi all, > > > > On Mon, Sep 16, 2024 at 02:28:50PM +0200, Sergio.Gelato@astro.su.se wrote: > > > On Mon, Sep 16, 2024 at 01:54:35PM +0200, Steve Dickson wrote: > > > > It did because it was not in the appropriate format... The patch > > > > was an attachment, not in-line, no Signed-off-by: line and > > > > the patch was not create by git format-patch command (which > > > > adds PATCH in the subject line). > > > > > > I see no mention of formatting requirements at > > > https://www.linux-nfs.org/wiki/index.php/Reporting_bugs > > > (not even by reference to the Linux kernel tree). > > > > Thank you all for looking into it. Steve, do you need to have it > > re-submitted in a git format-patch format? At least a Signed-off-by > > line by Sergio would be needed in my understanding. > > I guess otherwise we can use soemthing like the following though a > Signed-off-by is probably not right here, or is it enough if I say > Reported-by: and Patch-originally-by: although the later is not an > official kernel doc mentioned tag? And then one seems to be able to do so many mistakes around one patch. Here is an improved version (v2), fixing a typo in the Link reference and using a Closes: tag after the Reported-by: This now should be better than the previous one. Regards, Salvatore From 46989ce210d27dbb3ad89095030c687db70c78be Mon Sep 17 00:00:00 2001 From: Sergio Gelato <Sergio.Gelato@astro.su.se> Date: Tue, 4 Jun 2024 16:02:59 +0200 Subject: [PATCH v2] rpc.idmapd: nfsopen() failures should not be fatal dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. Some of these directories contain /idmap files, others don't. nfsopen() returns -1 for the latter; we then want to skip the directory, not abort the entire scan. Reported-by: Sergio Gelato <sergio.gelato@astro.su.se> Closes: https://lore.kernel.org/linux-nfs/ZmCB_zqdu2cynJ1M@astro.su.se/ Link: https://bugs.debian.org/1072573 Patch-originally-by: Sergio Gelato <sergio.gelato@astro.su.se> Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> --- Changes in v2: - Fix typo in URL for Debian bug reference - Reorder patch tags and use Closes tag after Reported-by utils/idmapd/idmapd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c index cd9a965f8fbc..5231f56d24a8 100644 --- a/utils/idmapd/idmapd.c +++ b/utils/idmapd/idmapd.c @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) if (nfsopen(ic) == -1) { close(ic->ic_dirfd); free(ic); - goto out; + continue; } if (verbose > 2)
On Mon, Sep 16, 2024 at 08:18:16PM +0200, Salvatore Bonaccorso wrote: > > > > On Mon, Sep 16, 2024 at 01:54:35PM +0200, Steve Dickson wrote: > > > > > It did because it was not in the appropriate format... The patch > > > > > was an attachment, not in-line, no Signed-off-by: line and > > > > > the patch was not create by git format-patch command (which > > > > > adds PATCH in the subject line). > > > > (not even by reference to the Linux kernel tree). > > > > > > Thank you all for looking into it. Steve, do you need to have it > > > re-submitted in a git format-patch format? At least a Signed-off-by > > > line by Sergio would be needed in my understanding. > > > > I guess otherwise we can use soemthing like the following though a > > Signed-off-by is probably not right here, or is it enough if I say > > Reported-by: and Patch-originally-by: although the later is not an > > official kernel doc mentioned tag? I don't object to (nor do I require) your adding a Signed-off-by: Sergio Gelato <Sergio.Gelato@astro.su.se> (I am indeed the author of the patch, and it's too trivial to be copyrightable anyway). I guess Neil's comment earlier can be taken as equivalent to an Acked-by:. By the way, I have wondered whether to change the other two instances of "goto out" in dirscancb() as well. For the first one (calloc() failure) I think this would be counterproductive (one is likely to get the same error on all subsequent iterations), the second (directory open()) is more debatable (but failure of the directory open() should be so rare that it doesn't matter). > And then one seems to be able to do so many mistakes around one patch. > Here is an improved version (v2), fixing a typo in the Link reference > and using a Closes: tag after the Reported-by: > > This now should be better than the previous one. > > Regards, > Salvatore > > From 46989ce210d27dbb3ad89095030c687db70c78be Mon Sep 17 00:00:00 2001 > From: Sergio Gelato <Sergio.Gelato@astro.su.se> > Date: Tue, 4 Jun 2024 16:02:59 +0200 > Subject: [PATCH v2] rpc.idmapd: nfsopen() failures should not be fatal > > dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. > Some of these directories contain /idmap files, others don't. nfsopen() > returns -1 for the latter; we then want to skip the directory, not abort > the entire scan. > > Reported-by: Sergio Gelato <sergio.gelato@astro.su.se> > Closes: https://lore.kernel.org/linux-nfs/ZmCB_zqdu2cynJ1M@astro.su.se/ > Link: https://bugs.debian.org/1072573 > Patch-originally-by: Sergio Gelato <sergio.gelato@astro.su.se> > Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> > --- > Changes in v2: > - Fix typo in URL for Debian bug reference > - Reorder patch tags and use Closes tag after Reported-by > > utils/idmapd/idmapd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c > index cd9a965f8fbc..5231f56d24a8 100644 > --- a/utils/idmapd/idmapd.c > +++ b/utils/idmapd/idmapd.c > @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) > if (nfsopen(ic) == -1) { > close(ic->ic_dirfd); > free(ic); > - goto out; > + continue; > } > > if (verbose > 2) > -- > 2.45.2 >
On 9/16/24 2:18 PM, Salvatore Bonaccorso wrote: > dirscancb() loops over all clnt* subdirectories of/run/rpc_pipefs/nfs/. > Some of these directories contain /idmap files, others don't. nfsopen() > returns -1 for the latter; we then want to skip the directory, not abort > the entire scan. > > Reported-by: Sergio Gelato<sergio.gelato@astro.su.se> > Closes:https://lore.kernel.org/linux-nfs/ZmCB_zqdu2cynJ1M@astro.su.se/ > Link:https://bugs.debian.org/1072573 > Patch-originally-by: Sergio Gelato<sergio.gelato@astro.su.se> > Signed-off-by: Salvatore Bonaccorso<carnil@debian.org> Committed... (tag: nfs-utils-2-7-2-rc1) steved. > --- > Changes in v2: > - Fix typo in URL for Debian bug reference > - Reorder patch tags and use Closes tag after Reported-by > > utils/idmapd/idmapd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c > index cd9a965f8fbc..5231f56d24a8 100644 > --- a/utils/idmapd/idmapd.c > +++ b/utils/idmapd/idmapd.c > @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) > if (nfsopen(ic) == -1) { > close(ic->ic_dirfd); > free(ic); > - goto out; > + continue; > } > > if (verbose > 2) > --
From: Sergio Gelato <Sergio.Gelato@astro.su.se> Date: Tue, 4 Jun 2024 16:02:59 +0200 Subject: rpc.idmapd: nfsopen() failures should not be fatal dirscancb() loops over all clnt* subdirectories of /run/rpc_pipefs/nfs/. Some of these directories contain /idmap files, others don't. nfsopen() returns -1 for the latter; we then want to skip the directory, not abort the entire scan. --- utils/idmapd/idmapd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utils/idmapd/idmapd.c b/utils/idmapd/idmapd.c index e79c124..f3c540d 100644 --- a/utils/idmapd/idmapd.c +++ b/utils/idmapd/idmapd.c @@ -556,7 +556,7 @@ dirscancb(int fd, short UNUSED(which), void *data) if (nfsopen(ic) == -1) { close(ic->ic_dirfd); free(ic); - goto out; + continue; } if (verbose > 2)