diff mbox

[v2] xfs: remove kmem_zalloc_greedy

Message ID 20170308003528.GK5280@birch.djwong.org (mailing list archive)
State Accepted
Headers show

Commit Message

Darrick J. Wong March 8, 2017, 12:35 a.m. UTC
The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
it to grab 1-4 pages for staging of inobt records.  The infinite loop in
the greedy allocation function is causing hangs[1] in generic/269, so
just get rid of the greedy allocator in favor of kmem_zalloc_large.
This makes bulkstat somewhat more likely to ENOMEM if there's really no
pages to spare, but eliminates a source of hangs.

[1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: remove single-page fallback
---
 fs/xfs/kmem.c       |   18 ------------------
 fs/xfs/kmem.h       |    2 --
 fs/xfs/xfs_itable.c |    6 ++----
 3 files changed, 2 insertions(+), 24 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Luis Chamberlain March 14, 2017, 4:57 p.m. UTC | #1
On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> the greedy allocation function is causing hangs[1] in generic/269, so
> just get rid of the greedy allocator in favor of kmem_zalloc_large.
> This makes bulkstat somewhat more likely to ENOMEM if there's really no
> pages to spare, but eliminates a source of hangs.
> 
> [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: remove single-page fallback
> ---

Since this fixes a hang how about *at the very least* a respective Fixes tag ?
This fixes an existing hang so what are the stable considerations here ? I
realize the answer is not easy but figured its worth asking.

FWIW I trace kmem_zalloc_greedy()'s introduction back to 2006 77e4635ae1917
("[XFS] Add a greedy allocation interface, allocating within a min/max size
range.") through v2.6.19 days...

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 14, 2017, 6:07 p.m. UTC | #2
On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> > the greedy allocation function is causing hangs[1] in generic/269, so
> > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > pages to spare, but eliminates a source of hangs.
> > 
> > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: remove single-page fallback
> > ---
> 
> Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> This fixes an existing hang so what are the stable considerations here ? I
> realize the answer is not easy but figured its worth asking.

I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
not fixing _greedy so much as we are killing it.  The patch fixes an
infinite retry hang when bulkstat tries a memory allocation that cannot
be satisfied; and having done that, realizes there are no remaining
callers of _greedy and garbage collects it.  The code that was there
before also seems capable of sleeping forever, I think.

So the minimally invasive fix is to apply the allocation conversion in
bulkstat, and if there aren't any other callers of _greedy then you can
get rid of it too.

> FWIW I trace kmem_zalloc_greedy()'s introduction back to 2006 77e4635ae1917
> ("[XFS] Add a greedy allocation interface, allocating within a min/max size
> range.") through v2.6.19 days...

--D

> 
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain March 15, 2017, 12:14 a.m. UTC | #3
On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > > pages to spare, but eliminates a source of hangs.
> > > 
> > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > v2: remove single-page fallback
> > > ---
> > 
> > Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> > This fixes an existing hang so what are the stable considerations here ? I
> > realize the answer is not easy but figured its worth asking.
> 
> I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> not fixing _greedy so much as we are killing it.  The patch fixes an
> infinite retry hang when bulkstat tries a memory allocation that cannot
> be satisfied; and having done that, realizes there are no remaining
> callers of _greedy and garbage collects it.  The code that was there
> before also seems capable of sleeping forever, I think.
> 
> So the minimally invasive fix is to apply the allocation conversion in
> bulkstat, and if there aren't any other callers of _greedy then you can
> get rid of it too.

For the stake of stable XFS users then why not do the less invasive change
first, Cc stable, and then move on to the less backward portable solution ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Hocko March 15, 2017, 8:35 a.m. UTC | #4
On Wed 15-03-17 01:14:27, Luis R. Rodriguez wrote:
> On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > > it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> > > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > > > pages to spare, but eliminates a source of hangs.
> > > > 
> > > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > v2: remove single-page fallback
> > > > ---
> > > 
> > > Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> > > This fixes an existing hang so what are the stable considerations here ? I
> > > realize the answer is not easy but figured its worth asking.
> > 
> > I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> > not fixing _greedy so much as we are killing it.  The patch fixes an
> > infinite retry hang when bulkstat tries a memory allocation that cannot
> > be satisfied; and having done that, realizes there are no remaining
> > callers of _greedy and garbage collects it.  The code that was there
> > before also seems capable of sleeping forever, I think.
> > 
> > So the minimally invasive fix is to apply the allocation conversion in
> > bulkstat, and if there aren't any other callers of _greedy then you can
> > get rid of it too.
> 
> For the stake of stable XFS users then why not do the less invasive change
> first, Cc stable, and then move on to the less backward portable solution ?

The thing is that the permanent failures for vmalloc were so unlikely
prior to 5d17a73a2ebe ("vmalloc: back off when the current task is
killed") that this was basically a non-issue before this (4.11) merge
window.
Luis Chamberlain March 15, 2017, 3:43 p.m. UTC | #5
On Wed, Mar 15, 2017 at 09:35:29AM +0100, Michal Hocko wrote:
> On Wed 15-03-17 01:14:27, Luis R. Rodriguez wrote:
> > On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> > > On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > > > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > > > it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> > > > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > > > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > > > > pages to spare, but eliminates a source of hangs.
> > > > > 
> > > > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > > v2: remove single-page fallback
> > > > > ---
> > > > 
> > > > Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> > > > This fixes an existing hang so what are the stable considerations here ? I
> > > > realize the answer is not easy but figured its worth asking.
> > > 
> > > I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> > > not fixing _greedy so much as we are killing it.  The patch fixes an
> > > infinite retry hang when bulkstat tries a memory allocation that cannot
> > > be satisfied; and having done that, realizes there are no remaining
> > > callers of _greedy and garbage collects it.  The code that was there
> > > before also seems capable of sleeping forever, I think.
> > > 
> > > So the minimally invasive fix is to apply the allocation conversion in
> > > bulkstat, and if there aren't any other callers of _greedy then you can
> > > get rid of it too.
> > 
> > For the stake of stable XFS users then why not do the less invasive change
> > first, Cc stable, and then move on to the less backward portable solution ?
> 
> The thing is that the permanent failures for vmalloc were so unlikely
> prior to 5d17a73a2ebe ("vmalloc: back off when the current task is
> killed") that this was basically a non-issue before this (4.11) merge
> window.

I see, this seems like critical information to add to the commit log.
Also, will this be at least pushed to v4.11 ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 15, 2017, 4:40 p.m. UTC | #6
On Wed, Mar 15, 2017 at 04:43:27PM +0100, Luis R. Rodriguez wrote:
> On Wed, Mar 15, 2017 at 09:35:29AM +0100, Michal Hocko wrote:
> > On Wed 15-03-17 01:14:27, Luis R. Rodriguez wrote:
> > > On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> > > > On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > > > > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > > > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > > > > it to grab 1-4 pages for staging of inobt records.  The infinite loop in
> > > > > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > > > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > > > > This makes bulkstat somewhat more likely to ENOMEM if there's really no
> > > > > > pages to spare, but eliminates a source of hangs.
> > > > > > 
> > > > > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > > v2: remove single-page fallback
> > > > > > ---
> > > > > 
> > > > > Since this fixes a hang how about *at the very least* a respective Fixes tag ?
> > > > > This fixes an existing hang so what are the stable considerations here ? I
> > > > > realize the answer is not easy but figured its worth asking.
> > > > 
> > > > I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> > > > not fixing _greedy so much as we are killing it.  The patch fixes an
> > > > infinite retry hang when bulkstat tries a memory allocation that cannot
> > > > be satisfied; and having done that, realizes there are no remaining
> > > > callers of _greedy and garbage collects it.  The code that was there
> > > > before also seems capable of sleeping forever, I think.
> > > > 
> > > > So the minimally invasive fix is to apply the allocation conversion in
> > > > bulkstat, and if there aren't any other callers of _greedy then you can
> > > > get rid of it too.
> > > 
> > > For the stake of stable XFS users then why not do the less invasive change
> > > first, Cc stable, and then move on to the less backward portable solution ?
> > 
> > The thing is that the permanent failures for vmalloc were so unlikely
> > prior to 5d17a73a2ebe ("vmalloc: back off when the current task is
> > killed") that this was basically a non-issue before this (4.11) merge
> > window.
> 
> I see, this seems like critical information to add to the commit log.
> Also, will this be at least pushed to v4.11 ?

It's already in rc2.

--D

> 
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 2dfdc62..70a5b55 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -25,24 +25,6 @@ 
 #include "kmem.h"
 #include "xfs_message.h"
 
-/*
- * Greedy allocation.  May fail and may return vmalloced memory.
- */
-void *
-kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
-{
-	void		*ptr;
-	size_t		kmsize = maxsize;
-
-	while (!(ptr = vzalloc(kmsize))) {
-		if ((kmsize >>= 1) <= minsize)
-			kmsize = minsize;
-	}
-	if (ptr)
-		*size = kmsize;
-	return ptr;
-}
-
 void *
 kmem_alloc(size_t size, xfs_km_flags_t flags)
 {
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 689f746..f0fc84f 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -69,8 +69,6 @@  static inline void  kmem_free(const void *ptr)
 }
 
 
-extern void *kmem_zalloc_greedy(size_t *, size_t, size_t);
-
 static inline void *
 kmem_zalloc(size_t size, xfs_km_flags_t flags)
 {
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 8b2150d..e775f78 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -362,7 +362,6 @@  xfs_bulkstat(
 	xfs_agino_t		agino;	/* inode # in allocation group */
 	xfs_agnumber_t		agno;	/* allocation group number */
 	xfs_btree_cur_t		*cur;	/* btree cursor for ialloc btree */
-	size_t			irbsize; /* size of irec buffer in bytes */
 	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
 	int			nirbuf;	/* size of irbuf */
 	int			ubcount; /* size of user's buffer */
@@ -389,11 +388,10 @@  xfs_bulkstat(
 	*ubcountp = 0;
 	*done = 0;
 
-	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
+	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
 	if (!irbuf)
 		return -ENOMEM;
-
-	nirbuf = irbsize / sizeof(*irbuf);
+	nirbuf = (PAGE_SIZE * 4) / sizeof(*irbuf);
 
 	/*
 	 * Loop over the allocation groups, starting from the last