Message ID | 20190723150017.31891-1-cmaiolino@redhat.com (mailing list archive)
---|---
State | Deferred, archived
Series | xfs: Do not free xfs_extent_busy from inside a spinlock
On Tue, Jul 23, 2019 at 05:00:17PM +0200, Carlos Maiolino wrote:
> xfs_extent_busy_clear_one() calls kmem_free() with the pag spinlock
> locked.

Er, what problem does this solve?  Does holding on to the pag spinlock
too long while freeing memory cause everything else to stall?  When is
memory freeing slow enough to cause a noticeable impact?

> Fix this by adding a new temporary list and making
> xfs_extent_busy_clear_one() move the extent_busy items to this new
> list instead of freeing them.
>
> Free the objects in the temporary list after we drop the pagb_lock.
>
> Reported-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  fs/xfs/xfs_extent_busy.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> index 0ed68379e551..0a7dcf03340b 100644
> --- a/fs/xfs/xfs_extent_busy.c
> +++ b/fs/xfs/xfs_extent_busy.c
> @@ -523,7 +523,8 @@ STATIC void
>  xfs_extent_busy_clear_one(
>         struct xfs_mount        *mp,
>         struct xfs_perag        *pag,
> -       struct xfs_extent_busy  *busyp)
> +       struct xfs_extent_busy  *busyp,
> +       struct list_head        *list)
>  {
>         if (busyp->length) {
>                 trace_xfs_extent_busy_clear(mp, busyp->agno, busyp->bno,
> @@ -531,8 +532,7 @@ xfs_extent_busy_clear_one(
>                 rb_erase(&busyp->rb_node, &pag->pagb_tree);
>         }
>
> -       list_del_init(&busyp->list);
> -       kmem_free(busyp);
> +       list_move(&busyp->list, list);
>  }
>
>  static void
> @@ -565,6 +565,7 @@ xfs_extent_busy_clear(
>         struct xfs_perag        *pag = NULL;
>         xfs_agnumber_t          agno = NULLAGNUMBER;
>         bool                    wakeup = false;
> +       LIST_HEAD(busy_list);
>
>         list_for_each_entry_safe(busyp, n, list, list) {
>                 if (busyp->agno != agno) {
> @@ -580,13 +581,18 @@ xfs_extent_busy_clear(
>                     !(busyp->flags & XFS_EXTENT_BUSY_SKIP_DISCARD)) {
>                         busyp->flags = XFS_EXTENT_BUSY_DISCARDED;
>                 } else {
> -                       xfs_extent_busy_clear_one(mp, pag, busyp);
> +                       xfs_extent_busy_clear_one(mp, pag, busyp, &busy_list);

...and why not just put the busyp on the busy_list here so you don't
have to pass the list_head pointer around?

--D

>                         wakeup = true;
>                 }
>         }
>
>         if (pag)
>                 xfs_extent_busy_put_pag(pag, wakeup);
> +
> +       list_for_each_entry_safe(busyp, n, &busy_list, list) {
> +               list_del_init(&busyp->list);
> +               kmem_free(busyp);
> +       }
>  }
>
>  /*
> --
> 2.20.1
>
I forgot to Cc Jeff on the patch. Sorry Jeff, cc'ing now.

On Tue, Jul 23, 2019 at 05:00:17PM +0200, Carlos Maiolino wrote:
> xfs_extent_busy_clear_one() calls kmem_free() with the pag spinlock
> locked.
>
> Fix this by adding a new temporary list and making
> xfs_extent_busy_clear_one() move the extent_busy items to this new
> list instead of freeing them.
>
> Free the objects in the temporary list after we drop the pagb_lock.
>
> Reported-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  fs/xfs/xfs_extent_busy.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> index 0ed68379e551..0a7dcf03340b 100644
> --- a/fs/xfs/xfs_extent_busy.c
> +++ b/fs/xfs/xfs_extent_busy.c
> @@ -523,7 +523,8 @@ STATIC void
>  xfs_extent_busy_clear_one(
>         struct xfs_mount        *mp,
>         struct xfs_perag        *pag,
> -       struct xfs_extent_busy  *busyp)
> +       struct xfs_extent_busy  *busyp,
> +       struct list_head        *list)
>  {
>         if (busyp->length) {
>                 trace_xfs_extent_busy_clear(mp, busyp->agno, busyp->bno,
> @@ -531,8 +532,7 @@ xfs_extent_busy_clear_one(
>                 rb_erase(&busyp->rb_node, &pag->pagb_tree);
>         }
>
> -       list_del_init(&busyp->list);
> -       kmem_free(busyp);
> +       list_move(&busyp->list, list);
>  }
>
>  static void
> @@ -565,6 +565,7 @@ xfs_extent_busy_clear(
>         struct xfs_perag        *pag = NULL;
>         xfs_agnumber_t          agno = NULLAGNUMBER;
>         bool                    wakeup = false;
> +       LIST_HEAD(busy_list);
>
>         list_for_each_entry_safe(busyp, n, list, list) {
>                 if (busyp->agno != agno) {
> @@ -580,13 +581,18 @@ xfs_extent_busy_clear(
>                     !(busyp->flags & XFS_EXTENT_BUSY_SKIP_DISCARD)) {
>                         busyp->flags = XFS_EXTENT_BUSY_DISCARDED;
>                 } else {
> -                       xfs_extent_busy_clear_one(mp, pag, busyp);
> +                       xfs_extent_busy_clear_one(mp, pag, busyp, &busy_list);
>                         wakeup = true;
>                 }
>         }
>
>         if (pag)
>                 xfs_extent_busy_put_pag(pag, wakeup);
> +
> +       list_for_each_entry_safe(busyp, n, &busy_list, list) {
> +               list_del_init(&busyp->list);
> +               kmem_free(busyp);
> +       }
>  }
>
>  /*
> --
> 2.20.1
>
On Tue, Jul 23, 2019 at 08:11:02AM -0700, Darrick J. Wong wrote:
> On Tue, Jul 23, 2019 at 05:00:17PM +0200, Carlos Maiolino wrote:
> > xfs_extent_busy_clear_one() calls kmem_free() with the pag spinlock
> > locked.
>

CC'ing Jeff so he can maybe chime in too.

> Er, what problem does this solve?  Does holding on to the pag spinlock
> too long while freeing memory cause everything else to stall?  When is
> memory freeing slow enough to cause a noticeable impact?

Jeff detected it when using this patch:

https://marc.info/?l=linux-mm&m=156388753722881&w=2

At first I don't see any specific problem, but I don't think we are
supposed to use kmem_free() inside interrupt context anyway. So, even
though there is no visible side effect, it should be fixed IMHO. With
the patch above, the side effect is a bunch of warnings :P

> > Fix this by adding a new temporary list and making
> > xfs_extent_busy_clear_one() move the extent_busy items to this new
> > list instead of freeing them.
> >
> > Free the objects in the temporary list after we drop the pagb_lock.
> >
> > Reported-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> >  fs/xfs/xfs_extent_busy.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> > index 0ed68379e551..0a7dcf03340b 100644
> > --- a/fs/xfs/xfs_extent_busy.c
> > +++ b/fs/xfs/xfs_extent_busy.c
> > @@ -523,7 +523,8 @@ STATIC void
> >  xfs_extent_busy_clear_one(
> >         struct xfs_mount        *mp,
> >         struct xfs_perag        *pag,
> > -       struct xfs_extent_busy  *busyp)
> > +       struct xfs_extent_busy  *busyp,
> > +       struct list_head        *list)
> >  {
> >         if (busyp->length) {
> >                 trace_xfs_extent_busy_clear(mp, busyp->agno, busyp->bno,
> > @@ -531,8 +532,7 @@ xfs_extent_busy_clear_one(
> >                 rb_erase(&busyp->rb_node, &pag->pagb_tree);
> >         }
> >
> > -       list_del_init(&busyp->list);
> > -       kmem_free(busyp);
> > +       list_move(&busyp->list, list);
> >  }
> >
> >  static void
> > @@ -565,6 +565,7 @@ xfs_extent_busy_clear(
> >         struct xfs_perag        *pag = NULL;
> >         xfs_agnumber_t          agno = NULLAGNUMBER;
> >         bool                    wakeup = false;
> > +       LIST_HEAD(busy_list);
> >
> >         list_for_each_entry_safe(busyp, n, list, list) {
> >                 if (busyp->agno != agno) {
> > @@ -580,13 +581,18 @@ xfs_extent_busy_clear(
> >                     !(busyp->flags & XFS_EXTENT_BUSY_SKIP_DISCARD)) {
> >                         busyp->flags = XFS_EXTENT_BUSY_DISCARDED;
> >                 } else {
> > -                       xfs_extent_busy_clear_one(mp, pag, busyp);
> > +                       xfs_extent_busy_clear_one(mp, pag, busyp, &busy_list);
>
> ...and why not just put the busyp on the busy_list here so you don't
> have to pass the list_head pointer around?

Just left it inside _clear_one to keep manipulation of busyp in a single
place. We already remove it from the rb tree there, so remove it from
the extent busy list there as well; it just looked cleaner to do all the
cleanup in the same place.

> --D
>
> >                         wakeup = true;
> >                 }
> >         }
> >
> >         if (pag)
> >                 xfs_extent_busy_put_pag(pag, wakeup);
> > +
> > +       list_for_each_entry_safe(busyp, n, &busy_list, list) {
> > +               list_del_init(&busyp->list);
> > +               kmem_free(busyp);
> > +       }
> >  }
> >
> >  /*
> >  --
> >  2.20.1
> >
On Tue, Jul 23, 2019 at 05:31:33PM +0200, Carlos Maiolino wrote:
> CC'ing Jeff so he can maybe chime in too.
>
> > Er, what problem does this solve?  Does holding on to the pag spinlock
> > too long while freeing memory cause everything else to stall?  When is
> > memory freeing slow enough to cause a noticeable impact?
>
> Jeff detected it when using this patch:
>
> https://marc.info/?l=linux-mm&m=156388753722881&w=2
>
> At first I don't see any specific problem, but I don't think we are
> supposed to use kmem_free() inside interrupt context anyway. So, even
> though there is no visible side effect, it should be fixed IMHO. With
> the patch above, the side effect is a bunch of warnings :P

This is going to break lots of places in xfs.  While we have separate
allocation side wrappers for plain kmalloc vs using a vmalloc fallback,
we always use the same free side wrapper.  We could fix this by adding
a kmem_free_large and switching all places that allocated using
kmem_alloc_large over to it, but it will require a bit of work.
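[Editor's note: for context, the single free-side wrapper Christoph describes behaves like the kernel's kvfree(): it inspects the pointer and dispatches either to vfree(), which may sleep, or to kfree(), which is safe under a spinlock. Below is a rough userspace caricature of that dispatch, not the real implementation; a hypothetical header tag stands in for is_vmalloc_addr(), and two counters stand in for the two free paths so the behavior is observable.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* In the kernel, a kvfree()-style wrapper ends up doing roughly:
 *     if (is_vmalloc_addr(ptr)) vfree(ptr); else kfree(ptr);
 * and it is the vfree() leg that may sleep, which is why calling the
 * wrapper under a spinlock is only safe if the allocation is known to
 * have come from kmalloc.  Here a header tag stands in for
 * is_vmalloc_addr(); all names are illustrative. */

struct alloc_hdr {
	bool is_large;		/* stands in for "came from vmalloc" */
};

/* Allocate 'size' bytes, remembering which allocator "produced" them. */
void *alloc_mem(size_t size, bool large)
{
	struct alloc_hdr *h = malloc(sizeof(*h) + size);
	if (!h)
		return NULL;
	h->is_large = large;
	return h + 1;		/* caller sees only the payload */
}

/* Counters so the dispatch is observable in the demo. */
int vfree_calls, kfree_calls;

/* The single free-side wrapper: dispatch on the allocation's origin. */
void free_mem(void *p)
{
	struct alloc_hdr *h = (struct alloc_hdr *)p - 1;
	if (h->is_large)
		vfree_calls++;	/* the "may sleep" leg */
	else
		kfree_calls++;	/* safe in atomic context */
	free(h);
}
```

The convenience of one wrapper is exactly what makes the atomic-context rule implicit: nothing in the call site says which leg will run.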
On Tue, 2019-07-23 at 08:51 -0700, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 05:31:33PM +0200, Carlos Maiolino wrote:
> > CC'ing Jeff so he can maybe chime in too.
> >
> > > Er, what problem does this solve?  Does holding on to the pag spinlock
> > > too long while freeing memory cause everything else to stall?  When is
> > > memory freeing slow enough to cause a noticeable impact?
> >
> > Jeff detected it when using this patch:
> >
> > https://marc.info/?l=linux-mm&m=156388753722881&w=2
> >
> > At first I don't see any specific problem, but I don't think we are
> > supposed to use kmem_free() inside interrupt context anyway. So, even
> > though there is no visible side effect, it should be fixed IMHO. With
> > the patch above, the side effect is a bunch of warnings :P
>
> This is going to break lots of places in xfs.  While we have separate
> allocation side wrappers for plain kmalloc vs using a vmalloc fallback,
> we always use the same free side wrapper.  We could fix this by adding
> a kmem_free_large and switching all places that allocated using
> kmem_alloc_large over to it, but it will require a bit of work.

(cc'ing Al)

Note that those places are already broken. AIUI, the basic issue is that
vmalloc/vfree have to fix up page tables, and that requires being able
to sleep. This patch just makes the situation more evident. If that
patch gets merged, I imagine we'll have a lot of places to clean up (not
just in xfs).

Anyway, in the case of being in an interrupt, we currently queue the
freeing to a workqueue. Al mentioned that we could create a new
kvfree_atomic() that we could use from atomic contexts like this. That
may be another option (though Carlos' patch looked reasonable to me and
would probably be more efficient).
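[Editor's note: Jeff's "queue the freeing to a workqueue" point can be illustrated with a minimal userspace sketch. The atomic-context caller only links the pointer onto a pending list; a separate drain step (the workqueue analogue) performs the actual free where sleeping is allowed. All names here are hypothetical, and the sketch simplifies one thing the kernel's deferred-vfree path does better: the kernel reuses the freed object's own memory as the list node, so no allocation is needed in atomic context.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A node recording one pointer whose free has been deferred. */
struct deferred {
	struct deferred *next;
	void *ptr;
};

static struct deferred *pending;
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Callable where sleeping is not allowed: only list manipulation.
 * (A real atomic-context version could not malloc() a node here; the
 * kernel sidesteps that by reusing the freed memory itself.) */
void free_deferred(void *ptr)
{
	struct deferred *d = malloc(sizeof(*d));
	d->ptr = ptr;
	pthread_mutex_lock(&pending_lock);
	d->next = pending;
	pending = d;
	pthread_mutex_unlock(&pending_lock);
}

/* The "workqueue" side: steal the whole list, then do the real
 * (possibly sleeping) frees outside the lock.  Returns the count. */
int drain_deferred(void)
{
	struct deferred *list, *d;
	int n = 0;

	pthread_mutex_lock(&pending_lock);
	list = pending;
	pending = NULL;
	pthread_mutex_unlock(&pending_lock);

	while ((d = list) != NULL) {
		list = d->next;
		free(d->ptr);	/* the sleepable free happens here */
		free(d);
		n++;
	}
	return n;
}
```

Deferral trades latency (the memory lingers until the drain runs) for the ability to call the wrapper from any context, which is why Carlos' approach of freeing immediately after the unlock can be more efficient when the caller controls the lock.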
On Tue, Jul 23, 2019 at 01:07:00PM -0400, Jeff Layton wrote:
> Note that those places are already broken. AIUI, the basic issue is that
> vmalloc/vfree have to fix up page tables, and that requires being able
> to sleep. This patch just makes the situation more evident. If that
> patch gets merged, I imagine we'll have a lot of places to clean up (not
> just in xfs).
>
> Anyway, in the case of being in an interrupt, we currently queue the
> freeing to a workqueue. Al mentioned that we could create a new
> kvfree_atomic() that we could use from atomic contexts like this. That
> may be another option (though Carlos' patch looked reasonable to me and
> would probably be more efficient).

The point is that for XFS we generally only use kmem_free for pure
kmalloc allocations under spinlocks.  But yes, the interface is a little
suboptimal; a kmem_free_large would be nicer, and then warnings like
this, which might be pretty useful, could be added.
On Tue, 2019-07-23 at 10:08 -0700, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 01:07:00PM -0400, Jeff Layton wrote:
> > Note that those places are already broken. AIUI, the basic issue is
> > that vmalloc/vfree have to fix up page tables, and that requires being
> > able to sleep. This patch just makes the situation more evident. If
> > that patch gets merged, I imagine we'll have a lot of places to clean
> > up (not just in xfs).
> >
> > Anyway, in the case of being in an interrupt, we currently queue the
> > freeing to a workqueue. Al mentioned that we could create a new
> > kvfree_atomic() that we could use from atomic contexts like this. That
> > may be another option (though Carlos' patch looked reasonable to me
> > and would probably be more efficient).
>
> The point is that for XFS we generally only use kmem_free for pure
> kmalloc allocations under spinlocks.  But yes, the interface is a little
> suboptimal; a kmem_free_large would be nicer, and then warnings like
> this, which might be pretty useful, could be added.

Ahh ok, I get it now. You're using it as a generic "free this, no matter
what it is" wrapper, and relying on the caller to ensure that it will
never try to free a vmalloc'ed addr from an atomic context.

I wonder how many other places are doing that? I count 858 call sites
for kvfree. If a significant portion of those are doing this, then we
may have to re-think my patch. It seems like the right thing to do, but
there may be more fallout than I expected.
On Tue, Jul 23, 2019 at 01:38:08PM -0400, Jeff Layton wrote:
> Ahh ok, I get it now. You're using it as a generic "free this, no matter
> what it is" wrapper, and relying on the caller to ensure that it will
> never try to free a vmalloc'ed addr from an atomic context.
>
> I wonder how many other places are doing that? I count 858 call sites
> for kvfree. If a significant portion of those are doing this, then we
> may have to re-think my patch. It seems like the right thing to do, but
> there may be more fallout than I expected.

For xfs we only have 4 direct callers of kmem_alloc_large and 8 callers
of kmem_zalloc_large, so they aren't too many, even assuming that due to
error handling we usually have a few more sites that free the buffers.
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index 0ed68379e551..0a7dcf03340b 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -523,7 +523,8 @@ STATIC void
 xfs_extent_busy_clear_one(
        struct xfs_mount        *mp,
        struct xfs_perag        *pag,
-       struct xfs_extent_busy  *busyp)
+       struct xfs_extent_busy  *busyp,
+       struct list_head        *list)
 {
        if (busyp->length) {
                trace_xfs_extent_busy_clear(mp, busyp->agno, busyp->bno,
@@ -531,8 +532,7 @@ xfs_extent_busy_clear_one(
                rb_erase(&busyp->rb_node, &pag->pagb_tree);
        }

-       list_del_init(&busyp->list);
-       kmem_free(busyp);
+       list_move(&busyp->list, list);
 }

 static void
@@ -565,6 +565,7 @@ xfs_extent_busy_clear(
        struct xfs_perag        *pag = NULL;
        xfs_agnumber_t          agno = NULLAGNUMBER;
        bool                    wakeup = false;
+       LIST_HEAD(busy_list);

        list_for_each_entry_safe(busyp, n, list, list) {
                if (busyp->agno != agno) {
@@ -580,13 +581,18 @@ xfs_extent_busy_clear(
                    !(busyp->flags & XFS_EXTENT_BUSY_SKIP_DISCARD)) {
                        busyp->flags = XFS_EXTENT_BUSY_DISCARDED;
                } else {
-                       xfs_extent_busy_clear_one(mp, pag, busyp);
+                       xfs_extent_busy_clear_one(mp, pag, busyp, &busy_list);
                        wakeup = true;
                }
        }

        if (pag)
                xfs_extent_busy_put_pag(pag, wakeup);
+
+       list_for_each_entry_safe(busyp, n, &busy_list, list) {
+               list_del_init(&busyp->list);
+               kmem_free(busyp);
+       }
 }

 /*
--
2.20.1
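[Editor's note: the pattern the patch implements, detaching entries from the shared structure while holding the lock and freeing them only after dropping it, can be sketched in self-contained userspace C. All names below are illustrative stand-ins: a singly linked list plays the role of the pagb tree and list_head, a pthread mutex plays the role of the pag spinlock, and free() stands in for kmem_free().]

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A fake "busy extent" entry. */
struct busy_node {
	struct busy_node *next;
	int payload;
};

static struct busy_node *busy_head;	/* stands in for pag->pagb_tree */
static pthread_mutex_t pagb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert a fake busy extent (demo helper). */
void add_busy(int payload)
{
	struct busy_node *n = malloc(sizeof(*n));
	n->payload = payload;
	pthread_mutex_lock(&pagb_lock);
	n->next = busy_head;
	busy_head = n;
	pthread_mutex_unlock(&pagb_lock);
}

/* Clear all busy extents; returns how many were freed.  Note that
 * free() only runs after pagb_lock has been released -- the same
 * shape as the busy_list in the patched xfs_extent_busy_clear(). */
int clear_busy(void)
{
	struct busy_node *local, *n;
	int freed = 0;

	pthread_mutex_lock(&pagb_lock);
	local = busy_head;	/* list_move() analogue: steal the  */
	busy_head = NULL;	/* entries onto a local list head   */
	pthread_mutex_unlock(&pagb_lock);

	while ((n = local) != NULL) {	/* lock dropped: free is now safe */
		local = n->next;
		free(n);		/* kmem_free() analogue */
		freed++;
	}
	return freed;
}
```

The key property is that nothing between lock and unlock does more than pointer manipulation; the potentially sleeping free happens strictly after the unlock, which is exactly what busy_list buys the patched code.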
xfs_extent_busy_clear_one() calls kmem_free() with the pag spinlock
locked.

Fix this by adding a new temporary list and making
xfs_extent_busy_clear_one() move the extent_busy items to this new list
instead of freeing them.

Free the objects in the temporary list after we drop the pagb_lock.

Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
 fs/xfs/xfs_extent_busy.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)