diff mbox series

xfs: fix soft lockup via spinning in filestream ag selection loop

Message ID 20220422141226.1831426-1-bfoster@redhat.com (mailing list archive)
State New
Headers show
Series xfs: fix soft lockup via spinning in filestream ag selection loop | expand

Commit Message

Brian Foster April 22, 2022, 2:12 p.m. UTC
The filestream AG selection loop uses pagf data to aid in AG
selection, which depends on pagf initialization. If the in-core
structure is not initialized, the caller invokes the AGF read path
to do so and carries on. If another task enters the loop and finds
a pagf init already in progress, the AGF read returns -EAGAIN and
the task continues the loop. This does not increment the current ag
index, however, which means the task spins on the current AGF buffer
until unlocked.

If the AGF read I/O submitted by the initial task happens to be
delayed for whatever reason, this results in soft lockup warnings
via the spinning task. This is reproduced by xfs/170. To avoid this
problem, fix the AGF trylock failure path to properly iterate to the
next AG. If a task iterates all AGs without making progress, the
trylock behavior is dropped in favor of blocking locks and thus a
soft lockup is no longer possible.

Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

I included the Fixes: tag because this looks like a regression in said
commit, but I've not explicitly verified.

Brian

 fs/xfs/xfs_filestream.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Darrick J. Wong April 22, 2022, 4 p.m. UTC | #1
On Fri, Apr 22, 2022 at 10:12:26AM -0400, Brian Foster wrote:
> The filestream AG selection loop uses pagf data to aid in AG
> selection, which depends on pagf initialization. If the in-core
> structure is not initialized, the caller invokes the AGF read path
> to do so and carries on. If another task enters the loop and finds
> a pagf init already in progress, the AGF read returns -EAGAIN and
> the task continues the loop. This does not increment the current ag
> index, however, which means the task spins on the current AGF buffer
> until unlocked.
> 
> If the AGF read I/O submitted by the initial task happens to be
> delayed for whatever reason, this results in soft lockup warnings

Is there a specific 'whatever reason' going on here?

> via the spinning task. This is reproduced by xfs/170. To avoid this
> problem, fix the AGF trylock failure path to properly iterate to the
> next AG. If a task iterates all AGs without making progress, the
> trylock behavior is dropped in favor of blocking locks and thus a
> soft lockup is no longer possible.
> 
> Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers")

Ooops, this was a major braino on my part.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> I included the Fixes: tag because this looks like a regression in said
> commit, but I've not explicitly verified.
> 
> Brian
> 
>  fs/xfs/xfs_filestream.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 6a3ce0f6dc9e..be9bcf8a1f99 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -128,11 +128,12 @@ xfs_filestream_pick_ag(
>  		if (!pag->pagf_init) {
>  			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
>  			if (err) {
> -				xfs_perag_put(pag);
> -				if (err != -EAGAIN)
> +				if (err != -EAGAIN) {
> +					xfs_perag_put(pag);
>  					return err;
> +				}
>  				/* Couldn't lock the AGF, skip this AG. */
> -				continue;
> +				goto next_ag;
>  			}
>  		}
>  
> -- 
> 2.34.1
>
Brian Foster April 22, 2022, 4:22 p.m. UTC | #2
On Fri, Apr 22, 2022 at 09:00:21AM -0700, Darrick J. Wong wrote:
> On Fri, Apr 22, 2022 at 10:12:26AM -0400, Brian Foster wrote:
> > The filestream AG selection loop uses pagf data to aid in AG
> > selection, which depends on pagf initialization. If the in-core
> > structure is not initialized, the caller invokes the AGF read path
> > to do so and carries on. If another task enters the loop and finds
> > a pagf init already in progress, the AGF read returns -EAGAIN and
> > the task continues the loop. This does not increment the current ag
> > index, however, which means the task spins on the current AGF buffer
> > until unlocked.
> > 
> > If the AGF read I/O submitted by the initial task happens to be
> > delayed for whatever reason, this results in soft lockup warnings
> 
> Is there a specific 'whatever reason' going on here?
> 

Presumably.. given this seems to reproduce reliably or not at all in
certain environments/configs, my suspicion was that either the timing of
the test changes enough such that some other task involved with the test
is able to load the bdev, or otherwise timing changes just enough to
trigger the pagf_init race and the subsequent spinning is what
exacerbates the delay (i.e. burning cpu and subsequent soft lockup BUG
starve out some part(s) of the I/O submission/completion processing).
I've no tangible evidence for either aside from the latter seems fairly
logical when you consider that the test consistently completes in 3-4
seconds with the fix in place, but without it we consistently hit
multiple instances of the soft lockup detector (on ~20s intervals IIRC)
and the system seems to melt down indefinitely. *shrug*

Brian

> > via the spinning task. This is reproduced by xfs/170. To avoid this
> > problem, fix the AGF trylock failure path to properly iterate to the
> > next AG. If a task iterates all AGs without making progress, the
> > trylock behavior is dropped in favor of blocking locks and thus a
> > soft lockup is no longer possible.
> > 
> > Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers")
> 
> Ooops, this was a major braino on my part.
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > I included the Fixes: tag because this looks like a regression in said
> > commit, but I've not explicitly verified.
> > 
> > Brian
> > 
> >  fs/xfs/xfs_filestream.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> > index 6a3ce0f6dc9e..be9bcf8a1f99 100644
> > --- a/fs/xfs/xfs_filestream.c
> > +++ b/fs/xfs/xfs_filestream.c
> > @@ -128,11 +128,12 @@ xfs_filestream_pick_ag(
> >  		if (!pag->pagf_init) {
> >  			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
> >  			if (err) {
> > -				xfs_perag_put(pag);
> > -				if (err != -EAGAIN)
> > +				if (err != -EAGAIN) {
> > +					xfs_perag_put(pag);
> >  					return err;
> > +				}
> >  				/* Couldn't lock the AGF, skip this AG. */
> > -				continue;
> > +				goto next_ag;
> >  			}
> >  		}
> >  
> > -- 
> > 2.34.1
> > 
>
Christoph Hellwig April 24, 2022, 5:23 a.m. UTC | #3
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong April 25, 2022, 4:15 p.m. UTC | #4
On Fri, Apr 22, 2022 at 12:22:28PM -0400, Brian Foster wrote:
> On Fri, Apr 22, 2022 at 09:00:21AM -0700, Darrick J. Wong wrote:
> > On Fri, Apr 22, 2022 at 10:12:26AM -0400, Brian Foster wrote:
> > > The filestream AG selection loop uses pagf data to aid in AG
> > > selection, which depends on pagf initialization. If the in-core
> > > structure is not initialized, the caller invokes the AGF read path
> > > to do so and carries on. If another task enters the loop and finds
> > > a pagf init already in progress, the AGF read returns -EAGAIN and
> > > the task continues the loop. This does not increment the current ag
> > > index, however, which means the task spins on the current AGF buffer
> > > until unlocked.
> > > 
> > > If the AGF read I/O submitted by the initial task happens to be
> > > delayed for whatever reason, this results in soft lockup warnings
> > 
> > Is there a specific 'whatever reason' going on here?
> > 
> 
> Presumably.. given this seems to reproduce reliably or not at all in
> certain environments/configs, my suspicion was that either the timing of
> the test changes enough such that some other task involved with the test
> is able to load the bdev, or otherwise timing changes just enough to
> trigger the pagf_init race and the subsequent spinning is what
> exacerbates the delay (i.e. burning cpu and subsequent soft lockup BUG
> starve out some part(s) of the I/O submission/completion processing).
> I've no tangible evidence for either aside from the latter seems fairly
> logical when you consider that the test consistently completes in 3-4
> seconds with the fix in place, but without it we consistently hit
> multiple instances of the soft lockup detector (on ~20s intervals IIRC)
> and the system seems to melt down indefinitely. *shrug*

Ah, ok, so there wasn't any specific event that was causing AGF IO to
take a long time, it's just that a thread running the filestream
allocator could fail the trylock loop for any reason for long enough to
trip the hangcheck warning.

--D

> Brian
> 
> > > via the spinning task. This is reproduced by xfs/170. To avoid this
> > > problem, fix the AGF trylock failure path to properly iterate to the
> > > next AG. If a task iterates all AGs without making progress, the
> > > trylock behavior is dropped in favor of blocking locks and thus a
> > > soft lockup is no longer possible.
> > > 
> > > Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers")
> > 
> > Ooops, this was a major braino on my part.
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > --D
> > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > I included the Fixes: tag because this looks like a regression in said
> > > commit, but I've not explicitly verified.
> > > 
> > > Brian
> > > 
> > >  fs/xfs/xfs_filestream.c | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> > > index 6a3ce0f6dc9e..be9bcf8a1f99 100644
> > > --- a/fs/xfs/xfs_filestream.c
> > > +++ b/fs/xfs/xfs_filestream.c
> > > @@ -128,11 +128,12 @@ xfs_filestream_pick_ag(
> > >  		if (!pag->pagf_init) {
> > >  			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
> > >  			if (err) {
> > > -				xfs_perag_put(pag);
> > > -				if (err != -EAGAIN)
> > > +				if (err != -EAGAIN) {
> > > +					xfs_perag_put(pag);
> > >  					return err;
> > > +				}
> > >  				/* Couldn't lock the AGF, skip this AG. */
> > > -				continue;
> > > +				goto next_ag;
> > >  			}
> > >  		}
> > >  
> > > -- 
> > > 2.34.1
> > > 
> > 
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 6a3ce0f6dc9e..be9bcf8a1f99 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -128,11 +128,12 @@  xfs_filestream_pick_ag(
 		if (!pag->pagf_init) {
 			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
 			if (err) {
-				xfs_perag_put(pag);
-				if (err != -EAGAIN)
+				if (err != -EAGAIN) {
+					xfs_perag_put(pag);
 					return err;
+				}
 				/* Couldn't lock the AGF, skip this AG. */
-				continue;
+				goto next_ag;
 			}
 		}