[2/3] xfs: don't assert fail on perag references on teardown

Message ID 20220524022158.1849458-3-david@fromorbit.com (mailing list archive)
State Accepted
Series xfs: small fixes for 5.19 cycle

Commit Message

Dave Chinner May 24, 2022, 2:21 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

This is not fatal; the assert is there to catch developer attention. I'm
seeing this occasionally during recoveryloop testing after a
shutdown, and I don't want this to stop an overnight recoveryloop
run as it is currently doing.

Convert the ASSERT to an XFS_IS_CORRUPT() check so it will dump a
corruption report into the log and cause a test failure that way,
but it won't stop the machine dead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Darrick J. Wong May 24, 2022, 3:48 a.m. UTC | #1
On Tue, May 24, 2022 at 12:21:57PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Not fatal, the assert is there to catch developer attention. I'm
> seeing this occasionally during recoveryloop testing after a
> shutdown, and I don't want this to stop an overnight recoveryloop
> run as it is currently doing.
> 
> Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
> corruption report into the log and cause a test failure that way,
> but it won't stop the machine dead.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 1e4ee042d52f..3e920cf1b454 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -173,7 +173,6 @@ __xfs_free_perag(
>  	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
>  
>  	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
> -	ASSERT(atomic_read(&pag->pag_ref) == 0);

Er, shouldn't this also be converted to XFS_IS_CORRUPT?  That's what the
commit message said...

--D

>  	kmem_free(pag);
>  }
>  
> @@ -192,7 +191,7 @@ xfs_free_perag(
>  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
>  		spin_unlock(&mp->m_perag_lock);
>  		ASSERT(pag);
> -		ASSERT(atomic_read(&pag->pag_ref) == 0);
> +		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
>  
>  		cancel_delayed_work_sync(&pag->pag_blockgc_work);
>  		xfs_iunlink_destroy(pag);
> -- 
> 2.35.1
>
Dave Chinner May 24, 2022, 4 a.m. UTC | #2
On Mon, May 23, 2022 at 08:48:06PM -0700, Darrick J. Wong wrote:
> On Tue, May 24, 2022 at 12:21:57PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Not fatal, the assert is there to catch developer attention. I'm
> > seeing this occasionally during recoveryloop testing after a
> > shutdown, and I don't want this to stop an overnight recoveryloop
> > run as it is currently doing.
> > 
> > Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
> > corruption report into the log and cause a test failure that way,
> > but it won't stop the machine dead.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_ag.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index 1e4ee042d52f..3e920cf1b454 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -173,7 +173,6 @@ __xfs_free_perag(
> >  	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
> >  
> >  	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
> > -	ASSERT(atomic_read(&pag->pag_ref) == 0);
> 
> Er, shouldn't this also be converted to XFS_IS_CORRUPT?  That's what the
> commit message said...

That's in the RCU callback context and we never get here when the
ASSERT fires. i.e. the assert in xfs_free_perag fires before we
queue the rcu callback to free this, so checking it here is kinda
redundant.

i.e. it's not where this issue is being caught - it's
being caught by the check below (in xfs_free_perag()) where the
conversion to XFS_IS_CORRUPT is done....

Cheers,

Dave.

> >  	kmem_free(pag);
> >  }
> >  
> > @@ -192,7 +191,7 @@ xfs_free_perag(
> >  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
> >  		spin_unlock(&mp->m_perag_lock);
> >  		ASSERT(pag);
> > -		ASSERT(atomic_read(&pag->pag_ref) == 0);
> > +		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
> >  
> >  		cancel_delayed_work_sync(&pag->pag_blockgc_work);
> >  		xfs_iunlink_destroy(pag);
> > -- 
> > 2.35.1
> > 
>
Darrick J. Wong May 24, 2022, 4:10 a.m. UTC | #3
On Tue, May 24, 2022 at 02:00:15PM +1000, Dave Chinner wrote:
> On Mon, May 23, 2022 at 08:48:06PM -0700, Darrick J. Wong wrote:
> > On Tue, May 24, 2022 at 12:21:57PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Not fatal, the assert is there to catch developer attention. I'm
> > > seeing this occasionally during recoveryloop testing after a
> > > shutdown, and I don't want this to stop an overnight recoveryloop
> > > run as it is currently doing.
> > > 
> > > Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
> > > corruption report into the log and cause a test failure that way,
> > > but it won't stop the machine dead.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_ag.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > > index 1e4ee042d52f..3e920cf1b454 100644
> > > --- a/fs/xfs/libxfs/xfs_ag.c
> > > +++ b/fs/xfs/libxfs/xfs_ag.c
> > > @@ -173,7 +173,6 @@ __xfs_free_perag(
> > >  	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
> > >  
> > >  	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
> > > -	ASSERT(atomic_read(&pag->pag_ref) == 0);
> > 
> > Er, shouldn't this also be converted to XFS_IS_CORRUPT?  That's what the
> > commit message said...
> 
> That's in the RCU callback context and we never get here when the
> ASSERT fires. i.e. the assert in xfs_free_perag fires before we
> queue the rcu callback to free this, so checking it here is kinda
> redundant.
> 
> i.e. it's not where this issue is being caught - it's
> being caught by the check below (in xfs_free_perag()) where the
> conversion to XFS_IS_CORRUPT is done....

Ah, right.  Ok then,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Cheers,
> 
> Dave.
> 
> > >  	kmem_free(pag);
> > >  }
> > >  
> > > @@ -192,7 +191,7 @@ xfs_free_perag(
> > >  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
> > >  		spin_unlock(&mp->m_perag_lock);
> > >  		ASSERT(pag);
> > > -		ASSERT(atomic_read(&pag->pag_ref) == 0);
> > > +		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
> > >  
> > >  		cancel_delayed_work_sync(&pag->pag_blockgc_work);
> > >  		xfs_iunlink_destroy(pag);
> > > -- 
> > > 2.35.1
> > > 
> > 
> 
> -- 
> Dave Chinner
> david@fromorbit.com
Christoph Hellwig May 24, 2022, 8:14 a.m. UTC | #4
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Patch

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 1e4ee042d52f..3e920cf1b454 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -173,7 +173,6 @@  __xfs_free_perag(
 	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
 
 	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
-	ASSERT(atomic_read(&pag->pag_ref) == 0);
 	kmem_free(pag);
 }
 
@@ -192,7 +191,7 @@  xfs_free_perag(
 		pag = radix_tree_delete(&mp->m_perag_tree, agno);
 		spin_unlock(&mp->m_perag_lock);
 		ASSERT(pag);
-		ASSERT(atomic_read(&pag->pag_ref) == 0);
+		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
 
 		cancel_delayed_work_sync(&pag->pag_blockgc_work);
 		xfs_iunlink_destroy(pag);
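
The behaviour discussed in this thread can be illustrated with a small userspace sketch. It is not the kernel code: struct demo_perag, DEMO_IS_CORRUPT(), demo_report_corrupt(), demo_free_cb(), demo_teardown() and demo_alloc() are hypothetical names made up for this example, and the macro is only loosely modelled on how the patch uses XFS_IS_CORRUPT(); it is not the kernel's definition. The sketch assumes the two properties described above: the reference count check reports and continues instead of halting, and it runs in the teardown path before the object is handed to the free callback, which is why a second check in the callback would be redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_perag; only the refcount is modelled. */
struct demo_perag {
	int ref;	/* models pag->pag_ref */
};

/* Number of corruption reports emitted so far. */
static int demo_corruption_reports;

/* Log the failed expression and return true so the caller can react. */
static bool demo_report_corrupt(const char *expr, const char *file, int line)
{
	fprintf(stderr, "corruption detected: %s at %s:%d\n", expr, file, line);
	demo_corruption_reports++;
	return true;
}

/*
 * Report-and-continue check (assumption: a simplified model, not the
 * kernel macro). Evaluates to true when expr holds, after logging it.
 */
#define DEMO_IS_CORRUPT(expr) \
	((expr) ? demo_report_corrupt(#expr, __FILE__, __LINE__) : false)

/* Models the __xfs_free_perag() callback after the patch: it only frees. */
static void demo_free_cb(struct demo_perag *pag)
{
	free(pag);
}

/*
 * Models one iteration of the xfs_free_perag() loop: a hard assert on
 * the pointer, a non-fatal refcount check, then the free callback.
 * The kernel defers the callback via RCU; here it runs immediately.
 * Returns true if a leaked reference was reported.
 */
static bool demo_teardown(struct demo_perag *pag)
{
	bool leaked;

	assert(pag);
	leaked = DEMO_IS_CORRUPT(pag->ref != 0);
	demo_free_cb(pag);
	return leaked;
}

/* Allocate a demo object with the given reference count. */
static struct demo_perag *demo_alloc(int ref)
{
	struct demo_perag *pag = malloc(sizeof(*pag));

	if (!pag)
		abort();
	pag->ref = ref;
	return pag;
}
```

Calling demo_teardown(demo_alloc(2)) logs one report to stderr and still frees the object, whereas a plain assert(pag->ref == 0) in the same place would abort the process. That is the userspace analogue of an overnight test run being stopped dead.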