diff mbox series

[3/4] xfs: defered work could create precommits

Message ID 20230517000449.3997582-4-david@fromorbit.com (mailing list archive)
State Accepted
Headers show
Series [1/4] xfs: buffer pins need to hold a buffer reference | expand

Commit Message

Dave Chinner May 17, 2023, 12:04 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

To fix a AGI-AGF-inode cluster buffer deadlock, we need to move
inode cluster buffer oeprations to the ->iop_precommit() method.
However, this means that deferred operations can require precommits
to be run on the final transaction that the deferred ops pass back
to xfs_trans_commit() context. This will be exposed by attribute
handling, in that the last changes to the inode in the attr set
state machine "disappear" because the precommit operation is not run.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Darrick J. Wong May 17, 2023, 1:20 a.m. UTC | #1
On Wed, May 17, 2023 at 10:04:48AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> To fix a AGI-AGF-inode cluster buffer deadlock, we need to move
> inode cluster buffer oeprations to the ->iop_precommit() method.

                       operations

> However, this means that deferred operations can require precommits
> to be run on the final transaction that the deferred ops pass back
> to xfs_trans_commit() context. This will be exposed by attribute
> handling, in that the last changes to the inode in the attr set
> state machine "disappear" because the precommit operation is not run.

Wait, what?

OH, this is because the LARP state machine can log the inode in the
final transaction in its chain.  __xfs_trans_commit calls the noroll
variant of xfs_defer_finish, which means that when we get to...

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_trans.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 8afc0c080861..664084509af5 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -970,6 +970,11 @@ __xfs_trans_commit(
>  		error = xfs_defer_finish_noroll(&tp);
>  		if (error)
>  			goto out_unreserve;

...here, tp might actually be a dirty transaction.  Prior to
xlog_cil_committing the dirty transaction, we need to run the precommit
hooks or else we don't actually (re)attach the inode cluster buffer to
the transaction, and everything goes batty from there.  Right?

This isn't specific to LARP; any defer item that logs an item with a
precommit hook is subject to this.  Right?

> +
> +		/* Run precomits from final tx in defer chain */

                       precommits

If the answers to the last 2 questions are 'yes' and the spelling errors
get fixed, I think I'm ok enough with this one to throw it at
fstestscloud.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +		error = xfs_trans_run_precommits(tp);
> +		if (error)
> +			goto out_unreserve;
>  	}
>  
>  	/*
> -- 
> 2.40.1
>
Dave Chinner May 17, 2023, 1:44 a.m. UTC | #2
On Tue, May 16, 2023 at 06:20:59PM -0700, Darrick J. Wong wrote:
> On Wed, May 17, 2023 at 10:04:48AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > To fix a AGI-AGF-inode cluster buffer deadlock, we need to move
> > inode cluster buffer oeprations to the ->iop_precommit() method.
> 
>                        operations
> 
> > However, this means that deferred operations can require precommits
> > to be run on the final transaction that the deferred ops pass back
> > to xfs_trans_commit() context. This will be exposed by attribute
> > handling, in that the last changes to the inode in the attr set
> > state machine "disappear" because the precommit operation is not run.
> 
> Wait, what?

That was exactly the reaction I had when attribute tests failed
unexpectedly. Then a quick bit of traceprintk debugging (which I
probably forgot to remove!) pointed me at shortform attr updates
where no precommit was running...

> OH, this is because the LARP state machine can log the inode in the
> final transaction in its chain.  __xfs_trans_commit calls the noroll
> variant of xfs_defer_finish, which means that when we get to...

Yes.

> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_trans.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 8afc0c080861..664084509af5 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -970,6 +970,11 @@ __xfs_trans_commit(
> >  		error = xfs_defer_finish_noroll(&tp);
> >  		if (error)
> >  			goto out_unreserve;
> 
> ...here, tp might actually be a dirty transaction.  Prior to
> xlog_cil_committing the dirty transaction, we need to run the precommit
> hooks or else we don't actually (re)attach the inode cluster buffer to
> the transaction, and everything goes batty from there.  Right?

Yes.

> This isn't specific to LARP; any defer item that logs an item with a
> precommit hook is subject to this.  Right?

Yes.

But right now, the only precommit hook is the unlinked inode
processing, which doesn't run from defer-ops context...

> > +
> > +		/* Run precomits from final tx in defer chain */
> 
>                        precommits
> 
> If the answers to the last 2 questions are 'yes' and the spelling errors
> get fixed, I think I'm ok enough with this one to throw it at
> fstestscloud.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
Christoph Hellwig June 1, 2023, 3:01 p.m. UTC | #3
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 8afc0c080861..664084509af5 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -970,6 +970,11 @@  __xfs_trans_commit(
 		error = xfs_defer_finish_noroll(&tp);
 		if (error)
 			goto out_unreserve;
+
+		/* Run precomits from final tx in defer chain */
+		error = xfs_trans_run_precommits(tp);
+		if (error)
+			goto out_unreserve;
 	}
 
 	/*