diff mbox series

[2/5] xfs: ensuere deleting item from AIL after shutdown in dquot flush

Message ID 20240823110439.1585041-3-leo.lilong@huawei.com (mailing list archive)
State New
Headers show
Series xfs: fix and cleanups for log item push | expand

Commit Message

Long Li Aug. 23, 2024, 11:04 a.m. UTC
Deleting items from the AIL before the log is shut down can result in the
log tail moving forward in the journal on disk because log writes can still
be taking place. As a result, items that have been deleted from the AIL
might not be recovered during the next mount, even though they should be,
as they were never written back to disk.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_dquot.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Aug. 23, 2024, 5 p.m. UTC | #1
On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> Deleting items from the AIL before the log is shut down can result in the
> log tail moving forward in the journal on disk because log writes can still
> be taking place. As a result, items that have been deleted from the AIL
> might not be recovered during the next mount, even though they should be,
> as they were never written back to disk.
> 
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_dquot.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index c1b211c260a9..4cbe3db6fc32 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
>  	return 0;
>  
>  out_abort:
> +	/*
> +	 * Shutdown first to stop the log before deleting items from the AIL.
> +	 * Deleting items from the AIL before the log is shut down can result
> +	 * in the log tail moving forward in the journal on disk because log
> +	 * writes can still be taking place.
> +	 */
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
>  	xfs_trans_ail_delete(lip, 0);
> -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);

I see the logic in shutting down the log before letting go of the dquot
log item that triggered the shutdown, but I wonder, why do we delete the
item from the AIL?  AFAICT the inode items don't do that on iflush
failure, but OTOH I couldn't figure out how the log items in the AIL get
deleted from the AIL after a shutdown.  Or maybe during a shutdown we
just stop xfsaild and let the higher level objects free the log items
during reclaim?

--D

>  out_unlock:
>  	xfs_dqfunlock(dqp);
>  	return error;
> -- 
> 2.39.2
> 
>
Long Li Aug. 24, 2024, 3:08 a.m. UTC | #2
On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> > 
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >  
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> 
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL?  AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown.  Or maybe during a shutdown we
> just stop xfsaild and let the higher level objects free the log items
> during reclaim?
> 
> --D
> 

When inode flush failure, the inode item is also removed from the AIL. Since
the inode item has already been added to bp->b_li_list during precommit, it
can be deleted through the error handling xfs_buf_ioend_fail(bp), and this
deletion occurs after the log shutdown. However, during dquot item push,
the dquot item has not yet been attached to the buffer. Therefore, in this
case, the dquot item is directly removed from the AIL.

Thanks,
Long Li
Dave Chinner Aug. 27, 2024, 9:40 a.m. UTC | #3
On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> > 
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >  
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> 
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL?  AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown. 

Intents are removed from the AIL when the transaction containing
the deferred intent chain is cancelled instead of committed due the
log being shut down.

For everything else in the AIL, the ->iop_push method is supposed to
do any cleanup that is necessary by failing the item push and
running the item failure method itself.

For buffers, this is running IO completion as if an IO error
occurred. Error handling sees the shutdown and removes the item from
the AIL.

For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
error occurred, that then runs the individual inode abort code that
removes the inode items from the AIL.

For dquots, it has the ancient cleanup method that inodes used to
have. i.e. if the dquot has been flushed to the buffer, it is attached to
the buffer and then the buffer submission will fail and run IO
completion with an error. If the dquot hasn't been flushed to the
buffer because either it or the underlying dquot buffer is corrupt
it will remove the dquot from the AIL and then shut down the
filesystem.

It's the latter case that could be an issue. It's not the same as
the inode item case, because the tail pinning that the INODE_ALLOC
inode item type flag causes does not happen with dquots. There is
still a potential window where the dquot could be at the tail of the
log, and remocing it moves the tail forward at exactly the same time
the log tail is being sampled during a log write, and the shutdown
doesn't happen fast enough to prevent the log write going out to
disk.

To make timing of such a race even more unlikely, it would have to
race with a log write that contains a commit record, otherwise the
log tail lsn in the iclog will be ignored because it wasn't
contained within a complete checkpoint in the journal.  It's very
unlikely that a filesystem will read a corrupt dquot from disk at
exactly the same point in time these other journal pre-conditions
are met, but it could happen...

> Or maybe during a shutdown we just stop xfsaild and let the higher
> level objects free the log items during reclaim?

The AIL contains objects that have no references elsewhere in the
filesystem. It must be pushed until empty during unmount after a
shutdown to ensure that all the items in it have been pushed,
failed, removed from the AIL and freed...

-Dave.
Long Li Aug. 31, 2024, 1:45 p.m. UTC | #4
On Tue, Aug 27, 2024 at 07:40:14PM +1000, Dave Chinner wrote:
> On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> > On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > > Deleting items from the AIL before the log is shut down can result in the
> > > log tail moving forward in the journal on disk because log writes can still
> > > be taking place. As a result, items that have been deleted from the AIL
> > > might not be recovered during the next mount, even though they should be,
> > > as they were never written back to disk.
> > > 
> > > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > > ---
> > >  fs/xfs/xfs_dquot.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > index c1b211c260a9..4cbe3db6fc32 100644
> > > --- a/fs/xfs/xfs_dquot.c
> > > +++ b/fs/xfs/xfs_dquot.c
> > > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> > >  	return 0;
> > >  
> > >  out_abort:
> > > +	/*
> > > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > > +	 * Deleting items from the AIL before the log is shut down can result
> > > +	 * in the log tail moving forward in the journal on disk because log
> > > +	 * writes can still be taking place.
> > > +	 */
> > > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> > >  	xfs_trans_ail_delete(lip, 0);
> > > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > 
> > I see the logic in shutting down the log before letting go of the dquot
> > log item that triggered the shutdown, but I wonder, why do we delete the
> > item from the AIL?  AFAICT the inode items don't do that on iflush
> > failure, but OTOH I couldn't figure out how the log items in the AIL get
> > deleted from the AIL after a shutdown. 
> 
> Intents are removed from the AIL when the transaction containing
> the deferred intent chain is cancelled instead of committed due the
> log being shut down.
> 
> For everything else in the AIL, the ->iop_push method is supposed to
> do any cleanup that is necessary by failing the item push and
> running the item failure method itself.
> 
> For buffers, this is running IO completion as if an IO error
> occurred. Error handling sees the shutdown and removes the item from
> the AIL.
> 
> For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
> error occurred, that then runs the individual inode abort code that
> removes the inode items from the AIL.
> 
> For dquots, it has the ancient cleanup method that inodes used to
> have. i.e. if the dquot has been flushed to the buffer, it is attached to
> the buffer and then the buffer submission will fail and run IO
> completion with an error. If the dquot hasn't been flushed to the
> buffer because either it or the underlying dquot buffer is corrupt
> it will remove the dquot from the AIL and then shut down the
> filesystem.
> 
> It's the latter case that could be an issue. It's not the same as
> the inode item case, because the tail pinning that the INODE_ALLOC
> inode item type flag causes does not happen with dquots. There is

I'd like to know if the "INODE_ALLOC inode item" refers to a buf
item with the XFS_BLI_INODE_ALLOC_BUF flag? I understand that when
this type of buf item undergoes relog, the tail lsn might be pinned,
but I'm not sure why it's mentioned here, Why does it cause inode
and dquot to be very different?

> still a potential window where the dquot could be at the tail of the
> log, and remocing it moves the tail forward at exactly the same time
> the log tail is being sampled during a log write, and the shutdown
> doesn't happen fast enough to prevent the log write going out to
> disk.
> 
> To make timing of such a race even more unlikely, it would have to
> race with a log write that contains a commit record, otherwise the
> log tail lsn in the iclog will be ignored because it wasn't
> contained within a complete checkpoint in the journal.  It's very
> unlikely that a filesystem will read a corrupt dquot from disk at
> exactly the same point in time these other journal pre-conditions
> are met, but it could happen...
> 

This is a very detailed explanation. I will add this to my commit
message in the next version. Yes, although the conditions for it
to occur are strict, it's still possible to happen.

Thanks,
Long Li

> > Or maybe during a shutdown we just stop xfsaild and let the higher
> > level objects free the log items during reclaim?
> 
> The AIL contains objects that have no references elsewhere in the
> filesystem. It must be pushed until empty during unmount after a
> shutdown to ensure that all the items in it have been pushed,
> failed, removed from the AIL and freed...
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index c1b211c260a9..4cbe3db6fc32 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1332,9 +1332,15 @@  xfs_qm_dqflush(
 	return 0;
 
 out_abort:
+	/*
+	 * Shutdown first to stop the log before deleting items from the AIL.
+	 * Deleting items from the AIL before the log is shut down can result
+	 * in the log tail moving forward in the journal on disk because log
+	 * writes can still be taking place.
+	 */
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
 	xfs_trans_ail_delete(lip, 0);
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 out_unlock:
 	xfs_dqfunlock(dqp);
 	return error;