Message ID | 20240823110439.1585041-3-leo.lilong@huawei.com (mailing list archive) |
---|---|
State | Deferred, archived |
Series | xfs: fix and cleanups for log item push |
On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> Deleting items from the AIL before the log is shut down can result in the
> log tail moving forward in the journal on disk because log writes can still
> be taking place. As a result, items that have been deleted from the AIL
> might not be recovered during the next mount, even though they should be,
> as they were never written back to disk.
>
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_dquot.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index c1b211c260a9..4cbe3db6fc32 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
>  	return 0;
>
>  out_abort:
> +	/*
> +	 * Shutdown first to stop the log before deleting items from the AIL.
> +	 * Deleting items from the AIL before the log is shut down can result
> +	 * in the log tail moving forward in the journal on disk because log
> +	 * writes can still be taking place.
> +	 */
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
>  	xfs_trans_ail_delete(lip, 0);
> -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);

I see the logic in shutting down the log before letting go of the dquot
log item that triggered the shutdown, but I wonder, why do we delete the
item from the AIL? AFAICT the inode items don't do that on iflush
failure, but OTOH I couldn't figure out how the log items in the AIL get
deleted from the AIL after a shutdown. Or maybe during a shutdown we
just stop xfsaild and let the higher level objects free the log items
during reclaim?

--D

>  out_unlock:
>  	xfs_dqfunlock(dqp);
>  	return error;
> -- 
> 2.39.2
>
>
On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> >
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL? AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown. Or maybe during a shutdown we
> just stop xfsaild and let the higher level objects free the log items
> during reclaim?
>
> --D
>

When an inode flush fails, the inode item is also removed from the AIL.
Since the inode item has already been added to bp->b_li_list during
precommit, it can be deleted through the error handling in
xfs_buf_ioend_fail(bp), and this deletion occurs after the log shutdown.
During a dquot item push, however, the dquot item has not yet been
attached to the buffer, so in this case the dquot item is removed from
the AIL directly.

Thanks,
Long Li
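To make the inode/dquot distinction above concrete, here is a minimal user-space sketch under stated assumptions. It is a hypothetical toy model, not XFS code: `struct toy_buf` keeps a list of attached items standing in for `bp->b_li_list`, `toy_buf_ioend_fail()` stands in for the error completion done by `xfs_buf_ioend_fail()`, and a single global flag stands in for the shutdown state. All `toy_*` names are invented for illustration. Items attached to the buffer leave the AIL through buffer failure; an item that never reached the buffer has no completion path and must be removed directly, after the shutdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model for illustration only; not XFS code. */
struct toy_item {
	const char *name;
	bool in_ail;            /* still tracked by the AIL */
	struct toy_item *next;  /* link on the buffer's attached-item list */
};

struct toy_buf {
	struct toy_item *li_list;  /* stands in for bp->b_li_list */
};

static bool toy_shutdown;  /* stands in for the filesystem shutdown state */

/* Attach an item to the buffer, as inode items are at precommit. */
static void toy_buf_attach(struct toy_buf *bp, struct toy_item *lip)
{
	lip->next = bp->li_list;
	bp->li_list = lip;
}

/*
 * Stands in for xfs_buf_ioend_fail(): error completion removes every
 * attached item from the AIL. In this model it only runs after shutdown.
 * Returns the number of items removed.
 */
static int toy_buf_ioend_fail(struct toy_buf *bp)
{
	int removed = 0;

	assert(toy_shutdown);
	for (struct toy_item *lip = bp->li_list; lip != NULL; lip = lip->next) {
		if (lip->in_ail) {
			lip->in_ail = false;
			removed++;
		}
	}
	bp->li_list = NULL;
	return removed;
}

/*
 * A dquot item that was never attached to the buffer has no completion
 * path to clean it up, so it is removed directly: shut down first, then
 * drop it from the AIL, matching the order in the patch.
 */
static void toy_abort_unattached(struct toy_item *lip)
{
	toy_shutdown = true;
	lip->in_ail = false;
}
```

The design choice mirrors the thread: the buffer-failure path gets the "shutdown before AIL removal" ordering for free, because error completion only runs once the shutdown has happened, while the unattached dquot path has to impose that order explicitly.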
On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > Deleting items from the AIL before the log is shut down can result in the
> > log tail moving forward in the journal on disk because log writes can still
> > be taking place. As a result, items that have been deleted from the AIL
> > might not be recovered during the next mount, even though they should be,
> > as they were never written back to disk.
> >
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_dquot.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index c1b211c260a9..4cbe3db6fc32 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> >  	return 0;
> >
> >  out_abort:
> > +	/*
> > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > +	 * Deleting items from the AIL before the log is shut down can result
> > +	 * in the log tail moving forward in the journal on disk because log
> > +	 * writes can still be taking place.
> > +	 */
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> >  	xfs_trans_ail_delete(lip, 0);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>
> I see the logic in shutting down the log before letting go of the dquot
> log item that triggered the shutdown, but I wonder, why do we delete the
> item from the AIL? AFAICT the inode items don't do that on iflush
> failure, but OTOH I couldn't figure out how the log items in the AIL get
> deleted from the AIL after a shutdown.

Intents are removed from the AIL when the transaction containing
the deferred intent chain is cancelled instead of committed due to the
log being shut down.

For everything else in the AIL, the ->iop_push method is supposed to
do any cleanup that is necessary by failing the item push and
running the item failure method itself.

For buffers, this is running IO completion as if an IO error
occurred. Error handling sees the shutdown and removes the item from
the AIL.

For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
error occurred, that then runs the individual inode abort code that
removes the inode items from the AIL.

For dquots, it has the ancient cleanup method that inodes used to
have. i.e. if the dquot has been flushed to the buffer, it is attached to
the buffer and then the buffer submission will fail and run IO
completion with an error. If the dquot hasn't been flushed to the
buffer because either it or the underlying dquot buffer is corrupt
it will remove the dquot from the AIL and then shut down the
filesystem.

It's the latter case that could be an issue. It's not the same as
the inode item case, because the tail pinning that the INODE_ALLOC
inode item type flag causes does not happen with dquots. There is
still a potential window where the dquot could be at the tail of the
log, and removing it moves the tail forward at exactly the same time
the log tail is being sampled during a log write, and the shutdown
doesn't happen fast enough to prevent the log write going out to
disk.

To make timing of such a race even more unlikely, it would have to
race with a log write that contains a commit record, otherwise the
log tail lsn in the iclog will be ignored because it wasn't
contained within a complete checkpoint in the journal. It's very
unlikely that a filesystem will read a corrupt dquot from disk at
exactly the same point in time these other journal pre-conditions
are met, but it could happen...

> Or maybe during a shutdown we just stop xfsaild and let the higher
> level objects free the log items during reclaim?

The AIL contains objects that have no references elsewhere in the
filesystem. It must be pushed until empty during unmount after a
shutdown to ensure that all the items in it have been pushed,
failed, removed from the AIL and freed...

-Dave.
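The race Dave describes can be modeled in a few lines. This is a hypothetical sketch, not XFS code, built on simplifying assumptions: the log tail is taken to be the lowest LSN still in the AIL, a log write samples that tail, a shut-down log refuses the write, and the sampled tail only becomes the on-disk tail (where recovery would start) if the iclog carries a commit record. The real shutdown is not instantaneous and real LSNs encode cycle and block; the model captures only the ordering that the patch changes. The `toy_*` names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_AIL_MAX 8

/* Hypothetical toy model of the AIL and the log tail; not XFS code. */
struct toy_log {
	long ail[TOY_AIL_MAX];  /* LSNs of items not yet written back */
	size_t nr;              /* number of items in the AIL */
	long head_lsn;          /* tail to use when the AIL is empty */
	long disk_tail;         /* tail LSN recovery would start from */
	bool shutdown;
};

/* The log tail is the lowest LSN still held in the AIL. */
static long toy_tail(const struct toy_log *log)
{
	long tail = log->head_lsn;

	assert(log->nr <= TOY_AIL_MAX);
	for (size_t i = 0; i < log->nr; i++)
		if (log->ail[i] < tail)
			tail = log->ail[i];
	return tail;
}

/* Drop an item from the AIL without writing it back (the abort path). */
static void toy_ail_delete(struct toy_log *log, long lsn)
{
	for (size_t i = 0; i < log->nr; i++) {
		if (log->ail[i] == lsn) {
			log->nr--;
			log->ail[i] = log->ail[log->nr];
			return;
		}
	}
}

/*
 * A log write samples the current tail. A shut-down log refuses the
 * write; otherwise the sampled tail only becomes the on-disk tail if the
 * iclog carries a commit record. Returns true if the write went to disk.
 */
static bool toy_log_write(struct toy_log *log, bool has_commit_record)
{
	long sampled = toy_tail(log);

	if (log->shutdown)
		return false;
	if (has_commit_record)
		log->disk_tail = sampled;
	return true;
}
```

With two items at LSNs 10 and 20, the original ordering (delete the corrupt dquot at LSN 10, a log write with a commit record slips out, then shutdown) leaves the on-disk tail at 20, past an item that was never written back. Shutting down first makes the write fail and the tail stays at 10. A log write without a commit record does not move the tail either, which is the extra precondition Dave points out.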
On Tue, Aug 27, 2024 at 07:40:14PM +1000, Dave Chinner wrote:
> On Fri, Aug 23, 2024 at 10:00:06AM -0700, Darrick J. Wong wrote:
> > On Fri, Aug 23, 2024 at 07:04:36PM +0800, Long Li wrote:
> > > Deleting items from the AIL before the log is shut down can result in the
> > > log tail moving forward in the journal on disk because log writes can still
> > > be taking place. As a result, items that have been deleted from the AIL
> > > might not be recovered during the next mount, even though they should be,
> > > as they were never written back to disk.
> > >
> > > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > > ---
> > >  fs/xfs/xfs_dquot.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > index c1b211c260a9..4cbe3db6fc32 100644
> > > --- a/fs/xfs/xfs_dquot.c
> > > +++ b/fs/xfs/xfs_dquot.c
> > > @@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
> > >  	return 0;
> > >
> > >  out_abort:
> > > +	/*
> > > +	 * Shutdown first to stop the log before deleting items from the AIL.
> > > +	 * Deleting items from the AIL before the log is shut down can result
> > > +	 * in the log tail moving forward in the journal on disk because log
> > > +	 * writes can still be taking place.
> > > +	 */
> > > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > >  	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
> > >  	xfs_trans_ail_delete(lip, 0);
> > > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >
> > I see the logic in shutting down the log before letting go of the dquot
> > log item that triggered the shutdown, but I wonder, why do we delete the
> > item from the AIL? AFAICT the inode items don't do that on iflush
> > failure, but OTOH I couldn't figure out how the log items in the AIL get
> > deleted from the AIL after a shutdown.
>
> Intents are removed from the AIL when the transaction containing
> the deferred intent chain is cancelled instead of committed due to the
> log being shut down.
>
> For everything else in the AIL, the ->iop_push method is supposed to
> do any cleanup that is necessary by failing the item push and
> running the item failure method itself.
>
> For buffers, this is running IO completion as if an IO error
> occurred. Error handling sees the shutdown and removes the item from
> the AIL.
>
> For inodes, xfs_iflush_cluster() fails the inode buffer as if an IO
> error occurred, that then runs the individual inode abort code that
> removes the inode items from the AIL.
>
> For dquots, it has the ancient cleanup method that inodes used to
> have. i.e. if the dquot has been flushed to the buffer, it is attached to
> the buffer and then the buffer submission will fail and run IO
> completion with an error. If the dquot hasn't been flushed to the
> buffer because either it or the underlying dquot buffer is corrupt
> it will remove the dquot from the AIL and then shut down the
> filesystem.
>
> It's the latter case that could be an issue. It's not the same as
> the inode item case, because the tail pinning that the INODE_ALLOC
> inode item type flag causes does not happen with dquots. There is

Does the "INODE_ALLOC inode item" refer to a buf item with the
XFS_BLI_INODE_ALLOC_BUF flag? I understand that when this type of buf
item is relogged, the tail lsn might be pinned, but I'm not sure why it
is mentioned here. Why does it make the inode and dquot cases so
different?

> still a potential window where the dquot could be at the tail of the
> log, and removing it moves the tail forward at exactly the same time
> the log tail is being sampled during a log write, and the shutdown
> doesn't happen fast enough to prevent the log write going out to
> disk.
>
> To make timing of such a race even more unlikely, it would have to
> race with a log write that contains a commit record, otherwise the
> log tail lsn in the iclog will be ignored because it wasn't
> contained within a complete checkpoint in the journal. It's very
> unlikely that a filesystem will read a corrupt dquot from disk at
> exactly the same point in time these other journal pre-conditions
> are met, but it could happen...
>

This is a very detailed explanation; I will add it to the commit message
in the next version. Yes, although the conditions for it to occur are
strict, it can still happen.

Thanks,
Long Li

> > Or maybe during a shutdown we just stop xfsaild and let the higher
> > level objects free the log items during reclaim?
>
> The AIL contains objects that have no references elsewhere in the
> filesystem. It must be pushed until empty during unmount after a
> shutdown to ensure that all the items in it have been pushed,
> failed, removed from the AIL and freed...
>
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
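Dave's closing point, that after a shutdown the AIL must be pushed until empty during unmount, can be sketched as a drain loop over a hypothetical toy AIL. Each item carries a function pointer standing in for its ->iop_push method, and the only push behaviour modeled is the post-shutdown one: the push fails the item, which removes it from the AIL and frees it. The `toy_*` names are invented for illustration; real xfsaild scheduling, locking and item lifetimes are far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model for illustration only; not xfsaild. */
struct toy_ail_item {
	bool in_ail;
	bool freed;
	/* stands in for the item's ->iop_push method */
	void (*iop_push)(struct toy_ail_item *lip);
};

/*
 * Push behaviour after a shutdown: the push fails the item, and the
 * failure handling removes it from the AIL and frees it.
 */
static void toy_push_after_shutdown(struct toy_ail_item *lip)
{
	lip->in_ail = false;
	lip->freed = true;
}

/*
 * Keep pushing until nothing is left in the AIL, as unmount must after a
 * shutdown. Assumes every push eventually removes its item; returns the
 * number of passes that found items to push.
 */
static int toy_ail_drain(struct toy_ail_item *items, size_t n)
{
	int passes = 0;
	bool pushed;

	do {
		pushed = false;
		for (size_t i = 0; i < n; i++) {
			if (items[i].in_ail) {
				assert(items[i].iop_push != NULL);
				items[i].iop_push(&items[i]);
				pushed = true;
			}
		}
		if (pushed)
			passes++;
	} while (pushed);
	return passes;
}
```

Because nothing outside the AIL holds a reference to these items, the drain loop is the only place they can be freed in this model, which is why simply stopping the pusher at shutdown would leak them.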
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index c1b211c260a9..4cbe3db6fc32 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1332,9 +1332,15 @@ xfs_qm_dqflush(
 	return 0;
 
 out_abort:
+	/*
+	 * Shutdown first to stop the log before deleting items from the AIL.
+	 * Deleting items from the AIL before the log is shut down can result
+	 * in the log tail moving forward in the journal on disk because log
+	 * writes can still be taking place.
+	 */
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
 	xfs_trans_ail_delete(lip, 0);
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 out_unlock:
 	xfs_dqfunlock(dqp);
 	return error;
Deleting items from the AIL before the log is shut down can result in the
log tail moving forward in the journal on disk because log writes can still
be taking place. As a result, items that have been deleted from the AIL
might not be recovered during the next mount, even though they should be,
as they were never written back to disk.

Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_dquot.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)