diff mbox series

[1/8] xfs: don't reverse order of items in bulk AIL insertion

Message ID 20230627224412.2242198-2-david@fromorbit.com (mailing list archive)
State Deferred, archived
Headers show
Series xfs: various fixes for 6.5 | expand

Commit Message

Dave Chinner June 27, 2023, 10:44 p.m. UTC
From: Dave Chinner <dchinner@redhat.com>

XFS has strict metadata ordering requirements. One of the things it
does is maintain the commit order of items from transaction commit
through the CIL and into the AIL. That is, if a transaction logs
item A before item B in a modification, then they will be inserted
into the CIL in the order {A, B}. These items are then written into
the iclog during checkpointing in the order {A, B}. When the
checkpoint commits, they are supposed to be inserted into the AIL in
the order {A, B}, and when they are pushed from the AIL, they are
pushed in the order {A, B}.

If we crash, log recovery then replays the two items from the
checkpoint in the order {A, B}, resulting in the objects the items
apply to being queued for writeback at the end of the checkpoint
in the order {A, B}. This means recovery behaves the same way as the
runtime code.

In places, we have subtle dependencies on this ordering being
maintained. One of this place is performing intent recovery from the
log. It assumes that recovering an intent will result in a
non-intent object being the first thing that is modified in the
recovery transaction, and so when the transaction commits and the
journal flushes, the first object inserted into the AIL beyond the
intent recovery range will be a non-intent item.  It uses the
transistion from intent items to non-intent items to stop the
recovery pass.

A recent log recovery issue indicated that an intent was appearing
as the first item in the AIL beyond the recovery range, hence
breaking the end of recovery detection that exists.

Tracing indicated insertion of the items into the AIL was apparently
occurring in the right order (the intent was last in the commit item
list), but the intent was appearing first in the AIL. IOWs, the
order of items in the AIL was {D,C,B,A}, not {A,B,C,D}, and bulk
insertion was reversing the order of the items in the batch of items
being inserted.

Lucky for us, all the items fed to bulk insertion have the same LSN,
so the reversal of order does not affect the log head/tail tracking
that is based on the contents of the AIL. It only impacts on code
that has implicit, subtle dependencies on object order, and AFAICT
only the intent recovery loop is impacted by it.

Make sure bulk AIL insertion does not reorder items incorrectly.

Fixes: 0e57f6a36f9b ("xfs: bulk AIL insertion during transaction commit")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_ail.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig June 28, 2023, 6:03 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Chandan Babu R June 28, 2023, 9:55 a.m. UTC | #2
On Wed, Jun 28, 2023 at 08:44:05 AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> XFS has strict metadata ordering requirements. One of the things it
> does is maintain the commit order of items from transaction commit
> through the CIL and into the AIL. That is, if a transaction logs
> item A before item B in a modification, then they will be inserted
> into the CIL in the order {A, B}. These items are then written into
> the iclog during checkpointing in the order {A, B}. When the
> checkpoint commits, they are supposed to be inserted into the AIL in
> the order {A, B}, and when they are pushed from the AIL, they are
> pushed in the order {A, B}.
>
> If we crash, log recovery then replays the two items from the
> checkpoint in the order {A, B}, resulting in the objects the items
> apply to being queued for writeback at the end of the checkpoint
> in the order {A, B}. This means recovery behaves the same way as the
> runtime code.
>
> In places, we have subtle dependencies on this ordering being
> maintained. One of this place is performing intent recovery from the
> log. It assumes that recovering an intent will result in a
> non-intent object being the first thing that is modified in the
> recovery transaction, and so when the transaction commits and the
> journal flushes, the first object inserted into the AIL beyond the
> intent recovery range will be a non-intent item.  It uses the
> transistion from intent items to non-intent items to stop the
> recovery pass.
>
> A recent log recovery issue indicated that an intent was appearing
> as the first item in the AIL beyond the recovery range, hence
> breaking the end of recovery detection that exists.
>
> Tracing indicated insertion of the items into the AIL was apparently
> occurring in the right order (the intent was last in the commit item
> list), but the intent was appearing first in the AIL. IOWs, the
> order of items in the AIL was {D,C,B,A}, not {A,B,C,D}, and bulk
> insertion was reversing the order of the items in the batch of items
> being inserted.
>
> Lucky for us, all the items fed to bulk insertion have the same LSN,
> so the reversal of order does not affect the log head/tail tracking
> that is based on the contents of the AIL. It only impacts on code
> that has implicit, subtle dependencies on object order, and AFAICT
> only the intent recovery loop is impacted by it.
>
> Make sure bulk AIL insertion does not reorder items incorrectly.

Looks good to me.

Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

>
> Fixes: 0e57f6a36f9b ("xfs: bulk AIL insertion during transaction commit")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_trans_ail.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 7d4109af193e..1098452e7f95 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -823,7 +823,7 @@ xfs_trans_ail_update_bulk(
>  			trace_xfs_ail_insert(lip, 0, lsn);
>  		}
>  		lip->li_lsn = lsn;
> -		list_add(&lip->li_ail, &tmp);
> +		list_add_tail(&lip->li_ail, &tmp);
>  	}
>  
>  	if (!list_empty(&tmp))
Darrick J. Wong June 28, 2023, 5:46 p.m. UTC | #3
On Wed, Jun 28, 2023 at 08:44:05AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS has strict metadata ordering requirements. One of the things it
> does is maintain the commit order of items from transaction commit
> through the CIL and into the AIL. That is, if a transaction logs
> item A before item B in a modification, then they will be inserted
> into the CIL in the order {A, B}. These items are then written into
> the iclog during checkpointing in the order {A, B}. When the
> checkpoint commits, they are supposed to be inserted into the AIL in
> the order {A, B}, and when they are pushed from the AIL, they are
> pushed in the order {A, B}.
> 
> If we crash, log recovery then replays the two items from the
> checkpoint in the order {A, B}, resulting in the objects the items
> apply to being queued for writeback at the end of the checkpoint
> in the order {A, B}. This means recovery behaves the same way as the
> runtime code.
> 
> In places, we have subtle dependencies on this ordering being
> maintained. One of this place is performing intent recovery from the
> log. It assumes that recovering an intent will result in a
> non-intent object being the first thing that is modified in the
> recovery transaction, and so when the transaction commits and the
> journal flushes, the first object inserted into the AIL beyond the
> intent recovery range will be a non-intent item.  It uses the
> transistion from intent items to non-intent items to stop the
> recovery pass.
> 
> A recent log recovery issue indicated that an intent was appearing
> as the first item in the AIL beyond the recovery range, hence
> breaking the end of recovery detection that exists.
> 
> Tracing indicated insertion of the items into the AIL was apparently
> occurring in the right order (the intent was last in the commit item
> list), but the intent was appearing first in the AIL. IOWs, the
> order of items in the AIL was {D,C,B,A}, not {A,B,C,D}, and bulk
> insertion was reversing the order of the items in the batch of items
> being inserted.
> 
> Lucky for us, all the items fed to bulk insertion have the same LSN,
> so the reversal of order does not affect the log head/tail tracking
> that is based on the contents of the AIL. It only impacts on code
> that has implicit, subtle dependencies on object order, and AFAICT
> only the intent recovery loop is impacted by it.
> 
> Make sure bulk AIL insertion does not reorder items incorrectly.
> 
> Fixes: 0e57f6a36f9b ("xfs: bulk AIL insertion during transaction commit")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Coulda sworn I already RVBd this but cannot find any evidence I actually
did.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_trans_ail.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 7d4109af193e..1098452e7f95 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -823,7 +823,7 @@ xfs_trans_ail_update_bulk(
>  			trace_xfs_ail_insert(lip, 0, lsn);
>  		}
>  		lip->li_lsn = lsn;
> -		list_add(&lip->li_ail, &tmp);
> +		list_add_tail(&lip->li_ail, &tmp);
>  	}
>  
>  	if (!list_empty(&tmp))
> -- 
> 2.40.1
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 7d4109af193e..1098452e7f95 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -823,7 +823,7 @@  xfs_trans_ail_update_bulk(
 			trace_xfs_ail_insert(lip, 0, lsn);
 		}
 		lip->li_lsn = lsn;
-		list_add(&lip->li_ail, &tmp);
+		list_add_tail(&lip->li_ail, &tmp);
 	}
 
 	if (!list_empty(&tmp))