Message ID | 20220315064241.3133751-4-david@fromorbit.com (mailing list archive)
---|---
State | Deferred, archived
Series | xfs: log recovery fixes
On 15 Mar 2022 at 12:12, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_ail_push_all_sync() has a loop like this:
>
>	while max_ail_lsn {
>		prepare_to_wait(ail_empty)
>		target = max_ail_lsn
>		wake_up(ail_task);
>		schedule()
>	}
>
> Which is designed to sleep until the AIL is emptied. When
> xfs_ail_finish_update() moves the tail of the log, it does:

... xfs_ail_update_finish() ...

>
>	if (list_empty(&ailp->ail_head))
>		wake_up_all(&ailp->ail_empty);
>
> So it will only wake up the sync push waiter when the AIL goes
> empty. If, by the time the push waiter has woken, the AIL has more
> in it, it will reset the target, wake the push task and go back to
> sleep.
>
> The problem here is that if the AIL is having items added to it
> when xfs_ail_push_all_sync() is called, then they may get inserted
> into the AIL at an LSN higher than the target LSN. At this point,
> xfsaild_push() will see that the target is X, the item LSNs are
> (X+N) and skip over them, hence never pushing them out.
>
> The result of this is that the AIL will not get emptied by the AIL
> push thread, hence xfs_ail_finish_update() will never see the AIL
> being empty even if it moves the tail. Hence xfs_ail_push_all_sync()
> never gets woken and hence cannot update the push target to capture
> the items beyond the current target LSN.
>
> This is a TOCTOU type of issue, so the way to avoid it is to not use
> the push target at all for sync pushes. We know that a sync push is
> being requested by the fact that the ail_empty wait queue is active,
> hence the xfsaild can just set the target to max_ail_lsn on every
> push where we see the wait queue active. Hence we will no longer
> leave items in the AIL that are beyond the LSN sampled at the start
> of a sync push.
>

The fix seems to be logically correct.

Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_trans_ail.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 2a8c8dc54c95..1b52952097c1 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -448,10 +448,22 @@ xfsaild_push(
>
>  	spin_lock(&ailp->ail_lock);
>
> -	/* barrier matches the ail_target update in xfs_ail_push() */
> -	smp_rmb();
> -	target = ailp->ail_target;
> -	ailp->ail_target_prev = target;
> +	/*
> +	 * If we have a sync push waiter, we always have to push till the AIL is
> +	 * empty. Update the target to point to the end of the AIL so that
> +	 * capture updates that occur after the sync push waiter has gone to
> +	 * sleep.
> +	 */
> +	if (waitqueue_active(&ailp->ail_empty)) {
> +		lip = xfs_ail_max(ailp);
> +		if (lip)
> +			target = lip->li_lsn;
> +	} else {
> +		/* barrier matches the ail_target update in xfs_ail_push() */
> +		smp_rmb();
> +		target = ailp->ail_target;
> +		ailp->ail_target_prev = target;
> +	}
>
>  	/* we're done if the AIL is empty or our push has reached the end */
>  	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
> @@ -724,7 +736,6 @@ xfs_ail_push_all_sync(
>  	spin_lock(&ailp->ail_lock);
>  	while ((lip = xfs_ail_max(ailp)) != NULL) {
>  		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
> -		ailp->ail_target = lip->li_lsn;
>  		wake_up_process(ailp->ail_task);
>  		spin_unlock(&ailp->ail_lock);
>  		schedule();
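To make the race concrete, here is a minimal user-space model of the
targeting logic the commit message describes (the array, the LSN values
and the loops are invented purely for illustration and are not kernel
code): with a target sampled once, items that land in the AIL at higher
LSNs are never pushed, while re-sampling the target as the AIL maximum
drains them.

#include <stdio.h>

int main(void)
{
	/* A toy "AIL": item LSNs kept in ascending order. */
	unsigned long ail[8] = { 10, 20, 30 };
	int count = 3;

	/* Sync push starts: the target is sampled as the current AIL maximum. */
	unsigned long target = ail[count - 1];		/* 30 */

	/* More items land in the AIL after the target was sampled. */
	ail[count++] = 40;
	ail[count++] = 50;

	/* Old behaviour: only items at or below the sampled target get pushed. */
	int pushed = 0;
	for (int i = 0; i < count; i++)
		if (ail[i] <= target)
			pushed++;
	printf("fixed target %lu: pushed %d of %d items\n", target, pushed, count);

	/* Patched behaviour: while a sync waiter exists, re-target to the AIL max. */
	target = ail[count - 1];
	pushed = 0;
	for (int i = 0; i < count; i++)
		if (ail[i] <= target)
			pushed++;
	printf("re-sampled target %lu: pushed %d of %d items\n", target, pushed, count);
	return 0;
}

Run standalone, the first pass leaves the two late arrivals behind
(3 of 5 items pushed), while the second pass drains all five, which is
the behaviour the waitqueue_active() branch in the patch provides for
sync pushes.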
On Tue, Mar 15, 2022 at 05:42:37PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_ail_push_all_sync() has a loop like this:
>
>	while max_ail_lsn {
>		prepare_to_wait(ail_empty)
>		target = max_ail_lsn
>		wake_up(ail_task);
>		schedule()
>	}
>
> Which is designed to sleep until the AIL is emptied. When
> xfs_ail_finish_update() moves the tail of the log, it does:
>
>	if (list_empty(&ailp->ail_head))
>		wake_up_all(&ailp->ail_empty);
>
> So it will only wake up the sync push waiter when the AIL goes
> empty. If, by the time the push waiter has woken, the AIL has more
> in it, it will reset the target, wake the push task and go back to
> sleep.
>
> The problem here is that if the AIL is having items added to it
> when xfs_ail_push_all_sync() is called, then they may get inserted
> into the AIL at an LSN higher than the target LSN. At this point,
> xfsaild_push() will see that the target is X, the item LSNs are
> (X+N) and skip over them, hence never pushing them out.
>
> The result of this is that the AIL will not get emptied by the AIL
> push thread, hence xfs_ail_finish_update() will never see the AIL
> being empty even if it moves the tail. Hence xfs_ail_push_all_sync()
> never gets woken and hence cannot update the push target to capture
> the items beyond the current target LSN.
>
> This is a TOCTOU type of issue, so the way to avoid it is to not use
> the push target at all for sync pushes. We know that a sync push is
> being requested by the fact that the ail_empty wait queue is active,
> hence the xfsaild can just set the target to max_ail_lsn on every
> push where we see the wait queue active. Hence we will no longer
> leave items in the AIL that are beyond the LSN sampled at the start
> of a sync push.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_trans_ail.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 2a8c8dc54c95..1b52952097c1 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -448,10 +448,22 @@ xfsaild_push(
>
>  	spin_lock(&ailp->ail_lock);
>
> -	/* barrier matches the ail_target update in xfs_ail_push() */
> -	smp_rmb();
> -	target = ailp->ail_target;
> -	ailp->ail_target_prev = target;
> +	/*
> +	 * If we have a sync push waiter, we always have to push till the AIL is
> +	 * empty. Update the target to point to the end of the AIL so that
> +	 * capture updates that occur after the sync push waiter has gone to
> +	 * sleep.
> +	 */
> +	if (waitqueue_active(&ailp->ail_empty)) {
> +		lip = xfs_ail_max(ailp);
> +		if (lip)
> +			target = lip->li_lsn;
> +	} else {
> +		/* barrier matches the ail_target update in xfs_ail_push() */
> +		smp_rmb();

Doesn't the spin_lock provide the required rmb?  I think it's
unnecessary given that, but I also don't think it hurts anything, so:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +		target = ailp->ail_target;
> +		ailp->ail_target_prev = target;
> +	}
>
>  	/* we're done if the AIL is empty or our push has reached the end */
>  	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
> @@ -724,7 +736,6 @@ xfs_ail_push_all_sync(
>  	spin_lock(&ailp->ail_lock);
>  	while ((lip = xfs_ail_max(ailp)) != NULL) {
>  		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
> -		ailp->ail_target = lip->li_lsn;
>  		wake_up_process(ailp->ail_task);
>  		spin_unlock(&ailp->ail_lock);
>  		schedule();
> --
> 2.35.1
>
On Tue, Mar 15, 2022 at 12:17:35PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 15, 2022 at 05:42:37PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_ail_push_all_sync() has a loop like this:
> >
> >	while max_ail_lsn {
> >		prepare_to_wait(ail_empty)
> >		target = max_ail_lsn
> >		wake_up(ail_task);
> >		schedule()
> >	}
> >
> > Which is designed to sleep until the AIL is emptied. When
> > xfs_ail_finish_update() moves the tail of the log, it does:
> >
> >	if (list_empty(&ailp->ail_head))
> >		wake_up_all(&ailp->ail_empty);
> >
> > So it will only wake up the sync push waiter when the AIL goes
> > empty. If, by the time the push waiter has woken, the AIL has more
> > in it, it will reset the target, wake the push task and go back to
> > sleep.
> >
> > The problem here is that if the AIL is having items added to it
> > when xfs_ail_push_all_sync() is called, then they may get inserted
> > into the AIL at an LSN higher than the target LSN. At this point,
> > xfsaild_push() will see that the target is X, the item LSNs are
> > (X+N) and skip over them, hence never pushing them out.
> >
> > The result of this is that the AIL will not get emptied by the AIL
> > push thread, hence xfs_ail_finish_update() will never see the AIL
> > being empty even if it moves the tail. Hence xfs_ail_push_all_sync()
> > never gets woken and hence cannot update the push target to capture
> > the items beyond the current target LSN.
> >
> > This is a TOCTOU type of issue, so the way to avoid it is to not use
> > the push target at all for sync pushes. We know that a sync push is
> > being requested by the fact that the ail_empty wait queue is active,
> > hence the xfsaild can just set the target to max_ail_lsn on every
> > push where we see the wait queue active. Hence we will no longer
> > leave items in the AIL that are beyond the LSN sampled at the start
> > of a sync push.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_trans_ail.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > index 2a8c8dc54c95..1b52952097c1 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -448,10 +448,22 @@ xfsaild_push(
> >
> >  	spin_lock(&ailp->ail_lock);
> >
> > -	/* barrier matches the ail_target update in xfs_ail_push() */
> > -	smp_rmb();
> > -	target = ailp->ail_target;
> > -	ailp->ail_target_prev = target;
> > +	/*
> > +	 * If we have a sync push waiter, we always have to push till the AIL is
> > +	 * empty. Update the target to point to the end of the AIL so that
> > +	 * capture updates that occur after the sync push waiter has gone to
> > +	 * sleep.
> > +	 */
> > +	if (waitqueue_active(&ailp->ail_empty)) {
> > +		lip = xfs_ail_max(ailp);
> > +		if (lip)
> > +			target = lip->li_lsn;
> > +	} else {
> > +		/* barrier matches the ail_target update in xfs_ail_push() */
> > +		smp_rmb();
>
> Doesn't the spin_lock provide the required rmb?  I think it's
> unnecessary given that, but I also don't think it hurts anything, so:

No. xfs_ail_push() does not take the ail_lock to update
ail->ail_target on 64 bit systems(*).

Spin locks only provide memory barriers between critical sections
within the lock/unlock calls, and even then the barrier is in the
unlock -> lock direction only. i.e. what is written before the unlock
in one critical section is guaranteed to be read after the lock that
starts the next critical section.
Instead, xfs_ail_push() has smp_wmb() calls around setting the target
to ensure that all AIL state updates done -before the wmb- are seen
by reads done -after the rmb- above.

These memory barriers could probably be replaced with a
smp_store_release() and smp_load_acquire() pair, because that is
effectively what they are implementing, but the implementation
predates those primitives.

OTOH, we don't need a rmb before the new waitqueue_active() check
because all the waitqueue manipulations are done under the ail_lock.
Hence the ail_lock provides the memory barriers for that branch.

IOWs, the smp_rmb() is still necessary for the lockless
xfs_ail_push() update path, just like it was before this patch.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

Cheers,

Dave.
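For readers following along, here is a minimal sketch of the
release/acquire pairing Dave mentions as a possible replacement for the
explicit barriers. The struct ail_example and both function names are
invented stand-ins, not XFS code; only smp_store_release() and
smp_load_acquire() are the real kernel primitives, assumed here to be
available via <linux/atomic.h>.

#include <linux/atomic.h>	/* smp_store_release()/smp_load_acquire() */
#include <linux/types.h>

/* Stand-in for the AIL: only the lockless push target matters here. */
struct ail_example {
	u64	target;		/* analogue of ailp->ail_target (an xfs_lsn_t) */
};

/* Writer side, analogous to xfs_ail_push() publishing a new target. */
static inline void ail_example_publish_target(struct ail_example *ail, u64 lsn)
{
	/*
	 * Release semantics: every AIL state update made before this store
	 * is visible to a reader that observes the new target value, which
	 * is what the explicit write barriers around the update provide today.
	 */
	smp_store_release(&ail->target, lsn);
}

/* Reader side, analogous to xfsaild_push() sampling the target. */
static inline u64 ail_example_read_target(struct ail_example *ail)
{
	/*
	 * Acquire semantics: pairs with the release store above, standing in
	 * for the explicit smp_rmb() before reading ail_target.
	 */
	return smp_load_acquire(&ail->target);
}

Note that a plain 64-bit store is not atomic on 32-bit systems, which
is presumably what Dave's "(*)" aside about 64 bit systems refers to; a
real conversion would still need whatever 32-bit handling the existing
code provides.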
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 2a8c8dc54c95..1b52952097c1 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -448,10 +448,22 @@ xfsaild_push(
 
 	spin_lock(&ailp->ail_lock);
 
-	/* barrier matches the ail_target update in xfs_ail_push() */
-	smp_rmb();
-	target = ailp->ail_target;
-	ailp->ail_target_prev = target;
+	/*
+	 * If we have a sync push waiter, we always have to push till the AIL is
+	 * empty. Update the target to point to the end of the AIL so that
+	 * capture updates that occur after the sync push waiter has gone to
+	 * sleep.
+	 */
+	if (waitqueue_active(&ailp->ail_empty)) {
+		lip = xfs_ail_max(ailp);
+		if (lip)
+			target = lip->li_lsn;
+	} else {
+		/* barrier matches the ail_target update in xfs_ail_push() */
+		smp_rmb();
+		target = ailp->ail_target;
+		ailp->ail_target_prev = target;
+	}
 
 	/* we're done if the AIL is empty or our push has reached the end */
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
@@ -724,7 +736,6 @@ xfs_ail_push_all_sync(
 	spin_lock(&ailp->ail_lock);
 	while ((lip = xfs_ail_max(ailp)) != NULL) {
 		prepare_to_wait(&ailp->ail_empty, &wait, TASK_UNINTERRUPTIBLE);
-		ailp->ail_target = lip->li_lsn;
 		wake_up_process(ailp->ail_task);
 		spin_unlock(&ailp->ail_lock);
 		schedule();
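Finally, a compressed sketch of the wait/wake pattern the two hunks
above rely on. The struct demo and the demo_* functions are invented
for illustration; prepare_to_wait(), finish_wait(), waitqueue_active(),
wake_up_all() and schedule() are the real kernel APIs. Because the
waiter queues itself and the worker checks waitqueue_active() under the
same spinlock, the check cannot miss a waiter that has already gone to
sleep, which is why no extra barrier is needed on that branch.

#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/wait.h>

struct demo {
	spinlock_t		lock;		/* plays the role of ail_lock */
	wait_queue_head_t	empty;		/* plays the role of ail_empty */
	bool			work_left;
};

/* Waiter, analogous to xfs_ail_push_all_sync(): sleep until no work remains. */
static void demo_wait_empty(struct demo *d)
{
	DEFINE_WAIT(wait);

	spin_lock(&d->lock);
	while (d->work_left) {
		/* Queue ourselves on the waitqueue while holding the lock. */
		prepare_to_wait(&d->empty, &wait, TASK_UNINTERRUPTIBLE);
		spin_unlock(&d->lock);
		schedule();
		spin_lock(&d->lock);
	}
	spin_unlock(&d->lock);
	finish_wait(&d->empty, &wait);
}

/* Worker, analogous to xfsaild_push() deciding whether a sync push is wanted. */
static bool demo_have_sync_waiter(struct demo *d)
{
	bool active;

	spin_lock(&d->lock);
	active = waitqueue_active(&d->empty);
	spin_unlock(&d->lock);
	return active;
}

/*
 * Completion side: when the last piece of work is retired, wake any
 * waiters (the commit message shows the AIL doing this when it goes
 * empty after moving the log tail).
 */
static void demo_complete(struct demo *d)
{
	spin_lock(&d->lock);
	d->work_left = false;
	wake_up_all(&d->empty);
	spin_unlock(&d->lock);
}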