diff mbox series

[4/7] xfs: async CIL flushes need pending pushes to be made stable

Message ID 20220315064241.3133751-5-david@fromorbit.com (mailing list archive)
State Superseded
Headers show
Series xfs: log recovery fixes | expand

Commit Message

Dave Chinner March 15, 2022, 6:42 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When the AIL tries to flush the CIL, it relies on the CIL push
ending up on stable storage without having to wait for and
manipulate iclog state directly. However, if there is already a
pending CIL push when the AIL tries to flush the CIL, it won't set
the cil->xc_push_commit_stable flag and so the CIL push will not
actively flush the commit record iclog.

generic/530 when run on a single CPU test VM can trigger this fairly
reliably. This test exercises unlinked inode recovery, and can
result in inodes being pinned in memory by ongoing modifications to
the inode cluster buffer to record unlinked list modifications. As a
result, the first inode unlinked in a buffer can pin the tail of the
log whilst the inode cluster buffer is pinned by the current
checkpoint that has been pushed but isn't on stable storage because
because the cil->xc_push_commit_stable was not set. This results in
the log/AIL effectively deadlocking until something triggers the
commit record iclog to be pushed to stable storage (i.e. the
periodic log worker calling xfs_log_force()).

The fix is two-fold - first we should always set the
cil->xc_push_commit_stable when xlog_cil_flush() is called,
regardless of whether there is already a pending push or not.

Second, if the CIL is empty, we should trigger an iclog flush to
ensure that the iclogs of the last checkpoint have actually been
submitted to disk as that checkpoint may not have been run under
stable completion constraints.

Reported-and-tested-by: Matthew Wilcox <willy@infradead.org>
Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

Comments

Darrick J. Wong March 15, 2022, 7:36 p.m. UTC | #1
On Tue, Mar 15, 2022 at 05:42:38PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the AIL tries to flush the CIL, it relies on the CIL push
> ending up on stable storage without having to wait for and
> manipulate iclog state directly. However, if there is already a
> pending CIL push when the AIL tries to flush the CIL, it won't set
> the cil->xc_push_commit_stable flag and so the CIL push will not
> actively flush the commit record iclog.
> 
> generic/530 when run on a single CPU test VM can trigger this fairly
> reliably. This test exercises unlinked inode recovery, and can
> result in inodes being pinned in memory by ongoing modifications to
> the inode cluster buffer to record unlinked list modifications. As a
> result, the first inode unlinked in a buffer can pin the tail of the
> log whilst the inode cluster buffer is pinned by the current
> checkpoint that has been pushed but isn't on stable storage because
> because the cil->xc_push_commit_stable was not set. This results in
> the log/AIL effectively deadlocking until something triggers the
> commit record iclog to be pushed to stable storage (i.e. the
> periodic log worker calling xfs_log_force()).
> 
> The fix is two-fold - first we should always set the
> cil->xc_push_commit_stable when xlog_cil_flush() is called,
> regardless of whether there is already a pending push or not.
> 
> Second, if the CIL is empty, we should trigger an iclog flush to
> ensure that the iclogs of the last checkpoint have actually been
> submitted to disk as that checkpoint may not have been run under
> stable completion constraints.

Can it ever be the case that the CIL is not empty but the last
checkpoint wasn't committed to disk?  Let's say someone else
commits a transaction after the worker samples xc_push_commit_stable?

IOWs, why does a not-empty CIL mean that the last checkpoint is on disk?

> Reported-and-tested-by: Matthew Wilcox <willy@infradead.org>

MMMM testing. :D

--D

> Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 3d8ebf2a1e55..48b16a5feb27 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -1369,18 +1369,27 @@ xlog_cil_push_now(
>  	if (!async)
>  		flush_workqueue(cil->xc_push_wq);
>  
> +	spin_lock(&cil->xc_push_lock);
> +
> +	/*
> +	 * If this is an async flush request, we always need to set the
> +	 * xc_push_commit_stable flag even if something else has already queued
> +	 * a push. The flush caller is asking for the CIL to be on stable
> +	 * storage when the next push completes, so regardless of who has queued
> +	 * the push, the flush requires stable semantics from it.
> +	 */
> +	cil->xc_push_commit_stable = async;
> +
>  	/*
>  	 * If the CIL is empty or we've already pushed the sequence then
> -	 * there's no work we need to do.
> +	 * there's no more work that we need to do.
>  	 */
> -	spin_lock(&cil->xc_push_lock);
>  	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
>  		spin_unlock(&cil->xc_push_lock);
>  		return;
>  	}
>  
>  	cil->xc_push_seq = push_seq;
> -	cil->xc_push_commit_stable = async;
>  	queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
>  	spin_unlock(&cil->xc_push_lock);
>  }
> @@ -1520,6 +1529,13 @@ xlog_cil_flush(
>  
>  	trace_xfs_log_force(log->l_mp, seq, _RET_IP_);
>  	xlog_cil_push_now(log, seq, true);
> +
> +	/*
> +	 * If the CIL is empty, make sure that any previous checkpoint that may
> +	 * still be in an active iclog is pushed to stable storage.
> +	 */
> +	if (list_empty(&log->l_cilp->xc_cil))
> +		xfs_log_force(log->l_mp, 0);
>  }
>  
>  /*
> -- 
> 2.35.1
>
Dave Chinner March 15, 2022, 9:47 p.m. UTC | #2
On Tue, Mar 15, 2022 at 12:36:24PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 15, 2022 at 05:42:38PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When the AIL tries to flush the CIL, it relies on the CIL push
> > ending up on stable storage without having to wait for and
> > manipulate iclog state directly. However, if there is already a
> > pending CIL push when the AIL tries to flush the CIL, it won't set
> > the cil->xc_push_commit_stable flag and so the CIL push will not
> > actively flush the commit record iclog.
> > 
> > generic/530 when run on a single CPU test VM can trigger this fairly
> > reliably. This test exercises unlinked inode recovery, and can
> > result in inodes being pinned in memory by ongoing modifications to
> > the inode cluster buffer to record unlinked list modifications. As a
> > result, the first inode unlinked in a buffer can pin the tail of the
> > log whilst the inode cluster buffer is pinned by the current
> > checkpoint that has been pushed but isn't on stable storage because
> > because the cil->xc_push_commit_stable was not set. This results in
> > the log/AIL effectively deadlocking until something triggers the
> > commit record iclog to be pushed to stable storage (i.e. the
> > periodic log worker calling xfs_log_force()).
> > 
> > The fix is two-fold - first we should always set the
> > cil->xc_push_commit_stable when xlog_cil_flush() is called,
> > regardless of whether there is already a pending push or not.
> > 
> > Second, if the CIL is empty, we should trigger an iclog flush to
> > ensure that the iclogs of the last checkpoint have actually been
> > submitted to disk as that checkpoint may not have been run under
> > stable completion constraints.
> 
> Can it ever be the case that the CIL is not empty but the last
> checkpoint wasn't committed to disk?

Yes. But xlog_cil_push_now() will capture that, queue it and mark
it as xc_push_commit_stable. 

Remember that the push_now() code updates the push seq/stable
state under down_read(ctx lock) + spin_lock(push lock) context. The
push seq/stable state is cleared by the push worker under
down_write(ctx lock) + spin_lock(push lock) conditions when it
atomically swaps in the new empty CIL context. Hence the push worker
will always see the stable flag if it has been set for that push
sequence.

> Let's say someone else
> commits a transaction after the worker samples xc_push_commit_stable?

If we race with commits between the xlog_cil_push_now(log, seq,
true) and the CIL list_empty check in xlog_cil_flush(), there are
two posibilities:

1. the CIL push worker hasn't run and atomically switched in a new
CIL context.
2. the CIL push worker has run and switched contexts

In the first case, the commit will end up in the same context that
xlog_cil_flush() pushed, and it will be stable. That will result in
an empty CIL after the CIL push worker runs, but the racing commit
will be stable as per the xc_push_commit_stable flag. This can also
lead to the CIL being empty by the time the list_empty check is done
(because pre-empt), in which case the log force will be a no-op
because none of the iclogs need flushing.

> IOWs, why does a not-empty CIL mean that the last checkpoint is on disk?

In the second case, the CIL push triggered by xlog_cil_push_now()
will be stable because xc_push_commit_stable says it must be, and
the racing commit will end up in the new CIL context and the CIL
won't be empty. We don't need a log force in this case because the
previous sequence that was flushed with stable semantics as required.

In the case of AIL pushing, we don't actually care about racing CIL
commits because we are trying to get pinned AIL items unpinned so we
can move the tail of the log forwards. If those pinned items are
relogged by racing transactions, then the next call to
xlog_cil_flush() from the AIL will get them unpinned and that will
move them forward in the log, anyway.

Cheers,

Dave.
Darrick J. Wong March 16, 2022, 2 a.m. UTC | #3
On Wed, Mar 16, 2022 at 08:47:45AM +1100, Dave Chinner wrote:
> On Tue, Mar 15, 2022 at 12:36:24PM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 15, 2022 at 05:42:38PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When the AIL tries to flush the CIL, it relies on the CIL push
> > > ending up on stable storage without having to wait for and
> > > manipulate iclog state directly. However, if there is already a
> > > pending CIL push when the AIL tries to flush the CIL, it won't set
> > > the cil->xc_push_commit_stable flag and so the CIL push will not
> > > actively flush the commit record iclog.
> > > 
> > > generic/530 when run on a single CPU test VM can trigger this fairly
> > > reliably. This test exercises unlinked inode recovery, and can
> > > result in inodes being pinned in memory by ongoing modifications to
> > > the inode cluster buffer to record unlinked list modifications. As a
> > > result, the first inode unlinked in a buffer can pin the tail of the
> > > log whilst the inode cluster buffer is pinned by the current
> > > checkpoint that has been pushed but isn't on stable storage because
> > > because the cil->xc_push_commit_stable was not set. This results in
> > > the log/AIL effectively deadlocking until something triggers the
> > > commit record iclog to be pushed to stable storage (i.e. the
> > > periodic log worker calling xfs_log_force()).
> > > 
> > > The fix is two-fold - first we should always set the
> > > cil->xc_push_commit_stable when xlog_cil_flush() is called,
> > > regardless of whether there is already a pending push or not.
> > > 
> > > Second, if the CIL is empty, we should trigger an iclog flush to
> > > ensure that the iclogs of the last checkpoint have actually been
> > > submitted to disk as that checkpoint may not have been run under
> > > stable completion constraints.
> > 
> > Can it ever be the case that the CIL is not empty but the last
> > checkpoint wasn't committed to disk?
> 
> Yes. But xlog_cil_push_now() will capture that, queue it and mark
> it as xc_push_commit_stable. 
> 
> Remember that the push_now() code updates the push seq/stable
> state under down_read(ctx lock) + spin_lock(push lock) context. The
> push seq/stable state is cleared by the push worker under
> down_write(ctx lock) + spin_lock(push lock) conditions when it
> atomically swaps in the new empty CIL context. Hence the push worker
> will always see the stable flag if it has been set for that push
> sequence.
> 
> > Let's say someone else
> > commits a transaction after the worker samples xc_push_commit_stable?
> 
> If we race with commits between the xlog_cil_push_now(log, seq,
> true) and the CIL list_empty check in xlog_cil_flush(), there are
> two posibilities:
> 
> 1. the CIL push worker hasn't run and atomically switched in a new
> CIL context.
> 2. the CIL push worker has run and switched contexts
> 
> In the first case, the commit will end up in the same context that
> xlog_cil_flush() pushed, and it will be stable. That will result in
> an empty CIL after the CIL push worker runs, but the racing commit
> will be stable as per the xc_push_commit_stable flag. This can also
> lead to the CIL being empty by the time the list_empty check is done
> (because pre-empt), in which case the log force will be a no-op
> because none of the iclogs need flushing.
> 
> > IOWs, why does a not-empty CIL mean that the last checkpoint is on disk?
> 
> In the second case, the CIL push triggered by xlog_cil_push_now()
> will be stable because xc_push_commit_stable says it must be, and
> the racing commit will end up in the new CIL context and the CIL
> won't be empty. We don't need a log force in this case because the
> previous sequence that was flushed with stable semantics as required.
> 
> In the case of AIL pushing, we don't actually care about racing CIL
> commits because we are trying to get pinned AIL items unpinned so we
> can move the tail of the log forwards. If those pinned items are
> relogged by racing transactions, then the next call to
> xlog_cil_flush() from the AIL will get them unpinned and that will
> move them forward in the log, anyway.

Ok, that's kinda what I was thinking, but wasn't sure there wasn't some
other weird subtlety that I hadn't figured out.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


--D

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
Chandan Babu R March 16, 2022, 10:34 a.m. UTC | #4
On 15 Mar 2022 at 12:12, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When the AIL tries to flush the CIL, it relies on the CIL push
> ending up on stable storage without having to wait for and
> manipulate iclog state directly. However, if there is already a
> pending CIL push when the AIL tries to flush the CIL, it won't set
> the cil->xc_push_commit_stable flag and so the CIL push will not
> actively flush the commit record iclog.

I think the above sentence maps to the following snippet from
xlog_cil_push_now(),

	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
		spin_unlock(&cil->xc_push_lock);
		return;
	}

i.e. if the CIL sequence that we are trying to push is already being pushed
then xlog_cil_push_now() returns without queuing work on cil->xc_push_wq.

However, the push_seq could have been previously pushed by,
1. xfsaild_push()
   In this case, cil->xc_push_commit_stable is set to true. Hence,
   xlog_cil_push_work() will definitely make sure to submit the commit record
   iclog for write I/O.
2. xfs_log_force_seq() => xlog_cil_force_seq()
   xfs_log_force_seq() invokes xlog_force_lsn() after executing
   xlog_cil_force_seq(). Here, A partially filled iclog will be in
   XLOG_STATE_ACTIVE state. This will cause xlog_force_and_check_iclog() to be
   invoked and hence the iclog is submitted for write I/O.

In both the cases listed above, iclog is guaranteed to be submitted for I/O
without any help from the log worker task.

Looks like I am missing something obvious here.

>
> generic/530 when run on a single CPU test VM can trigger this fairly
> reliably. This test exercises unlinked inode recovery, and can
> result in inodes being pinned in memory by ongoing modifications to
> the inode cluster buffer to record unlinked list modifications. As a
> result, the first inode unlinked in a buffer can pin the tail of the
> log whilst the inode cluster buffer is pinned by the current
> checkpoint that has been pushed but isn't on stable storage because
> because the cil->xc_push_commit_stable was not set. This results in
> the log/AIL effectively deadlocking until something triggers the
> commit record iclog to be pushed to stable storage (i.e. the
> periodic log worker calling xfs_log_force()).
>
> The fix is two-fold - first we should always set the
> cil->xc_push_commit_stable when xlog_cil_flush() is called,
> regardless of whether there is already a pending push or not.
>
> Second, if the CIL is empty, we should trigger an iclog flush to
> ensure that the iclogs of the last checkpoint have actually been
> submitted to disk as that checkpoint may not have been run under
> stable completion constraints.
>
> Reported-and-tested-by: Matthew Wilcox <willy@infradead.org>
> Fixes: 0020a190cf3e ("xfs: AIL needs asynchronous CIL forcing")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 3d8ebf2a1e55..48b16a5feb27 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -1369,18 +1369,27 @@ xlog_cil_push_now(
>  	if (!async)
>  		flush_workqueue(cil->xc_push_wq);
>  
> +	spin_lock(&cil->xc_push_lock);
> +
> +	/*
> +	 * If this is an async flush request, we always need to set the
> +	 * xc_push_commit_stable flag even if something else has already queued
> +	 * a push. The flush caller is asking for the CIL to be on stable
> +	 * storage when the next push completes, so regardless of who has queued
> +	 * the push, the flush requires stable semantics from it.
> +	 */
> +	cil->xc_push_commit_stable = async;
> +
>  	/*
>  	 * If the CIL is empty or we've already pushed the sequence then
> -	 * there's no work we need to do.
> +	 * there's no more work that we need to do.
>  	 */
> -	spin_lock(&cil->xc_push_lock);
>  	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
>  		spin_unlock(&cil->xc_push_lock);
>  		return;
>  	}
>  
>  	cil->xc_push_seq = push_seq;
> -	cil->xc_push_commit_stable = async;
>  	queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
>  	spin_unlock(&cil->xc_push_lock);
>  }
> @@ -1520,6 +1529,13 @@ xlog_cil_flush(
>  
>  	trace_xfs_log_force(log->l_mp, seq, _RET_IP_);
>  	xlog_cil_push_now(log, seq, true);
> +
> +	/*
> +	 * If the CIL is empty, make sure that any previous checkpoint that may
> +	 * still be in an active iclog is pushed to stable storage.
> +	 */
> +	if (list_empty(&log->l_cilp->xc_cil))
> +		xfs_log_force(log->l_mp, 0);
>  }
>  
>  /*
Dave Chinner March 16, 2022, 11:24 p.m. UTC | #5
On Wed, Mar 16, 2022 at 04:04:55PM +0530, Chandan Babu R wrote:
> On 15 Mar 2022 at 12:12, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > When the AIL tries to flush the CIL, it relies on the CIL push
> > ending up on stable storage without having to wait for and
> > manipulate iclog state directly. However, if there is already a
> > pending CIL push when the AIL tries to flush the CIL, it won't set
> > the cil->xc_push_commit_stable flag and so the CIL push will not
> > actively flush the commit record iclog.
> 
> I think the above sentence maps to the following snippet from
> xlog_cil_push_now(),
> 
> 	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
> 		spin_unlock(&cil->xc_push_lock);
> 		return;
> 	}
> 
> i.e. if the CIL sequence that we are trying to push is already being pushed
> then xlog_cil_push_now() returns without queuing work on cil->xc_push_wq.
> 
> However, the push_seq could have been previously pushed by,
> 1. xfsaild_push()
>    In this case, cil->xc_push_commit_stable is set to true. Hence,
>    xlog_cil_push_work() will definitely make sure to submit the commit record
>    iclog for write I/O.
> 2. xfs_log_force_seq() => xlog_cil_force_seq()
>    xfs_log_force_seq() invokes xlog_force_lsn() after executing
>    xlog_cil_force_seq(). Here, A partially filled iclog will be in
>    XLOG_STATE_ACTIVE state. This will cause xlog_force_and_check_iclog() to be
>    invoked and hence the iclog is submitted for write I/O.
> 
> In both the cases listed above, iclog is guaranteed to be submitted for I/O
> without any help from the log worker task.
> 
> Looks like I am missing something obvious here.

Pushes triggered by xlog_cil_push_background() can complete leaving
the partially filled iclog in ACTIVE state. Then xlog_cil_push_now()
does nothing because it doesn't trigger a new CIL push and so
setting the cil->xc_push_commit_stable flag doesn't trigger a flush
of the ACTIVE iclog.

The AIL flush does not use xfs_log_force_seq() because that blocks
waiting for the entire CIL to hit the disk before it can force the
last iclog to disk. Hence the second piece of this patch is
necessary, and that is to call xfs_log_force() if the CIL is empty
(i.e. the case where xlog_cil_push_now() is a no-op because the
CIL is empty due to background pushes).


Cheers,

Dave.
Chandan Babu R March 17, 2022, 6:49 a.m. UTC | #6
On 17 Mar 2022 at 04:54, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 04:04:55PM +0530, Chandan Babu R wrote:
>> On 15 Mar 2022 at 12:12, Dave Chinner wrote:
>> > From: Dave Chinner <dchinner@redhat.com>
>> >
>> > When the AIL tries to flush the CIL, it relies on the CIL push
>> > ending up on stable storage without having to wait for and
>> > manipulate iclog state directly. However, if there is already a
>> > pending CIL push when the AIL tries to flush the CIL, it won't set
>> > the cil->xc_push_commit_stable flag and so the CIL push will not
>> > actively flush the commit record iclog.
>> 
>> I think the above sentence maps to the following snippet from
>> xlog_cil_push_now(),
>> 
>> 	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
>> 		spin_unlock(&cil->xc_push_lock);
>> 		return;
>> 	}
>> 
>> i.e. if the CIL sequence that we are trying to push is already being pushed
>> then xlog_cil_push_now() returns without queuing work on cil->xc_push_wq.
>> 
>> However, the push_seq could have been previously pushed by,
>> 1. xfsaild_push()
>>    In this case, cil->xc_push_commit_stable is set to true. Hence,
>>    xlog_cil_push_work() will definitely make sure to submit the commit record
>>    iclog for write I/O.
>> 2. xfs_log_force_seq() => xlog_cil_force_seq()
>>    xfs_log_force_seq() invokes xlog_force_lsn() after executing
>>    xlog_cil_force_seq(). Here, A partially filled iclog will be in
>>    XLOG_STATE_ACTIVE state. This will cause xlog_force_and_check_iclog() to be
>>    invoked and hence the iclog is submitted for write I/O.
>> 
>> In both the cases listed above, iclog is guaranteed to be submitted for I/O
>> without any help from the log worker task.
>> 
>> Looks like I am missing something obvious here.
>
> Pushes triggered by xlog_cil_push_background() can complete leaving
> the partially filled iclog in ACTIVE state. Then xlog_cil_push_now()
> does nothing because it doesn't trigger a new CIL push and so
> setting the cil->xc_push_commit_stable flag doesn't trigger a flush
> of the ACTIVE iclog.

Ah. I had missed xlog_cil_push_background().

>
> The AIL flush does not use xfs_log_force_seq() because that blocks
> waiting for the entire CIL to hit the disk before it can force the
> last iclog to disk. Hence the second piece of this patch is
> necessary, and that is to call xfs_log_force() if the CIL is empty
> (i.e. the case where xlog_cil_push_now() is a no-op because the
> CIL is empty due to background pushes).
>

Thanks for clarifying my doubts.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 3d8ebf2a1e55..48b16a5feb27 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1369,18 +1369,27 @@  xlog_cil_push_now(
 	if (!async)
 		flush_workqueue(cil->xc_push_wq);
 
+	spin_lock(&cil->xc_push_lock);
+
+	/*
+	 * If this is an async flush request, we always need to set the
+	 * xc_push_commit_stable flag even if something else has already queued
+	 * a push. The flush caller is asking for the CIL to be on stable
+	 * storage when the next push completes, so regardless of who has queued
+	 * the push, the flush requires stable semantics from it.
+	 */
+	cil->xc_push_commit_stable = async;
+
 	/*
 	 * If the CIL is empty or we've already pushed the sequence then
-	 * there's no work we need to do.
+	 * there's no more work that we need to do.
 	 */
-	spin_lock(&cil->xc_push_lock);
 	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
 		spin_unlock(&cil->xc_push_lock);
 		return;
 	}
 
 	cil->xc_push_seq = push_seq;
-	cil->xc_push_commit_stable = async;
 	queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
 	spin_unlock(&cil->xc_push_lock);
 }
@@ -1520,6 +1529,13 @@  xlog_cil_flush(
 
 	trace_xfs_log_force(log->l_mp, seq, _RET_IP_);
 	xlog_cil_push_now(log, seq, true);
+
+	/*
+	 * If the CIL is empty, make sure that any previous checkpoint that may
+	 * still be in an active iclog is pushed to stable storage.
+	 */
+	if (list_empty(&log->l_cilp->xc_cil))
+		xfs_log_force(log->l_mp, 0);
 }
 
 /*