diff mbox series

[v2] xfs: up(ic_sema) if flushing data device fails

Message ID 20231025225848.3437904-1-leah.rumancik@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] xfs: up(ic_sema) if flushing data device fails | expand

Commit Message

Leah Rumancik Oct. 25, 2023, 10:58 p.m. UTC
We flush the data device cache before we issue external log IO. If
the flush fails, we shut down the log immediately and return. However,
the iclog->ic_sema is left in a decremented state so let's add an up().
Prior to this patch, xfs/438 would fail consistently when running with
an external log device:

sync
  -> xfs_log_force
  -> xlog_write_iclog
      -> down(&iclog->ic_sema)
      -> blkdev_issue_flush (fail causes us to intiate shutdown)
          -> xlog_force_shutdown
          -> return

unmount
  -> xfs_log_umount
      -> xlog_wait_iclog_completion
          -> down(&iclog->ic_sema) --------> HANG

There is a second early return / shutdown. Make sure the up() happens
for it as well.

Fixes: b5d721eaae47 ("xfs: external logs need to flush data device")
Fixes: 842a42d126b4 ("xfs: shutdown on failure to add page to log bio")
Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices")
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---

Notes:
    v1->v2:
     - use goto instead of multiple returns
     - add Fixes tags

 fs/xfs/xfs_log.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Dave Chinner Oct. 26, 2023, 12:41 a.m. UTC | #1
On Wed, Oct 25, 2023 at 03:58:48PM -0700, Leah Rumancik wrote:
> We flush the data device cache before we issue external log IO. If
> the flush fails, we shut down the log immediately and return. However,
> the iclog->ic_sema is left in a decremented state so let's add an up().
> Prior to this patch, xfs/438 would fail consistently when running with
> an external log device:
> 
> sync
>   -> xfs_log_force
>   -> xlog_write_iclog
>       -> down(&iclog->ic_sema)
>       -> blkdev_issue_flush (fail causes us to intiate shutdown)
>           -> xlog_force_shutdown
>           -> return
> 
> unmount
>   -> xfs_log_umount
>       -> xlog_wait_iclog_completion
>           -> down(&iclog->ic_sema) --------> HANG
> 
> There is a second early return / shutdown. Make sure the up() happens
> for it as well.
> 
> Fixes: b5d721eaae47 ("xfs: external logs need to flush data device")
> Fixes: 842a42d126b4 ("xfs: shutdown on failure to add page to log bio")
> Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices")
> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> 
> Notes:
>     v1->v2:
>      - use goto instead of multiple returns
>      - add Fixes tags
> 
>  fs/xfs/xfs_log.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 51c100c86177..8ae923e00eb6 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1925,20 +1925,17 @@ xlog_write_iclog(
>  		 * avoid shutdown re-entering this path and erroring out again.
>  		 */
>  		if (log->l_targ != log->l_mp->m_ddev_targp &&
> -		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) {
> -			xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> -			return;
> -		}
> +		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev))
> +			goto shutdown;
>  	}
>  	if (iclog->ic_flags & XLOG_ICL_NEED_FUA)
>  		iclog->ic_bio.bi_opf |= REQ_FUA;
>  
>  	iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
>  
> -	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) {
> -		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> -		return;
> -	}
> +	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count))
> +		goto shutdown;
> +
>  	if (is_vmalloc_addr(iclog->ic_data))
>  		flush_kernel_vmap_range(iclog->ic_data, count);
>  
> @@ -1959,6 +1956,10 @@ xlog_write_iclog(
>  	}
>  
>  	submit_bio(&iclog->ic_bio);
> +	return;
> +shutdown:
> +	up(&iclog->ic_sema);
> +	xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);

Hmmmm. So we'll leave an unreferenced, unlocked iclog in a SYNCING
state where something else can access it, then hope that the
shutdown gets to it first and that it cleans it all up? That doesn't
seem completely safe to me.

At entry to this function, if the log is already shut down, it runs
xlog_state_done_syncing() to force the iclog and it's attached state
to be cleaned up before dropping the lock and returning.

If I look at xlog_ioend_work(), if it gets an error it does the
shutdown, then calls xlog_state_done_syncing() to clean up the iclog
and attached state, then drops the lock.

Hence it appears to me that the error handling for a fatal errors in
IO submission should match the io completion error handling. i.e the
error stack for this function should look like this:

....
	if (xlog_is_shutdown(log))
		goto out_done_sync;
....
	if (error)
		goto out_do_shutdown;
....
out_do_shutdown:
	xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
out_done_sync:
	xlog_state_done_syncing(iclog);
	up(&iclog->ic_sema);
}

Thoughts?

-Dave.
Leah Rumancik Oct. 27, 2023, 11:11 p.m. UTC | #2
On Thu, Oct 26, 2023 at 11:41:03AM +1100, Dave Chinner wrote:
> Hmmmm. So we'll leave an unreferenced, unlocked iclog in a SYNCING
> state where something else can access it, then hope that the
> shutdown gets to it first and that it cleans it all up? That doesn't
> seem completely safe to me.
> 
> At entry to this function, if the log is already shut down, it runs
> xlog_state_done_syncing() to force the iclog and it's attached state
> to be cleaned up before dropping the lock and returning.
> 
> If I look at xlog_ioend_work(), if it gets an error it does the
> shutdown, then calls xlog_state_done_syncing() to clean up the iclog
> and attached state, then drops the lock.
> 
> Hence it appears to me that the error handling for a fatal errors in
> IO submission should match the io completion error handling. i.e the
> error stack for this function should look like this:
> 
> ....
> 	if (xlog_is_shutdown(log))
> 		goto out_done_sync;
> ....
> 	if (error)
> 		goto out_do_shutdown;
> ....
> out_do_shutdown:
> 	xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> out_done_sync:
> 	xlog_state_done_syncing(iclog);
> 	up(&iclog->ic_sema);
> }
> 
> Thoughts?
Sure, that makes sense to me and will make things more consistent. I'll
send out a new version. Thanks for the review!

- Leah

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 51c100c86177..8ae923e00eb6 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1925,20 +1925,17 @@  xlog_write_iclog(
 		 * avoid shutdown re-entering this path and erroring out again.
 		 */
 		if (log->l_targ != log->l_mp->m_ddev_targp &&
-		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) {
-			xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
-			return;
-		}
+		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev))
+			goto shutdown;
 	}
 	if (iclog->ic_flags & XLOG_ICL_NEED_FUA)
 		iclog->ic_bio.bi_opf |= REQ_FUA;
 
 	iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
 
-	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) {
-		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
-		return;
-	}
+	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count))
+		goto shutdown;
+
 	if (is_vmalloc_addr(iclog->ic_data))
 		flush_kernel_vmap_range(iclog->ic_data, count);
 
@@ -1959,6 +1956,10 @@  xlog_write_iclog(
 	}
 
 	submit_bio(&iclog->ic_bio);
+	return;
+shutdown:
+	up(&iclog->ic_sema);
+	xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
 }
 
 /*