diff mbox series

[7/6] xfs: xfs: shutdown during log recovery needs to mark the log shutdown

Message ID 20220327225534.GQ1544202@dread.disaster.area (mailing list archive)
State New, archived
Headers show
Series xfs: more shutdown/recovery fixes | expand

Commit Message

Dave Chinner March 27, 2022, 10:55 p.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When a checkpoint writeback is run by log recovery, corruption
propagated from the log can result in writeback verifiers failing
and calling xfs_force_shutdown() from
xfs_buf_delwri_submit_buffers().

This results in the mount being marked as shutdown, but the log does
not get marked as shut down because:

        /*
         * If this happens during log recovery then we aren't using the runtime
         * log mechanisms yet so there's nothing to shut down.
         */
        if (!log || xlog_in_recovery(log))
                return false;

If there are other buffers that then fail (say due to detecting the
mount shutdown), they will now hang in xfs_do_force_shutdown()
waiting for the log to shut down like this:

  __schedule+0x30d/0x9e0
  schedule+0x55/0xd0
  xfs_do_force_shutdown+0x1cd/0x200
  ? init_wait_var_entry+0x50/0x50
  xfs_buf_ioend+0x47e/0x530
  __xfs_buf_submit+0xb0/0x240
  xfs_buf_delwri_submit_buffers+0xfe/0x270
  xfs_buf_delwri_submit+0x3a/0xc0
  xlog_do_recovery_pass+0x474/0x7b0
  ? do_raw_spin_unlock+0x30/0xb0
  xlog_do_log_recovery+0x91/0x140
  xlog_do_recover+0x38/0x1e0
  xlog_recover+0xdd/0x170
  xfs_log_mount+0x17e/0x2e0
  xfs_mountfs+0x457/0x930
  xfs_fs_fill_super+0x476/0x830

xlog_force_shutdown() always needs to mark the log as shut down,
regardless of whether recovery is in progress or not, so that
multiple calls to xfs_force_shutdown() during recovery don't end
up waiting for the log to be shut down like this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

Darrick J. Wong March 29, 2022, 12:37 a.m. UTC | #1
On Mon, Mar 28, 2022 at 09:55:34AM +1100, Dave Chinner wrote:
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When a checkpoint writeback is run by log recovery, corruption
> propagated from the log can result in writeback verifiers failing
> and calling xfs_force_shutdown() from
> xfs_buf_delwri_submit_buffers().
> 
> This results in the mount being marked as shutdown, but the log does
> not get marked as shut down because:
> 
>         /*
>          * If this happens during log recovery then we aren't using the runtime
>          * log mechanisms yet so there's nothing to shut down.
>          */
>         if (!log || xlog_in_recovery(log))
>                 return false;
> 
> If there are other buffers that then fail (say due to detecting the
> mount shutdown), they will now hang in xfs_do_force_shutdown()
> waiting for the log to shut down like this:
> 
>   __schedule+0x30d/0x9e0
>   schedule+0x55/0xd0
>   xfs_do_force_shutdown+0x1cd/0x200
>   ? init_wait_var_entry+0x50/0x50
>   xfs_buf_ioend+0x47e/0x530
>   __xfs_buf_submit+0xb0/0x240
>   xfs_buf_delwri_submit_buffers+0xfe/0x270
>   xfs_buf_delwri_submit+0x3a/0xc0
>   xlog_do_recovery_pass+0x474/0x7b0
>   ? do_raw_spin_unlock+0x30/0xb0
>   xlog_do_log_recovery+0x91/0x140
>   xlog_do_recover+0x38/0x1e0
>   xlog_recover+0xdd/0x170
>   xfs_log_mount+0x17e/0x2e0
>   xfs_mountfs+0x457/0x930
>   xfs_fs_fill_super+0x476/0x830
> 
> xlog_force_shutdown() always needs to mark the log as shut down,
> regardless of whether recovery is in progress or not, so that
> multiple calls to xfs_force_shutdown() during recovery don't end
> up waiting for the log to be shut down like this.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

That sounds like a pretty straightforward premise.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 6166348ce1d1..5f3f943c34b9 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3729,11 +3729,7 @@ xlog_force_shutdown(
>  {
>  	bool		log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
>  
> -	/*
> -	 * If this happens during log recovery then we aren't using the runtime
> -	 * log mechanisms yet so there's nothing to shut down.
> -	 */
> -	if (!log || xlog_in_recovery(log))
> +	if (!log)
>  		return false;
>  
>  	/*
> @@ -3742,10 +3738,16 @@ xlog_force_shutdown(
>  	 * before the force will prevent the log force from flushing the iclogs
>  	 * to disk.
>  	 *
> -	 * Re-entry due to a log IO error shutdown during the log force is
> -	 * prevented by the atomicity of higher level shutdown code.
> +	 * When we are in recovery, there are no transactions to flush, and
> +	 * we don't want to touch the log because we don't want to perturb the
> +	 * current head/tail for future recovery attempts. Hence we need to
> +	 * avoid a log force in this case.
> +	 *
> +	 * If we are shutting down due to a log IO error, then we must avoid
> +	 * trying to write the log as that may just result in more IO errors and
> +	 * an endless shutdown/force loop.
>  	 */
> -	if (!log_error)
> +	if (!log_error && !xlog_in_recovery(log))
>  		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
>  
>  	/*
diff mbox series

Patch

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 6166348ce1d1..5f3f943c34b9 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3729,11 +3729,7 @@  xlog_force_shutdown(
 {
 	bool		log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
 
-	/*
-	 * If this happens during log recovery then we aren't using the runtime
-	 * log mechanisms yet so there's nothing to shut down.
-	 */
-	if (!log || xlog_in_recovery(log))
+	if (!log)
 		return false;
 
 	/*
@@ -3742,10 +3738,16 @@  xlog_force_shutdown(
 	 * before the force will prevent the log force from flushing the iclogs
 	 * to disk.
 	 *
-	 * Re-entry due to a log IO error shutdown during the log force is
-	 * prevented by the atomicity of higher level shutdown code.
+	 * When we are in recovery, there are no transactions to flush, and
+	 * we don't want to touch the log because we don't want to perturb the
+	 * current head/tail for future recovery attempts. Hence we need to
+	 * avoid a log force in this case.
+	 *
+	 * If we are shutting down due to a log IO error, then we must avoid
+	 * trying to write the log as that may just result in more IO errors and
+	 * an endless shutdown/force loop.
 	 */
-	if (!log_error)
+	if (!log_error && !xlog_in_recovery(log))
 		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
 
 	/*