diff mbox series

[v4,05/17] xfs: reset buffer write failure state on successful completion

Message ID 20200504141154.55887-6-bfoster@redhat.com (mailing list archive)
State Accepted
Headers show
Series xfs: flush related error handling cleanups | expand

Commit Message

Brian Foster May 4, 2020, 2:11 p.m. UTC
The buffer write failure flag is intended to control the internal
write retry that XFS has historically implemented to help mitigate
the severity of transient I/O errors. The flag is set when a buffer
is resubmitted from the I/O completion path due to a previous
failure. It is checked on subsequent I/O completions to skip the
internal retry and fall through to the higher level configurable
error handling mechanism. The flag is cleared in the synchronous and
delwri submission paths and also checked in various places to log
write failure messages.

There are a couple minor problems with the current usage of this
flag. One is that we issue an internal retry after every submission
from xfsaild due to how delwri submission clears the flag. This
results in double the expected or configured number of write
attempts when under sustained failures. Another more subtle issue is
that the flag is never cleared on successful I/O completion. This
can cause xfs_wait_buftarg() to suggest that dirty buffers are being
thrown away due to the existence of the flag, when the reality is
that the flag might still be set because the write succeeded on the
retry.

Clear the write failure flag on successful I/O completion to address
both of these problems. This means that the internal retry attempt
occurs once since the last time a buffer write failed and that
various other contexts only see the flag set when the immediately
previous write attempt has failed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Allison Henderson May 5, 2020, 9:09 p.m. UTC | #1
On 5/4/20 7:11 AM, Brian Foster wrote:
> The buffer write failure flag is intended to control the internal
> write retry that XFS has historically implemented to help mitigate
> the severity of transient I/O errors. The flag is set when a buffer
> is resubmitted from the I/O completion path due to a previous
> failure. It is checked on subsequent I/O completions to skip the
> internal retry and fall through to the higher level configurable
> error handling mechanism. The flag is cleared in the synchronous and
> delwri submission paths and also checked in various places to log
> write failure messages.
> 
> There are a couple minor problems with the current usage of this
> flag. One is that we issue an internal retry after every submission
> from xfsaild due to how delwri submission clears the flag. This
> results in double the expected or configured number of write
> attempts when under sustained failures. Another more subtle issue is
> that the flag is never cleared on successful I/O completion. This
> can cause xfs_wait_buftarg() to suggest that dirty buffers are being
> thrown away due to the existence of the flag, when the reality is
> that the flag might still be set because the write succeeded on the
> retry.
> 
> Clear the write failure flag on successful I/O completion to address
> both of these problems. This means that the internal retry attempt
> occurs once since the last time a buffer write failed and that
> various other contexts only see the flag set when the immediately
> previous write attempt has failed.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Looks ok to me:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_buf.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index d5d6a68bb1e6..fd76a84cefdd 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1197,8 +1197,10 @@ xfs_buf_ioend(
>   		bp->b_ops->verify_read(bp);
>   	}
>   
> -	if (!bp->b_error)
> +	if (!bp->b_error) {
> +		bp->b_flags &= ~XBF_WRITE_FAIL;
>   		bp->b_flags |= XBF_DONE;
> +	}
>   
>   	if (bp->b_iodone)
>   		(*(bp->b_iodone))(bp);
> @@ -1274,7 +1276,7 @@ xfs_bwrite(
>   
>   	bp->b_flags |= XBF_WRITE;
>   	bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> -			 XBF_WRITE_FAIL | XBF_DONE);
> +			 XBF_DONE);
>   
>   	error = xfs_buf_submit(bp);
>   	if (error)
> @@ -1996,7 +1998,7 @@ xfs_buf_delwri_submit_buffers(
>   		 * synchronously. Otherwise, drop the buffer from the delwri
>   		 * queue and submit async.
>   		 */
> -		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
> +		bp->b_flags &= ~_XBF_DELWRI_Q;
>   		bp->b_flags |= XBF_WRITE;
>   		if (wait_list) {
>   			bp->b_flags &= ~XBF_ASYNC;
>
Allison Henderson May 5, 2020, 9:09 p.m. UTC | #2
On 5/4/20 7:11 AM, Brian Foster wrote:
> The buffer write failure flag is intended to control the internal
> write retry that XFS has historically implemented to help mitigate
> the severity of transient I/O errors. The flag is set when a buffer
> is resubmitted from the I/O completion path due to a previous
> failure. It is checked on subsequent I/O completions to skip the
> internal retry and fall through to the higher level configurable
> error handling mechanism. The flag is cleared in the synchronous and
> delwri submission paths and also checked in various places to log
> write failure messages.
> 
> There are a couple minor problems with the current usage of this
> flag. One is that we issue an internal retry after every submission
> from xfsaild due to how delwri submission clears the flag. This
> results in double the expected or configured number of write
> attempts when under sustained failures. Another more subtle issue is
> that the flag is never cleared on successful I/O completion. This
> can cause xfs_wait_buftarg() to suggest that dirty buffers are being
> thrown away due to the existence of the flag, when the reality is
> that the flag might still be set because the write succeeded on the
> retry.
> 
> Clear the write failure flag on successful I/O completion to address
> both of these problems. This means that the internal retry attempt
> occurs once since the last time a buffer write failed and that
> various other contexts only see the flag set when the immediately
> previous write attempt has failed.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Looks ok to me:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_buf.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index d5d6a68bb1e6..fd76a84cefdd 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1197,8 +1197,10 @@ xfs_buf_ioend(
>   		bp->b_ops->verify_read(bp);
>   	}
>   
> -	if (!bp->b_error)
> +	if (!bp->b_error) {
> +		bp->b_flags &= ~XBF_WRITE_FAIL;
>   		bp->b_flags |= XBF_DONE;
> +	}
>   
>   	if (bp->b_iodone)
>   		(*(bp->b_iodone))(bp);
> @@ -1274,7 +1276,7 @@ xfs_bwrite(
>   
>   	bp->b_flags |= XBF_WRITE;
>   	bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> -			 XBF_WRITE_FAIL | XBF_DONE);
> +			 XBF_DONE);
>   
>   	error = xfs_buf_submit(bp);
>   	if (error)
> @@ -1996,7 +1998,7 @@ xfs_buf_delwri_submit_buffers(
>   		 * synchronously. Otherwise, drop the buffer from the delwri
>   		 * queue and submit async.
>   		 */
> -		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
> +		bp->b_flags &= ~_XBF_DELWRI_Q;
>   		bp->b_flags |= XBF_WRITE;
>   		if (wait_list) {
>   			bp->b_flags &= ~XBF_ASYNC;
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index d5d6a68bb1e6..fd76a84cefdd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1197,8 +1197,10 @@  xfs_buf_ioend(
 		bp->b_ops->verify_read(bp);
 	}
 
-	if (!bp->b_error)
+	if (!bp->b_error) {
+		bp->b_flags &= ~XBF_WRITE_FAIL;
 		bp->b_flags |= XBF_DONE;
+	}
 
 	if (bp->b_iodone)
 		(*(bp->b_iodone))(bp);
@@ -1274,7 +1276,7 @@  xfs_bwrite(
 
 	bp->b_flags |= XBF_WRITE;
 	bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
-			 XBF_WRITE_FAIL | XBF_DONE);
+			 XBF_DONE);
 
 	error = xfs_buf_submit(bp);
 	if (error)
@@ -1996,7 +1998,7 @@  xfs_buf_delwri_submit_buffers(
 		 * synchronously. Otherwise, drop the buffer from the delwri
 		 * queue and submit async.
 		 */
-		bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
+		bp->b_flags &= ~_XBF_DELWRI_Q;
 		bp->b_flags |= XBF_WRITE;
 		if (wait_list) {
 			bp->b_flags &= ~XBF_ASYNC;