
[2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

Message ID 1506013972-23049-3-git-send-email-idryomov@gmail.com (mailing list archive)
State New, archived

Commit Message

Ilya Dryomov Sept. 21, 2017, 5:12 p.m. UTC
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:

  $ fallocate -zn -l 1k /dev/sdg
  fallocate: fallocate failed: Remote I/O error
  $ fallocate -zn -l 1k /dev/sdg  # OK
  $ fallocate -zn -l 1k /dev/sdg  # OK

The subsequent calls succeed because sd_done() sets ->no_write_same in
response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
__blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.

This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
and fall back to manual zeroing, unless BLKDEV_ZERO_NOFALLBACK is
specified.  In the BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
sd_done() has just set ->no_write_same, thus indicating a lack of
offload support.

Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 block/blk-lib.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
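The failure mode described above can be made concrete with a small userspace model. This is only a sketch under stated assumptions: `struct mock_disk`, `mock_write_zeroes()`, `mock_zero_pages()`, `zeroout_before_fix()`, `zeroout_after_fix()` and `MOCK_ZERO_NOFALLBACK` are hypothetical names, not kernel API. The `no_write_same` field stands in for the flag sd_done() sets, a rejected WRITE SAME is simulated by returning -EREMOTEIO, and `zeroout_after_fix()` mirrors the control flow of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121 /* Linux value; only needed on non-Linux libcs */
#endif

/* Hypothetical flag standing in for BLKDEV_ZERO_NOFALLBACK. */
#define MOCK_ZERO_NOFALLBACK 0x1u

/* Hypothetical model of an older SCSI disk; not a kernel structure. */
struct mock_disk {
	bool no_write_same;      /* models the ->no_write_same flag */
	bool rejects_write_same; /* the device fails WRITE SAME */
	unsigned char data[4096];
};

/* Offload path. On rejection, latch no_write_same the way the commit
 * message says sd_done() does and report -EREMOTEIO. */
static int mock_write_zeroes(struct mock_disk *d, size_t off, size_t len)
{
	assert(off + len <= sizeof(d->data));
	if (d->rejects_write_same) {
		d->no_write_same = true;
		return -EREMOTEIO;
	}
	memset(d->data + off, 0, len);
	return 0;
}

/* Fallback path: explicitly write zeroes (ZERO_PAGE bios in the kernel). */
static int mock_zero_pages(struct mock_disk *d, size_t off, size_t len)
{
	assert(off + len <= sizeof(d->data));
	memset(d->data + off, 0, len);
	return 0;
}

/* Before the fix: the offload error reaches the caller; only the next
 * call sees no_write_same and takes the fallback path. */
static int zeroout_before_fix(struct mock_disk *d, size_t off, size_t len,
			      unsigned int flags)
{
	if (!d->no_write_same)
		return mock_write_zeroes(d, off, len);
	if (flags & MOCK_ZERO_NOFALLBACK)
		return -EOPNOTSUPP;
	return mock_zero_pages(d, off, len);
}

/* After the fix: a failed WRITE ZEROES falls back within the same call,
 * or becomes -EOPNOTSUPP when fallback is forbidden and the device has
 * just been marked as lacking offload support. */
static int zeroout_after_fix(struct mock_disk *d, size_t off, size_t len,
			     unsigned int flags)
{
	int ret;

	if (d->no_write_same)
		return (flags & MOCK_ZERO_NOFALLBACK) ? -EOPNOTSUPP
						      : mock_zero_pages(d, off, len);

	ret = mock_write_zeroes(d, off, len);
	if (ret) {
		if (flags & MOCK_ZERO_NOFALLBACK) {
			if (d->no_write_same)
				ret = -EOPNOTSUPP;
		} else {
			ret = mock_zero_pages(d, off, len);
		}
	}
	return ret;
}
```

Calling `zeroout_before_fix()` twice on a disk with `rejects_write_same` set reproduces the fallocate sequence above: the first call returns -EREMOTEIO and leaves the data untouched, and the second succeeds through the fallback path. `zeroout_after_fix()` succeeds on the first call.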

Comments

Christoph Hellwig Oct. 3, 2017, 8:04 a.m. UTC | #1
On Thu, Sep 21, 2017 at 07:12:52PM +0200, Ilya Dryomov wrote:
> +		if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {

No need for the additional levels of indentation here.  Also, I
really do not like the logic; we shouldn't have to duplicate so much
of it.

I'd rather go for something like this (sketched in mail):

	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);

retry:
	bio = NULL;
	blk_start_plug(&plug);
	if (try_write_zeroes)
		ret = __blkdev_issue_write_zeroes(...);
	else
		ret = __blkdev_issue_zero_pages(...);
	if (ret == 0 && bio) {
		ret = submit_bio_wait(bio);
		bio_put(bio);
	}
	blk_finish_plug(&plug);
	if (ret && try_write_zeroes) {
		try_write_zeroes = false;
		goto retry;
	}

Ilya Dryomov Oct. 4, 2017, 2:56 p.m. UTC | #2
On Tue, Oct 3, 2017 at 10:04 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Sep 21, 2017 at 07:12:52PM +0200, Ilya Dryomov wrote:
>> +             if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {
>
> No need for the additional levels of indentation here.  Also I
> really do not like the logic, we shouldn't have to duplicate much
> of the logic multiple times.
>
> I'd more go for something like (sketched in mail):
>
>         bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
>
> retry:
>         bio = NULL;
>         blk_start_plug(&plug);
>         if (try_write_zeroes)
>                 ret = __blkdev_issue_write_zeroes(...);
>         else
>                 ret = __blkdev_issue_zero_pages(...);
>         if (ret == 0 && bio) {
>                 ret = submit_bio_wait(bio);
>                 bio_put(bio);
>         }
>         blk_finish_plug(&plug);
>         if (ret && try_write_zeroes) {
>                 try_write_zeroes = false;
>                 goto retry;
>         }

Yeah, I didn't like the code flow either, but we are going to duplicate
some of it either way.  In particular, the !bdev_write_zeroes_sectors()
-> ret = -EOPNOTSUPP part is still needed to avoid propagating -EREMOTEIO
in the BLKDEV_ZERO_NOFALLBACK case:

        if (try_write_zeroes)
                ret = __blkdev_issue_write_zeroes(...);
        else if (!(flags & BLKDEV_ZERO_NOFALLBACK))
                ret = __blkdev_issue_zero_pages(...);
        else if (!bdev_write_zeroes_sectors(bdev))
                ret = -EOPNOTSUPP;

The bs_mask check from __blkdev_issue_zeroout() needs to be duplicated too.

I'll post v2 in a few.

Thanks,

                Ilya
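The retry-loop shape from the review, combined with the NOFALLBACK clause and the bs_mask check that the reply says must still be duplicated, can be modelled in userspace as follows. This is a hedged sketch, not the v2 patch: every `mock_*` name is hypothetical, sectors are assumed to be 512 bytes, `no_write_same` stands in for what sd_done() sets, and `media_error` is an assumed knob showing that a genuine I/O error is still propagated rather than rewritten to -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121 /* Linux value; only needed on non-Linux libcs */
#endif

#define MOCK_SECTOR_SHIFT 9        /* 512-byte sectors */
#define MOCK_ZERO_NOFALLBACK 0x1u  /* stands in for BLKDEV_ZERO_NOFALLBACK */

typedef uint64_t mock_sector_t;

/* Hypothetical device model; not a kernel structure. */
struct mock_disk {
	unsigned int logical_block_size; /* bytes, e.g. 512 or 4096 */
	bool no_write_same;              /* models the ->no_write_same flag */
	bool rejects_write_same;         /* WRITE SAME fails, flag gets set */
	bool media_error;                /* every write fails with -EIO */
	unsigned char data[16 << MOCK_SECTOR_SHIFT];
};

/* Plain write of zeroes; also serves as the zero-page fallback. */
static int mock_write(struct mock_disk *d, mock_sector_t sector,
		      mock_sector_t nr_sects)
{
	size_t off = (size_t)sector << MOCK_SECTOR_SHIFT;
	size_t len = (size_t)nr_sects << MOCK_SECTOR_SHIFT;

	assert(off + len <= sizeof(d->data));
	if (d->media_error)
		return -EIO;
	memset(d->data + off, 0, len);
	return 0;
}

/* Offload path: a rejecting device latches no_write_same, as sd_done()
 * is described to do, and reports -EREMOTEIO. */
static int mock_write_zeroes(struct mock_disk *d, mock_sector_t sector,
			     mock_sector_t nr_sects)
{
	if (d->rejects_write_same) {
		d->no_write_same = true;
		return -EREMOTEIO;
	}
	return mock_write(d, sector, nr_sects);
}

static int mock_issue_zeroout(struct mock_disk *d, mock_sector_t sector,
			      mock_sector_t nr_sects, unsigned int flags)
{
	mock_sector_t bs_mask =
		(d->logical_block_size >> MOCK_SECTOR_SHIFT) - 1;
	bool try_write_zeroes = !d->no_write_same;
	int ret = 0;

	/* Reject ranges not aligned to the logical block size. */
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;

retry:
	if (try_write_zeroes)
		ret = mock_write_zeroes(d, sector, nr_sects);
	else if (!(flags & MOCK_ZERO_NOFALLBACK))
		ret = mock_write(d, sector, nr_sects);
	else if (d->no_write_same)
		ret = -EOPNOTSUPP; /* no offload: don't leak -EREMOTEIO */
	/* otherwise keep the error from the failed WRITE ZEROES */

	if (ret && try_write_zeroes) {
		try_write_zeroes = false; /* ensures at most one retry */
		goto retry;
	}
	return ret;
}
```

With `MOCK_ZERO_NOFALLBACK`, a device that rejects WRITE SAME yields -EOPNOTSUPP rather than -EREMOTEIO, while a media error still surfaces as -EIO. The `goto retry` runs at most once because `try_write_zeroes` is cleared before jumping.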

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 6b97feb71065..1cb402beb983 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -316,12 +316,6 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
- *  Note that this function may fail with -EOPNOTSUPP if the driver signals
- *  zeroing offload support, but the device fails to process the command (for
- *  some devices there is no non-destructive way to verify whether this
- *  operation is actually supported).  In this case the caller should call
- *  retry the call to blkdev_issue_zeroout() and the fallback path will be used.
- *
  *  If a device is using logical block provisioning, the underlying space will
  *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
  *
@@ -374,6 +368,27 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			&bio, flags);
 	if (ret == 0 && bio) {
 		ret = submit_bio_wait(bio);
+		/*
+		 * Fall back to a manual zeroout on any error, if allowed.
+		 *
+		 * Particularly, WRITE ZEROES may fail with -EREMOTEIO if the
+		 * driver signals zeroing offload support, but the device
+		 * fails to process the command (for some devices there is no
+		 * non-destructive way to verify whether this operation is
+		 * actually supported).
+		 */
+		if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {
+			if (flags & BLKDEV_ZERO_NOFALLBACK) {
+				if (!bdev_write_zeroes_sectors(bdev))
+					ret = -EOPNOTSUPP;
+			} else {
+				bio_put(bio);
+				bio = NULL;
+				__blkdev_issue_zero_pages(bdev, sector,
+						    nr_sects, gfp_mask, &bio);
+				ret = submit_bio_wait(bio);
+			}
+		}
 		bio_put(bio);
 	}
 	blk_finish_plug(&plug);