diff mbox

[v2,4/9] block: Return -ENOTSUP rather than assert on unaligned discards

Message ID 1479413642-22463-5-git-send-email-eblake@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Blake Nov. 17, 2016, 8:13 p.m. UTC
Right now, the block layer rounds discard requests, so that
individual drivers are able to assert that discard requests
will never be unaligned.  But there are some ISCSI devices
that track and coalesce multiple unaligned requests, turning it
into an actual discard if the requests eventually cover an
entire page, which implies that it is better to always pass
discard requests as low down the stack as possible.

In isolation, this patch has no semantic effect, since the
block layer currently never passes an unaligned request through.
But the block layer already has code that silently ignores
drivers that return -ENOTSUP for a discard request that cannot
be honored (as well as drivers that return 0 even when nothing
was done).  But the next patch will update the block layer to
fragment discard requests, so that clients are guaranteed that
they are either dealing with an unaligned head or tail, or an
aligned core, making it similar to the block layer semantics of
write zero fragmentation.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/iscsi.c    | 4 +++-
 block/qcow2.c    | 5 +++++
 block/sheepdog.c | 5 +++--
 3 files changed, 11 insertions(+), 3 deletions(-)

Comments

Max Reitz Nov. 17, 2016, 10:01 p.m. UTC | #1
On 17.11.2016 21:13, Eric Blake wrote:
> Right now, the block layer rounds discard requests, so that
> individual drivers are able to assert that discard requests
> will never be unaligned.  But there are some ISCSI devices
> that track and coalesce multiple unaligned requests, turning it
> into an actual discard if the requests eventually cover an
> entire page, which implies that it is better to always pass
> discard requests as low down the stack as possible.
> 
> In isolation, this patch has no semantic effect, since the
> block layer currently never passes an unaligned request through.
> But the block layer already has code that silently ignores
> drivers that return -ENOTSUP for a discard request that cannot
> be honored (as well as drivers that return 0 even when nothing
> was done).  But the next patch will update the block layer to
> fragment discard requests, so that clients are guaranteed that
> they are either dealing with an unaligned head or tail, or an
> aligned core, making it similar to the block layer semantics of
> write zero fragmentation.
> 
> CC: qemu-stable@nongnu.org
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/iscsi.c    | 4 +++-
>  block/qcow2.c    | 5 +++++
>  block/sheepdog.c | 5 +++--
>  3 files changed, 11 insertions(+), 3 deletions(-)

OK, so much about my remark that the comment describing
pdiscard_alignment only says it's an "optimal alignment"...

> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 71bd523..0960929 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1083,7 +1083,9 @@ coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
>      struct IscsiTask iTask;
>      struct unmap_list list;
> 
> -    assert(is_byte_request_lun_aligned(offset, count, iscsilun));
> +    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
> +        return -ENOTSUP;
> +    }
> 
>      if (!iscsilun->lbp.lbpu) {
>          /* UNMAP is not supported by the target */

Next line is:

>          return 0;

Hmm... -ENOTSUP would be the obvious return value here, too. That might
interfere with your next patch, though.

> diff --git a/block/qcow2.c b/block/qcow2.c
> index e22f6dc..7cfcd84 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2491,6 +2491,11 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
>      int ret;
>      BDRVQcow2State *s = bs->opaque;
> 
> +    if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {

Ha! I like "offset | count".

> +        assert(count < s->cluster_size);

Maybe add a comment for this assertion? E.g. "The block layer will only
generate unaligned discard requests that are smaller than the alignment".

> +        return -ENOTSUP;
> +    }
> +
>      qemu_co_mutex_lock(&s->lock);
>      ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
>                                   QCOW2_DISCARD_REQUEST, false);
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 1fb9173..4c9af89 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -2829,8 +2829,9 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
>      iov.iov_len = sizeof(zero);
>      discard_iov.iov = &iov;
>      discard_iov.niov = 1;
> -    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> -    assert((count & (BDRV_SECTOR_SIZE - 1)) == 0);
> +    if (!QEMU_IS_ALIGNED(offset | count, BDRV_SECTOR_SIZE)) {
> +        return -ENOTSUP;
> +    }

Out of interest: Where does sheepdog tell the block layer that requests
have to be aligned to this value?

With this patch, it doesn't matter though, it only did before, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

>      acb = sd_aio_setup(bs, &discard_iov, offset >> BDRV_SECTOR_BITS,
>                         count >> BDRV_SECTOR_BITS);
>      acb->aiocb_type = AIOCB_DISCARD_OBJ;
>
Eric Blake Nov. 18, 2016, 10:48 p.m. UTC | #2
On 11/17/2016 04:01 PM, Max Reitz wrote:
> On 17.11.2016 21:13, Eric Blake wrote:
>> Right now, the block layer rounds discard requests, so that
>> individual drivers are able to assert that discard requests
>> will never be unaligned.  But there are some ISCSI devices
>> that track and coalesce multiple unaligned requests, turning it
>> into an actual discard if the requests eventually cover an
>> entire page, which implies that it is better to always pass
>> discard requests as low down the stack as possible.
>>
>> In isolation, this patch has no semantic effect, since the
>> block layer currently never passes an unaligned request through.
>> But the block layer already has code that silently ignores
>> drivers that return -ENOTSUP for a discard request that cannot
>> be honored (as well as drivers that return 0 even when nothing
>> was done).  But the next patch will update the block layer to
>> fragment discard requests, so that clients are guaranteed that
>> they are either dealing with an unaligned head or tail, or an
>> aligned core, making it similar to the block layer semantics of
>> write zero fragmentation.
>>

>> +++ b/block/iscsi.c
>> @@ -1083,7 +1083,9 @@ coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
>>      struct IscsiTask iTask;
>>      struct unmap_list list;
>>
>> -    assert(is_byte_request_lun_aligned(offset, count, iscsilun));
>> +    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
>> +        return -ENOTSUP;
>> +    }
>>
>>      if (!iscsilun->lbp.lbpu) {
>>          /* UNMAP is not supported by the target */
> 
> Next line is:
> 
>>          return 0;
> 
> Hmm... -ENOTSUP would be the obvious return value here, too. That might
> interfere with your next patch, though.

Shouldn't interfere. I guess no one value is better than the other; I
can respin to pick a consistent value (I'd lean towards -ENOTSUP) if you
think it is worth it; but I'd rather get this into 2.8 without worrying
about it.

> 
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index e22f6dc..7cfcd84 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2491,6 +2491,11 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
>>      int ret;
>>      BDRVQcow2State *s = bs->opaque;
>>
>> +    if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {
> 
> Ha! I like "offset | count".

It only works because we know that qcow2 guarantees that s->cluster_size
is a power of 2 (it does not work at the block layer, where the
bs->bl.pdiscard_align need not be a power of 2).

> 
>> +        assert(count < s->cluster_size);
> 
> Maybe add a comment for this assertion? E.g. "The block layer will only
> generate unaligned discard requests that are smaller than the alignment".

Sure, if the maintainer wants a respin.

> 
>> +        return -ENOTSUP;
>> +    }
>> +
>>      qemu_co_mutex_lock(&s->lock);
>>      ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
>>                                   QCOW2_DISCARD_REQUEST, false);
>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>> index 1fb9173..4c9af89 100644
>> --- a/block/sheepdog.c
>> +++ b/block/sheepdog.c
>> @@ -2829,8 +2829,9 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
>>      iov.iov_len = sizeof(zero);
>>      discard_iov.iov = &iov;
>>      discard_iov.niov = 1;
>> -    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
>> -    assert((count & (BDRV_SECTOR_SIZE - 1)) == 0);
>> +    if (!QEMU_IS_ALIGNED(offset | count, BDRV_SECTOR_SIZE)) {

Again, works because this is a power of 2.

>> +        return -ENOTSUP;
>> +    }
> 
> Out of interest: Where does sheepdog tell the block layer that requests
> have to be aligned to this value?

It's Magic !  Don't tell anyone that I told you :)

Actually, it's because sheepdog still uses .bdrv_co_readv
(sector-based), and not .bdrv_co_preadv (byte-based), so the block layer
automatically sets bs->bl.request_alignment to BDRV_SECTOR_SIZE for
sheepdog and all other old-school drivers.  The block layer then treats
bs->bl.request_alignment as the very minimum that can be changed in the
image at a time, so it only makes sense that sheepdog can't react to a
discard request aligned below those limits.

This code is weakening from an assertion to an early error return, and
then 5/9 is what starts calling the code even with something aligned
smaller than a sector.

Someday the sheepdog driver may be relaxed to implement byte-based
callbacks, and then we may want to delete the early error return of
-ENOTSUP for requests smaller than 512; but that depends on whether
sheepdog uses bytes or sectors over the wire.

> 
> With this patch, it doesn't matter though, it only did before, so:
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
>>      acb = sd_aio_setup(bs, &discard_iov, offset >> BDRV_SECTOR_BITS,
>>                         count >> BDRV_SECTOR_BITS);
>>      acb->aiocb_type = AIOCB_DISCARD_OBJ;
>>
> 
>
diff mbox

Patch

diff --git a/block/iscsi.c b/block/iscsi.c
index 71bd523..0960929 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1083,7 +1083,9 @@  coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
     struct IscsiTask iTask;
     struct unmap_list list;

-    assert(is_byte_request_lun_aligned(offset, count, iscsilun));
+    if (!is_byte_request_lun_aligned(offset, count, iscsilun)) {
+        return -ENOTSUP;
+    }

     if (!iscsilun->lbp.lbpu) {
         /* UNMAP is not supported by the target */
diff --git a/block/qcow2.c b/block/qcow2.c
index e22f6dc..7cfcd84 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2491,6 +2491,11 @@  static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     int ret;
     BDRVQcow2State *s = bs->opaque;

+    if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {
+        assert(count < s->cluster_size);
+        return -ENOTSUP;
+    }
+
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
                                  QCOW2_DISCARD_REQUEST, false);
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1fb9173..4c9af89 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2829,8 +2829,9 @@  static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
     iov.iov_len = sizeof(zero);
     discard_iov.iov = &iov;
     discard_iov.niov = 1;
-    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-    assert((count & (BDRV_SECTOR_SIZE - 1)) == 0);
+    if (!QEMU_IS_ALIGNED(offset | count, BDRV_SECTOR_SIZE)) {
+        return -ENOTSUP;
+    }
     acb = sd_aio_setup(bs, &discard_iov, offset >> BDRV_SECTOR_BITS,
                        count >> BDRV_SECTOR_BITS);
     acb->aiocb_type = AIOCB_DISCARD_OBJ;