diff mbox

[v3,2/5] blkdebug: Add pass-through write_zero and discard support

Message ID 20161202192223.15153-3-eblake@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Blake Dec. 2, 2016, 7:22 p.m. UTC
In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: rebase to byte-based read/write, improve docs on why no
partial write zero passthrough
v2: new patch
---
 block/blkdebug.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

Comments

Max Reitz Dec. 6, 2016, 10 p.m. UTC | #1
On 02.12.2016 20:22, Eric Blake wrote:
> In order to test the effects of artificial geometry constraints
> on operations like write zero or discard, we first need blkdebug
> to manage these actions.  It also allows us to inject errors on
> those operations, just like we can for read/write/flush.
> 
> We can also test the contract promised by the block layer; namely,
> if a device has specified limits on alignment or maximum size,
> then those limits must be obeyed (for now, the blkdebug driver
> merely inherits limits from whatever it is wrapping, but the next
> patch will further enhance it to allow specific limit overrides).
> 
> This patch intentionally refuses to service requests smaller than
> the requested alignments; this is because an upcoming patch adds
> a qemu-iotest to prove that the block layer is correctly handling
> fragmentation, but the test only works if there is a way to tell
> the difference at artificial alignment boundaries when blkdebug is
> using a larger-than-default alignment.  If we let the blkdebug
> layer always defer to the underlying layer, which potentially has
> a smaller granularity, the iotest will be thwarted.
> 
> Tested by setting up an NBD server with export 'foo', then invoking:
> $ ./qemu-io
> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
> qemu-io> d 0 15M
> qemu-io> w -z 0 15M
> 
> Pre-patch, the server never sees the discard (it was silently
> eaten by the block layer); post-patch it is passed across the
> wire.  Likewise, pre-patch the write is always passed with
> NBD_WRITE (with 15M of zeroes on the wire), while post-patch
> it can utilize NBD_WRITE_ZEROES (for less traffic).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: rebase to byte-based read/write, improve docs on why no
> partial write zero passthrough
> v2: new patch
> ---
>  block/blkdebug.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
Eric Blake Dec. 6, 2016, 10:12 p.m. UTC | #2
On 12/06/2016 04:00 PM, Max Reitz wrote:

>> Tested by setting up an NBD server with export 'foo', then invoking:
>> $ ./qemu-io
>> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo

By the way, I'd LOVE to know if there is a way to write a qemu-io
command line that would do this connection automatically (so that I can
batch commands up front and benefit from the shell's history) rather
than having to issue an 'open' after the fact.  I tried various
incantations with --object and --image-opts, but got stumped.
Max Reitz Dec. 6, 2016, 10:14 p.m. UTC | #3
On 06.12.2016 23:12, Eric Blake wrote:
> On 12/06/2016 04:00 PM, Max Reitz wrote:
> 
>>> Tested by setting up an NBD server with export 'foo', then invoking:
>>> $ ./qemu-io
>>> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
> 
> By the way, I'd LOVE to know if there is a way to write a qemu-io
> command line that would do this connection automatically (so that I can
> batch commands up front and benefit from the shell's history) rather
> than having to issue an 'open' after the fact.  I tried various
> incantations with --object and --image-opts, but got stumped.

Can't you just do qemu-io -c 'open'?

Max
Eric Blake Dec. 6, 2016, 10:18 p.m. UTC | #4
On 12/06/2016 04:14 PM, Max Reitz wrote:
> On 06.12.2016 23:12, Eric Blake wrote:
>> On 12/06/2016 04:00 PM, Max Reitz wrote:
>>
>>>> Tested by setting up an NBD server with export 'foo', then invoking:
>>>> $ ./qemu-io
>>>> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
>>
>> By the way, I'd LOVE to know if there is a way to write a qemu-io
>> command line that would do this connection automatically (so that I can
>> batch commands up front and benefit from the shell's history) rather
>> than having to issue an 'open' after the fact.  I tried various
>> incantations with --object and --image-opts, but got stumped.
> 
> Can't you just do qemu-io -c 'open'?

I suppose that would get command-line history.  But I still want
interactive mode.  The moment you use -c, ALL commands get run
back-to-back without stopping, so I'd have to add additional -c
'read'/'write' commands up front. I like interactive mode (open
pre-connected, now let me explore the image at will).
Max Reitz Dec. 6, 2016, 10:23 p.m. UTC | #5
On 06.12.2016 23:18, Eric Blake wrote:
> On 12/06/2016 04:14 PM, Max Reitz wrote:
>> On 06.12.2016 23:12, Eric Blake wrote:
>>> On 12/06/2016 04:00 PM, Max Reitz wrote:
>>>
>>>>> Tested by setting up an NBD server with export 'foo', then invoking:
>>>>> $ ./qemu-io
>>>>> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
>>>
>>> By the way, I'd LOVE to know if there is a way to write a qemu-io
>>> command line that would do this connection automatically (so that I can
>>> batch commands up front and benefit from the shell's history) rather
>>> than having to issue an 'open' after the fact.  I tried various
>>> incantations with --object and --image-opts, but got stumped.
>>
>> Can't you just do qemu-io -c 'open'?
> 
> I suppose that would get command-line history.  But I still want
> interactive mode.  The moment you use -c, ALL commands get run
> back-to-back without stopping, so I'd have to add additional -c
> 'read'/'write' commands up front. I like interactive mode (open
> pre-connected, now let me explore the image at will).

Well, the usual --image-opts version would be:

./qemu-io --image-opts driver=blkdebug,image.driver=nbd,\
image.host=localhost,image.export=foo

Max
Eric Blake Dec. 6, 2016, 10:36 p.m. UTC | #6
On 12/06/2016 04:23 PM, Max Reitz wrote:
> On 06.12.2016 23:18, Eric Blake wrote:
>> On 12/06/2016 04:14 PM, Max Reitz wrote:
>>> On 06.12.2016 23:12, Eric Blake wrote:
>>>> On 12/06/2016 04:00 PM, Max Reitz wrote:
>>>>
>>>>>> Tested by setting up an NBD server with export 'foo', then invoking:
>>>>>> $ ./qemu-io
>>>>>> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
>>>>
>>>> By the way, I'd LOVE to know if there is a way to write a qemu-io
>>>> command line that would do this connection automatically (so that I can
>>>> batch commands up front and benefit from the shell's history) rather
>>>> than having to issue an 'open' after the fact.  I tried various
>>>> incantations with --object and --image-opts, but got stumped.
>>>
>>> Can't you just do qemu-io -c 'open'?
>>
>> I suppose that would get command-line history.  But I still want
>> interactive mode.  The moment you use -c, ALL commands get run
>> back-to-back without stopping, so I'd have to add additional -c
>> 'read'/'write' commands up front. I like interactive mode (open
>> pre-connected, now let me explore the image at will).
> 
> Well, the usual --image-opts version would be:
> 
> ./qemu-io --image-opts driver=blkdebug,image.driver=nbd,\
> image.host=localhost,image.export=foo

Thanks, that appears to do the trick! I think I was getting confused by
trying 'file.driver' instead of 'image.driver', or maybe it was because
I was trying 'image.align' to set the blkdebug alignment where just
plain 'align' works once you are in --image-opts mode, or some such
problem on my end.  [Maybe I shouldn't be testing patches late at night,
either...]
Kevin Wolf Dec. 7, 2016, 1:55 p.m. UTC | #7
Am 02.12.2016 um 20:22 hat Eric Blake geschrieben:
> In order to test the effects of artificial geometry constraints
> on operations like write zero or discard, we first need blkdebug
> to manage these actions.  It also allows us to inject errors on
> those operations, just like we can for read/write/flush.
> 
> We can also test the contract promised by the block layer; namely,
> if a device has specified limits on alignment or maximum size,
> then those limits must be obeyed (for now, the blkdebug driver
> merely inherits limits from whatever it is wrapping, but the next
> patch will further enhance it to allow specific limit overrides).
> 
> This patch intentionally refuses to service requests smaller than
> the requested alignments; this is because an upcoming patch adds
> a qemu-iotest to prove that the block layer is correctly handling
> fragmentation, but the test only works if there is a way to tell
> the difference at artificial alignment boundaries when blkdebug is
> using a larger-than-default alignment.  If we let the blkdebug
> layer always defer to the underlying layer, which potentially has
> a smaller granularity, the iotest will be thwarted.
> 
> Tested by setting up an NBD server with export 'foo', then invoking:
> $ ./qemu-io
> qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
> qemu-io> d 0 15M
> qemu-io> w -z 0 15M
> 
> Pre-patch, the server never sees the discard (it was silently
> eaten by the block layer); post-patch it is passed across the
> wire.  Likewise, pre-patch the write is always passed with
> NBD_WRITE (with 15M of zeroes on the wire), while post-patch
> it can utilize NBD_WRITE_ZEROES (for less traffic).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v3: rebase to byte-based read/write, improve docs on why no
> partial write zero passthrough
> v2: new patch
> ---
>  block/blkdebug.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 37094a2..aac8184 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -1,6 +1,7 @@
>  /*
>   * Block protocol for I/O error injection
>   *
> + * Copyright (C) 2016 Red Hat, Inc.
>   * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a copy
> @@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
>          goto out;
>      }
> 
> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->file->bs->supported_write_flags;
> +    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->file->bs->supported_zero_flags;
> +
>      /* Set request alignment */
>      align = qemu_opt_get_size(opts, "align", 0);
>      if (align < INT_MAX && is_power_of_2(align)) {
> @@ -512,6 +518,79 @@ static int blkdebug_co_flush(BlockDriverState *bs)
>  }
> 
> 
> +static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
> +                                                  int64_t offset, int count,
> +                                                  BdrvRequestFlags flags)
> +{
> +    BDRVBlkdebugState *s = bs->opaque;
> +    BlkdebugRule *rule = NULL;
> +    uint32_t align = MAX(bs->bl.request_alignment,
> +                         bs->bl.pwrite_zeroes_alignment);
> +
> +    /* Only pass through requests that are larger than requested
> +     * preferred alignment (so that we test the fallback to writes on
> +     * unaligned portions), and check that the block layer never hands
> +     * us anything crossing an alignment boundary.  */
> +    if (count < align) {
> +        return -ENOTSUP;
> +    }
> +    assert(QEMU_IS_ALIGNED(offset, align));
> +    assert(QEMU_IS_ALIGNED(count, align));
> +    if (bs->bl.max_pwrite_zeroes) {
> +        assert(count <= bs->bl.max_pwrite_zeroes);
> +    }
> +
> +    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
> +        if (rule->options.inject.offset == -1) {

We do have offset and bytes parameters in this function, so I guess we
should check overlaps like in the read/write functions instead of only
executing the rule if it doesn't specify an offset.

> +            break;
> +        }
> +    }
> +
> +    if (rule && rule->options.inject.error) {
> +        return inject_error(bs, rule);
> +    }
> +
> +    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
> +}
> +
> +

Why two newlines?

> +static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
> +                                             int64_t offset, int count)
> +{
> +    BDRVBlkdebugState *s = bs->opaque;
> +    BlkdebugRule *rule = NULL;
> +    uint32_t align = bs->bl.pdiscard_alignment;
> +
> +    /* Only pass through requests that are larger than requested
> +     * minimum alignment, and ensure that unaligned requests do not
> +     * cross optimum discard boundaries. */
> +    if (count < bs->bl.request_alignment) {
> +        return -ENOTSUP;
> +    }
> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
> +    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
> +    if (align && count >= align) {
> +        assert(QEMU_IS_ALIGNED(offset, align));
> +        assert(QEMU_IS_ALIGNED(count, align));
> +    }
> +    if (bs->bl.max_pdiscard) {
> +        assert(count <= bs->bl.max_pdiscard);
> +    }
> +
> +    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
> +        if (rule->options.inject.offset == -1) {

Same thing as above.

> +            break;
> +        }
> +    }
> +
> +    if (rule && rule->options.inject.error) {
> +        return inject_error(bs, rule);
> +    }
> +
> +    return bdrv_co_pdiscard(bs->file->bs, offset, count);
> +}

Kevin
Eric Blake Dec. 7, 2016, 3:15 p.m. UTC | #8
On 12/07/2016 07:55 AM, Kevin Wolf wrote:
> Am 02.12.2016 um 20:22 hat Eric Blake geschrieben:
>> In order to test the effects of artificial geometry constraints
>> on operations like write zero or discard, we first need blkdebug
>> to manage these actions.  It also allows us to inject errors on
>> those operations, just like we can for read/write/flush.
>>

>>
>>
>> +static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
>> +                                                  int64_t offset, int count,
>> +                                                  BdrvRequestFlags flags)
>> +{
>> +    BDRVBlkdebugState *s = bs->opaque;
>> +    BlkdebugRule *rule = NULL;
>> +    uint32_t align = MAX(bs->bl.request_alignment,
>> +                         bs->bl.pwrite_zeroes_alignment);
>> +
>> +    /* Only pass through requests that are larger than requested
>> +     * preferred alignment (so that we test the fallback to writes on
>> +     * unaligned portions), and check that the block layer never hands
>> +     * us anything crossing an alignment boundary.  */
>> +    if (count < align) {
>> +        return -ENOTSUP;
>> +    }

This early exit bypasses the checks for error injection - but the block
layer will then fall back to write which will perform the check there.

>> +    assert(QEMU_IS_ALIGNED(offset, align));
>> +    assert(QEMU_IS_ALIGNED(count, align));
>> +    if (bs->bl.max_pwrite_zeroes) {
>> +        assert(count <= bs->bl.max_pwrite_zeroes);
>> +    }
>> +
>> +    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
>> +        if (rule->options.inject.offset == -1) {
> 
> We do have offset and bytes parameters in this function, so I guess we
> should check overlaps like in the read/write functions instead of only
> executing the rule if it doesn't specify an offset.

Oh, right. I copied and pasted from flush, when I should have copied
from read.

>> +    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
>> +}
>> +
>> +
> 
> Why two newlines?
> 

Habit; they aren't essential, so I'll trim to 1 for consistency with the
rest of the file.

>> +static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
>> +                                             int64_t offset, int count)
>> +{
>> +    BDRVBlkdebugState *s = bs->opaque;
>> +    BlkdebugRule *rule = NULL;
>> +    uint32_t align = bs->bl.pdiscard_alignment;
>> +
>> +    /* Only pass through requests that are larger than requested
>> +     * minimum alignment, and ensure that unaligned requests do not
>> +     * cross optimum discard boundaries. */
>> +    if (count < bs->bl.request_alignment) {
>> +        return -ENOTSUP;
>> +    }

Here, the early exit is a no-op; we never see the error injection
because there is no fallback path at the block layer.  But I guess
that's still okay - when we are discarding the unaligned portion of a
request, there really is no I/O and therefore no way to inject an error
representing failed I/O.

Looks like I get to spin a v4 for the error injection to do specific
range checking.
diff mbox

Patch

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 37094a2..aac8184 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1,6 +1,7 @@ 
 /*
  * Block protocol for I/O error injection
  *
+ * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -382,6 +383,11 @@  static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }

+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags;
+    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags;
+
     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align", 0);
     if (align < INT_MAX && is_power_of_2(align)) {
@@ -512,6 +518,79 @@  static int blkdebug_co_flush(BlockDriverState *bs)
 }


+static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
+                                                  int64_t offset, int count,
+                                                  BdrvRequestFlags flags)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+    uint32_t align = MAX(bs->bl.request_alignment,
+                         bs->bl.pwrite_zeroes_alignment);
+
+    /* Only pass through requests that are larger than requested
+     * preferred alignment (so that we test the fallback to writes on
+     * unaligned portions), and check that the block layer never hands
+     * us anything crossing an alignment boundary.  */
+    if (count < align) {
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, align));
+    assert(QEMU_IS_ALIGNED(count, align));
+    if (bs->bl.max_pwrite_zeroes) {
+        assert(count <= bs->bl.max_pwrite_zeroes);
+    }
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        if (rule->options.inject.offset == -1) {
+            break;
+        }
+    }
+
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, rule);
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+
+static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int count)
+{
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+    uint32_t align = bs->bl.pdiscard_alignment;
+
+    /* Only pass through requests that are larger than requested
+     * minimum alignment, and ensure that unaligned requests do not
+     * cross optimum discard boundaries. */
+    if (count < bs->bl.request_alignment) {
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
+    if (align && count >= align) {
+        assert(QEMU_IS_ALIGNED(offset, align));
+        assert(QEMU_IS_ALIGNED(count, align));
+    }
+    if (bs->bl.max_pdiscard) {
+        assert(count <= bs->bl.max_pdiscard);
+    }
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        if (rule->options.inject.offset == -1) {
+            break;
+        }
+    }
+
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, rule);
+    }
+
+    return bdrv_co_pdiscard(bs->file->bs, offset, count);
+}
+
+
 static void blkdebug_close(BlockDriverState *bs)
 {
     BDRVBlkdebugState *s = bs->opaque;
@@ -763,6 +842,8 @@  static BlockDriver bdrv_blkdebug = {
     .bdrv_co_preadv         = blkdebug_co_preadv,
     .bdrv_co_pwritev        = blkdebug_co_pwritev,
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
+    .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
+    .bdrv_co_pdiscard       = blkdebug_co_pdiscard,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,