diff mbox

[v7,3/3] qcow2: Discard unaligned tail when wiping image

Message ID 20170330223645.349-4-eblake@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Blake March 30, 2017, 10:36 p.m. UTC
The previous commit pointed out a subtle difference between the
fast and slow path of qcow2_make_empty(), where we failed to discard
the final (partial) cluster of an unaligned image.

The problem stems from the fact that qcow2_discard_clusters() was
silently ignoring sub-cluster head and tail on unaligned requests.
A quick audit of all callers shows that qcow2_snapshot_create() has
always passed a cluster-aligned request since the call was added
in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned
request since commit ecdbead taught the block layer about preferred
discard alignment; and qcow2_make_empty() was fixed to pass an
aligned start (but not necessarily end) in commit a3e1505.

Asserting that the start is always aligned also points out that we
now have a dead check: rounding the end offset down can never result
in a value less than the aligned start offset (the check was rendered
dead with commit ecdbead).  Meanwhile, we do not want to round the
end cluster down in the one case of the end offset matching the
(unaligned) file size - that final partial cluster should still be
discarded.

With those fixes in place, we can adjust the testsuite so that
qemu-iotests 97 and 176 are once again identical for qcow2, showing
that fast and slow paths are back in sync.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v7: new patch
---
 block/qcow2-cluster.c      | 10 ++++------
 tests/qemu-iotests/176.out |  2 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

Comments

Max Reitz March 31, 2017, 12:51 p.m. UTC | #1
On 31.03.2017 00:36, Eric Blake wrote:
> The previous commit pointed out a subtle difference between the
> fast and slow path of qcow2_make_empty(), where we failed to discard
> the final (partial) cluster of an unaligned image.
> 
> The problem stems from the fact that qcow2_discard_clusters() was
> silently ignoring sub-cluster head and tail on unaligned requests.
> A quick audit of all callers shows that qcow2_snapshot_create() has
> always passed a cluster-aligned request since the call was added
> in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned
> request since commit ecdbead taught the block layer about preferred
> discard alignment; and qcow2_make_empty() was fixed to pass an
> aligned start (but not necessarily end) in commit a3e1505.
> 
> Asserting that the start is always aligned also points out that we
> now have a dead check: rounding the end offset down can never result
> in a value less than the aligned start offset (the check was rendered
> dead with commit ecdbead).  Meanwhile, we do not want to round the
> end cluster down in the one case of the end offset matching the
> (unaligned) file size - that final partial cluster should still be
> discarded.
> 
> With those fixes in place, we can adjust the testsuite so that
> qemu-iotests 97 and 176 are once again identical for qcow2, showing
> that fast and slow paths are back in sync.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v7: new patch
> ---
>  block/qcow2-cluster.c      | 10 ++++------
>  tests/qemu-iotests/176.out |  2 +-
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 78c11d4..100398c 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1519,12 +1519,10 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> 
>      end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> 
> -    /* Round start up and end down */
> -    offset = align_offset(offset, s->cluster_size);
> -    end_offset = start_of_cluster(s, end_offset);
> -
> -    if (offset > end_offset) {
> -        return 0;
> +    /* The caller must cluster-align start; round end down except at EOF */
> +    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> +    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
> +        end_offset = start_of_cluster(s, end_offset);
>      }

This change looks good and works for qcow2_make_empty(), but
qcow2_co_pdiscard() will still drop these requests:

$ ./qemu-img create -f qcow2 foo.qcow2 512
Formatting 'foo.qcow2', fmt=qcow2 size=512 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-io -c 'write 0 512' foo.qcow2
wrote 512/512 bytes at offset 0
512 bytes, 1 ops; 0.0062 sec (80.167 KiB/sec and 160.3335 ops/sec)
$ ./qemu-img map foo.qcow2
Offset          Length          Mapped to       File
0               0x200           0x50000         foo.qcow2
$ ./qemu-io -c 'discard 0 512' foo.qcow2
discard 512/512 bytes at offset 0
512 bytes, 1 ops; 0.0000 sec (34.877 MiB/sec and 71428.5714 ops/sec)
$ ./qemu-img map foo.qcow2
Offset          Length          Mapped to       File
0               0x200           0x50000         foo.qcow2

We don't necessarily have to fix it for 2.9, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> 
>      nb_clusters = size_to_clusters(s, end_offset - offset);
> diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
> index 990a41c..6271fa7 100644
> --- a/tests/qemu-iotests/176.out
> +++ b/tests/qemu-iotests/176.out
> @@ -35,7 +35,7 @@ Offset          Length          File
>  Offset          Length          File
>  0x7ffd0000      0x10000         TEST_DIR/t.IMGFMT.base
>  0x7ffe0000      0x20000         TEST_DIR/t.IMGFMT.itmd
> -0x83400000      0x200           TEST_DIR/t.IMGFMT
> +0x83400000      0x200           TEST_DIR/t.IMGFMT.itmd
> 
>  === Test pass 1 ===
>
Eric Blake March 31, 2017, 1:56 p.m. UTC | #2
On 03/31/2017 07:51 AM, Max Reitz wrote:
> On 31.03.2017 00:36, Eric Blake wrote:
>> The previous commit pointed out a subtle difference between the
>> fast and slow path of qcow2_make_empty(), where we failed to discard
>> the final (partial) cluster of an unaligned image.
>>

>> +    /* The caller must cluster-align start; round end down except at EOF */
>> +    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>> +    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
>> +        end_offset = start_of_cluster(s, end_offset);
>>      }
> 
> This change looks good and works for qcow2_make_empty(), but
> qcow2_co_pdiscard() will still drop these requests:

Yes, but qcow2_co_pdiscard() is advisory, and leaving the last partial
cluster undiscarded (consistent for what we do for all other partial
cluster requests) is different than our documented notion that
qcow2_make_empty() gets rid of all clusters no matter what.

> We don't necessarily have to fix it for 2.9, so:

Agreed that enhancing pdiscard to special-case a partial cluster at EOF
is not a bug fix, and therefore 2.10 material if we even want it.

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
>>
>>      nb_clusters = size_to_clusters(s, end_offset - offset);
>> diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
>> index 990a41c..6271fa7 100644
>> --- a/tests/qemu-iotests/176.out
>> +++ b/tests/qemu-iotests/176.out
>> @@ -35,7 +35,7 @@ Offset          Length          File
>>  Offset          Length          File
>>  0x7ffd0000      0x10000         TEST_DIR/t.IMGFMT.base
>>  0x7ffe0000      0x20000         TEST_DIR/t.IMGFMT.itmd
>> -0x83400000      0x200           TEST_DIR/t.IMGFMT
>> +0x83400000      0x200           TEST_DIR/t.IMGFMT.itmd

As to your comment about swapping patches 2 and 3, if I did that, then I
would not have this change to 176.out during the bug fix, and would
instead squash this line into the single patch touching the testsuite to
add the enhancement.  How important is the swapped order? Do I need to
respin for that, or is it something a maintainer could tweak, or is the
series fine as-is?  For what it's worth, the policy of single test after
the patch is somewhat opposite of Markus' approach of test first showing
the buggy behavior, then patch that includes the testsuite fix to match
the patch.  But I can live with either order, so I won't respin without
an explicit request to do so.
Max Reitz March 31, 2017, 2:01 p.m. UTC | #3
On 31.03.2017 15:56, Eric Blake wrote:
> On 03/31/2017 07:51 AM, Max Reitz wrote:
>> On 31.03.2017 00:36, Eric Blake wrote:
>>> The previous commit pointed out a subtle difference between the
>>> fast and slow path of qcow2_make_empty(), where we failed to discard
>>> the final (partial) cluster of an unaligned image.
>>>
> 
>>> +    /* The caller must cluster-align start; round end down except at EOF */
>>> +    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>>> +    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
>>> +        end_offset = start_of_cluster(s, end_offset);
>>>      }
>>
>> This change looks good and works for qcow2_make_empty(), but
>> qcow2_co_pdiscard() will still drop these requests:
> 
> Yes, but qcow2_co_pdiscard() is advisory, and leaving the last partial
> cluster undiscarded (consistent for what we do for all other partial
> cluster requests) is different than our documented notion that
> qcow2_make_empty() gets rid of all clusters no matter what.
> 
>> We don't necessarily have to fix it for 2.9, so:
> 
> Agreed that enhancing pdiscard to special-case a partial cluster at EOF
> is not a bug fix, and therefore 2.10 material if we even want it.

Why would we not want it? :-)

> 
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>>>
>>>      nb_clusters = size_to_clusters(s, end_offset - offset);
>>> diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
>>> index 990a41c..6271fa7 100644
>>> --- a/tests/qemu-iotests/176.out
>>> +++ b/tests/qemu-iotests/176.out
>>> @@ -35,7 +35,7 @@ Offset          Length          File
>>>  Offset          Length          File
>>>  0x7ffd0000      0x10000         TEST_DIR/t.IMGFMT.base
>>>  0x7ffe0000      0x20000         TEST_DIR/t.IMGFMT.itmd
>>> -0x83400000      0x200           TEST_DIR/t.IMGFMT
>>> +0x83400000      0x200           TEST_DIR/t.IMGFMT.itmd
> 
> As to your comment about swapping patches 2 and 3, if I did that, then I
> would not have this change to 176.out during the bug fix, and would
> instead squash this line into the single patch touching the testsuite to
> add the enhancement.

Right.

>                       How important is the swapped order?

As you can probably guess, technically not important. But I having
reference outputs that are not actually a reference kind of defeats the
purpose in my opinion.

>                                                            Do I need to
> respin for that, or is it something a maintainer could tweak, or is the
> series fine as-is?

Of course I can fix the code, but changing the commit messages is a bit
more involved...

>                     For what it's worth, the policy of single test after
> the patch is somewhat opposite of Markus' approach of test first showing
> the buggy behavior, then patch that includes the testsuite fix to match
> the patch.  But I can live with either order, so I won't respin without
> an explicit request to do so.

Well, then consider this an explicit request. :-)

Max
Eric Blake March 31, 2017, 2:11 p.m. UTC | #4
On 03/31/2017 09:01 AM, Max Reitz wrote:

>>                                                            Do I need to
>> respin for that, or is it something a maintainer could tweak, or is the
>> series fine as-is?
> 
> Of course I can fix the code, but changing the commit messages is a bit
> more involved...
> 
>>                     For what it's worth, the policy of single test after
>> the patch is somewhat opposite of Markus' approach of test first showing
>> the buggy behavior, then patch that includes the testsuite fix to match
>> the patch.  But I can live with either order, so I won't respin without
>> an explicit request to do so.
> 
> Well, then consider this an explicit request. :-)

Yes sir! v8 coming right up.
diff mbox

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 78c11d4..100398c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1519,12 +1519,10 @@  int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-    /* Round start up and end down */
-    offset = align_offset(offset, s->cluster_size);
-    end_offset = start_of_cluster(s, end_offset);
-
-    if (offset > end_offset) {
-        return 0;
+    /* The caller must cluster-align start; round end down except at EOF */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
+        end_offset = start_of_cluster(s, end_offset);
     }

     nb_clusters = size_to_clusters(s, end_offset - offset);
diff --git a/tests/qemu-iotests/176.out b/tests/qemu-iotests/176.out
index 990a41c..6271fa7 100644
--- a/tests/qemu-iotests/176.out
+++ b/tests/qemu-iotests/176.out
@@ -35,7 +35,7 @@  Offset          Length          File
 Offset          Length          File
 0x7ffd0000      0x10000         TEST_DIR/t.IMGFMT.base
 0x7ffe0000      0x20000         TEST_DIR/t.IMGFMT.itmd
-0x83400000      0x200           TEST_DIR/t.IMGFMT
+0x83400000      0x200           TEST_DIR/t.IMGFMT.itmd

 === Test pass 1 ===