[1/5] block: split write_zeroes always

Message ID 1463476543-3087-2-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev May 17, 2016, 9:15 a.m. UTC
We should split requests even if they are less than write_zeroes_alignment.
For example we can have the following request:
  offset 62k
  size   4k
  write_zeroes_alignment 64k
The original code sent 1 request covering 2 qcow2 clusters, and resulted
in both clusters being allocated. But by splitting the request, we can
cater to the case where one of the two clusters can be zeroed as a
whole, for only 1 cluster allocated after the operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
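
The effect described in the commit message can be checked with a little arithmetic. The sketch below counts how many clusters a byte range touches. It assumes the qcow2 cluster size equals write_zeroes_alignment (64 KiB, as in the example). `clusters_touched` is a hypothetical helper, not a QEMU function.

```c
#include <stdint.h>

/* Hypothetical helper: number of clusters touched by the byte range
 * [offset, offset + bytes). Assumes bytes > 0 and cluster_size > 0. */
static int64_t clusters_touched(int64_t offset, int64_t bytes,
                                int64_t cluster_size)
{
    int64_t first = offset / cluster_size;              /* cluster of first byte */
    int64_t last = (offset + bytes - 1) / cluster_size; /* cluster of last byte */
    return last - first + 1;
}
```

With these numbers, the unsplit request (62k, 4k) touches 2 clusters. The two pieces (62k, 2k) and (64k, 2k) each touch exactly 1 cluster, so each cluster can in principle be handled on its own.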

Comments

Kevin Wolf May 17, 2016, 4:34 p.m. UTC | #1
On 17.05.2016 at 11:15, Denis V. Lunev wrote:
> We should split requests even if they are less than write_zeroes_alignment.
> For example we can have the following request:
>   offset 62k
>   size   4k
>   write_zeroes_alignment 64k
> The original code sent 1 request covering 2 qcow2 clusters, and resulted
> in both clusters being allocated. But by splitting the request, we can
> cater to the case where one of the two clusters can be zeroed as a
> whole, for only 1 cluster allocated after the operation.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Eric Blake <eblake@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index cd6d71a..6a24ea8 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1172,13 +1172,13 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>          /* Align request.  Block drivers can expect the "bulk" of the request
>           * to be aligned.
>           */
> -        if (bs->bl.write_zeroes_alignment
> -            && num > bs->bl.write_zeroes_alignment) {
> +        if (bs->bl.write_zeroes_alignment) {
>              if (sector_num % bs->bl.write_zeroes_alignment != 0) {
>                  /* Make a small request up to the first aligned sector.  */
>                  num = bs->bl.write_zeroes_alignment;
>                  num -= sector_num % bs->bl.write_zeroes_alignment;

Turns out this doesn't work. If this is a small request that zeros
something in the middle of a single cluster (i.e. we have untouched data
both before and after the request in the same cluster), then num can now
become greater than nb_sectors, so that we end up zeroing too much.

I'll send a test case that catches this and unstage the series for the
time being.

Kevin
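
The following is a minimal sketch of the failure Kevin describes. It uses the sector units of bdrv_co_do_write_zeroes and assumes 512-byte sectors with write_zeroes_alignment = 128 sectors (64 KiB). The helper names are hypothetical. They reproduce only the head-piece arithmetic from the patched hunk.

```c
#include <stdint.h>

/* Head-piece length computed by the patched hunk when sector_num is NOT
 * aligned: the distance to the next write_zeroes_alignment boundary.
 * Once the outer "num > alignment" guard is removed, nothing bounds this
 * value by nb_sectors. Only meaningful for unaligned sector_num. */
static int64_t patched_head_sectors(int64_t sector_num, int64_t align)
{
    return align - sector_num % align;
}

/* Hypothetical helper: sectors that would be zeroed past the end of the
 * caller's request (0 if the head piece stays inside it). */
static int64_t overrun_sectors(int64_t sector_num, int64_t nb_sectors,
                               int64_t align)
{
    int64_t num = patched_head_sectors(sector_num, align);
    return num > nb_sectors ? num - nb_sectors : 0;
}
```

Consider a 4-sector request starting at sector 2, which lies entirely inside cluster 0. The head piece comes out as 126 sectors, so 122 sectors after the request would be zeroed. The commit-message example (sector 124, 8 sectors) is unaffected because it crosses an alignment boundary.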
Eric Blake May 25, 2016, 6:36 p.m. UTC | #2
On 05/17/2016 10:34 AM, Kevin Wolf wrote:
> On 17.05.2016 at 11:15, Denis V. Lunev wrote:
>> diff --git a/block/io.c b/block/io.c
>> index cd6d71a..6a24ea8 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1172,13 +1172,13 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>          /* Align request.  Block drivers can expect the "bulk" of the request
>>           * to be aligned.
>>           */
>> -        if (bs->bl.write_zeroes_alignment
>> -            && num > bs->bl.write_zeroes_alignment) {
>> +        if (bs->bl.write_zeroes_alignment) {
>>              if (sector_num % bs->bl.write_zeroes_alignment != 0) {
>>                  /* Make a small request up to the first aligned sector.  */
>>                  num = bs->bl.write_zeroes_alignment;
>>                  num -= sector_num % bs->bl.write_zeroes_alignment;
> 
> Turns out this doesn't work. If this is a small request that zeros
> something in the middle of a single cluster (i.e. we have untouched data
> both before and after the request in the same cluster), then num can now
> become greater than nb_sectors, so that we end up zeroing too much.

I'm planning on folding a working version of this patch into my
byte-based write_zeroes conversion series.  As part of the patch, I'm
also hoisting the division out of the loop (there is no guarantee that
the compiler can tell that bs->bl.write_zeroes_alignment will be a power
of two and optimize the division into a shift).
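
Eric's point about hoisting can be sketched as follows, assuming the alignment is a nonzero power of two. The mask is computed once, before any per-chunk loop. After that, `offset & mask` replaces `offset % align`. The helper names are hypothetical and not QEMU's.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if align is a nonzero power of two (required for the mask trick). */
static bool align_is_pow2(uint64_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* Compute once, outside the loop: the low bits that hold the misalignment. */
static uint64_t align_mask(uint64_t align)
{
    return align - 1;
}

/* Inside the loop: equivalent to offset % align when align is a power of two. */
static uint64_t misalignment(uint64_t offset, uint64_t mask)
{
    return offset & mask;
}
```

A compiler is not guaranteed to turn `%` by a runtime value into a mask on its own. That is the motivation for doing it explicitly.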

Patch

diff --git a/block/io.c b/block/io.c
index cd6d71a..6a24ea8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1172,13 +1172,13 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
         /* Align request.  Block drivers can expect the "bulk" of the request
          * to be aligned.
          */
-        if (bs->bl.write_zeroes_alignment
-            && num > bs->bl.write_zeroes_alignment) {
+        if (bs->bl.write_zeroes_alignment) {
             if (sector_num % bs->bl.write_zeroes_alignment != 0) {
                 /* Make a small request up to the first aligned sector.  */
                 num = bs->bl.write_zeroes_alignment;
                 num -= sector_num % bs->bl.write_zeroes_alignment;
-            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
+            } else if (num > bs->bl.write_zeroes_alignment &&
+                (sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
                 /* Shorten the request to the last aligned sector.  num cannot
                  * underflow because num > bs->bl.write_zeroes_alignment.
                  */
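
For reference, the alignment logic of the hunk above can be modelled as a standalone loop. This is a minimal sketch, not the fix that was eventually merged. It keeps the patch's conditions and adds a hypothetical clamp so the head piece never exceeds the remaining request, which addresses Kevin's report. It ignores any other per-request limits the full function may apply. `split_write_zeroes` and its output array are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Split [sector_num, sector_num + nb_sectors) following the patched hunk,
 * plus a clamp on the head piece. Chunk lengths go into out[] (capacity
 * cap); returns the number of chunks written. align == 0 means "no
 * alignment preference", as with write_zeroes_alignment. */
static size_t split_write_zeroes(int64_t sector_num, int64_t nb_sectors,
                                 int64_t align, int64_t *out, size_t cap)
{
    size_t n = 0;

    while (nb_sectors > 0 && n < cap) {
        int64_t num = nb_sectors;

        if (align) {
            if (sector_num % align != 0) {
                /* Small request up to the first aligned sector... */
                num = align - sector_num % align;
                /* ...but never past the end of the request (hypothetical clamp). */
                if (num > nb_sectors) {
                    num = nb_sectors;
                }
            } else if (num > align && (sector_num + num) % align != 0) {
                /* Shorten to the last aligned sector; num stays > 0
                 * because num > align. */
                num -= (sector_num + num) % align;
            }
        }

        out[n++] = num;
        sector_num += num;
        nb_sectors -= num;
    }
    return n;
}
```

With an alignment of 128 sectors, splitting sector 124 for 8 sectors yields two 4-sector pieces, one per cluster. A 4-sector request at sector 2 stays a single 4-sector piece instead of growing to 126. The `num > align` test has to stay on the tail branch. Consider an aligned start with num < align: subtracting `(sector_num + num) % align` would reduce num to 0, and the loop would make no progress.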