[for,2.7,v2,1/1] qcow2: improve qcow2_co_write_zeroes()

Message ID	1461740925-20887-1-git-send-email-den@openvz.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: "Denis V. Lunev" <den@openvz.org> To: qemu-devel@nongnu.org Date: Wed, 27 Apr 2016 10:08:45 +0300 Message-Id: <1461740925-20887-1-git-send-email-den@openvz.org> Subject: [Qemu-devel] [PATCH for 2.7 v2 1/1] qcow2: improve qcow2_co_write_zeroes() Precedence: list Cc: Kevin Wolf <kwolf@redhat.com>, den@openvz.org, Max Reitz <mreitz@redhat.com> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Message ID

1461740925-20887-1-git-send-email-den@openvz.org (mailing list archive)

State

New, archived

Headers

From: "Denis V. Lunev" <den@openvz.org>
To: qemu-devel@nongnu.org
Date: Wed, 27 Apr 2016 10:08:45 +0300
Message-Id: <1461740925-20887-1-git-send-email-den@openvz.org>
Subject: [Qemu-devel] [PATCH for 2.7 v2 1/1] qcow2: improve
	qcow2_co_write_zeroes()
Precedence: list
Cc: Kevin Wolf <kwolf@redhat.com>, den@openvz.org,
	Max Reitz <mreitz@redhat.com>
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Commit Message

Denis V. Lunev April 27, 2016, 7:08 a.m. UTC

There is a possibility that qcow2_co_write_zeroes() will be called by
with the partial block. This could be synthetically triggered with
    qemu-io -c "write -z 32k 4k"
and can happen in the real life in qemu-nbd. The latter happens under
the following conditions:
    (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
        kernel NBD client
    (2) third party program opens kernel NBD device with O_DIRECT
    (3) third party program performs write operation with memory buffer
        not aligned to the page
In this case qcow2_co_write_zeroes() is unable to perform the operation
and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
switches to non-optimized version and writes real zeroes to the disk.

The patch creates a shortcut. If the block is full of zeroes at the moment,
the request is extended to cover full block. User-visible situation with
this block is not changed. Before the patch the block is filled in the
image with real zeroes. After that patch the block is marked as zeroed
in metadata. Thus any subsequent changes in backing store chain are not
affected.

Kewin, thank you for a cool suggestion.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
---
Changes from v1:
- description rewritten completely
- new approach suggested by Kevin is implemented

 block/qcow2.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

Comments

Kevin Wolf May 2, 2016, 3:35 p.m. UTC | #1

Am 27.04.2016 um 09:08 hat Denis V. Lunev geschrieben:
> There is a possibility that qcow2_co_write_zeroes() will be called by
> with the partial block. This could be synthetically triggered with
>     qemu-io -c "write -z 32k 4k"
> and can happen in the real life in qemu-nbd. The latter happens under
> the following conditions:
>     (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
>         kernel NBD client
>     (2) third party program opens kernel NBD device with O_DIRECT
>     (3) third party program performs write operation with memory buffer
>         not aligned to the page
> In this case qcow2_co_write_zeroes() is unable to perform the operation
> and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
> switches to non-optimized version and writes real zeroes to the disk.
> 
> The patch creates a shortcut. If the block is full of zeroes at the moment,
> the request is extended to cover full block. User-visible situation with
> this block is not changed. Before the patch the block is filled in the
> image with real zeroes. After that patch the block is marked as zeroed
> in metadata. Thus any subsequent changes in backing store chain are not
> affected.
> 
> Kewin, thank you for a cool suggestion.

s/Kewin/Kevin/ ;-)

> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> ---
> Changes from v1:
> - description rewritten completely
> - new approach suggested by Kevin is implemented
> 
>  block/qcow2.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 470734b..405d1da 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2417,15 +2417,29 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
>      int ret;
>      BDRVQcow2State *s = bs->opaque;
>  
> -    /* Emulate misaligned zero writes */
> -    if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
> -        return -ENOTSUP;
> +    int head = sector_num % s->cluster_sectors;
> +    int tail = nb_sectors % s->cluster_sectors;

If you want to get the number of sectors between the end of the request
and the end of its cluster, this is not the right calculation.

We need to count the sectors after the end of the request rather than
those before it, and we also need to consider requests where both start
and end of the request are unaligned.

> +    if (head != 0 || tail != 0) {
> +        int nr;
> +        BlockDriverState *file;
> +
> +        sector_num -= head;
> +        nb_sectors += head + tail;
> +
> +        /* check the the request extended to the entire cluster is read
> +           as all zeroes at the moment. If so we can mark entire cluster
> +           as zeroed in the metadata */
> +        ret = bdrv_get_block_status_above(bs, NULL, sector_num, nb_sectors,
> +                                          &nr, &file);
> +        if (ret < 0 || !(ret & BDRV_BLOCK_ZERO) || sector_num == nr) {

sector_num is an offset and nr is a length, so the comparison doesn't
look very meaningful. Did you mean nb_sectors != nr?

> +            /* Emulate misaligned zero writes */
> +            return -ENOTSUP;
> +        }
>      }
>  
>      /* Whatever is left can use real zero clusters */
>      qemu_co_mutex_lock(&s->lock);
> -    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
> -        nb_sectors);
> +    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
>      qemu_co_mutex_unlock(&s->lock);

Kevin

Denis V. Lunev May 3, 2016, 1:30 p.m. UTC | #2

On 05/02/2016 06:35 PM, Kevin Wolf wrote:
> Am 27.04.2016 um 09:08 hat Denis V. Lunev geschrieben:
>> There is a possibility that qcow2_co_write_zeroes() will be called by
>> with the partial block. This could be synthetically triggered with
>>      qemu-io -c "write -z 32k 4k"
>> and can happen in the real life in qemu-nbd. The latter happens under
>> the following conditions:
>>      (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
>>          kernel NBD client
>>      (2) third party program opens kernel NBD device with O_DIRECT
>>      (3) third party program performs write operation with memory buffer
>>          not aligned to the page
>> In this case qcow2_co_write_zeroes() is unable to perform the operation
>> and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
>> switches to non-optimized version and writes real zeroes to the disk.
>>
>> The patch creates a shortcut. If the block is full of zeroes at the moment,
>> the request is extended to cover full block. User-visible situation with
>> this block is not changed. Before the patch the block is filled in the
>> image with real zeroes. After that patch the block is marked as zeroed
>> in metadata. Thus any subsequent changes in backing store chain are not
>> affected.
>>
>> Kewin, thank you for a cool suggestion.
> s/Kewin/Kevin/ ;-)
>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> ---
>> Changes from v1:
>> - description rewritten completely
>> - new approach suggested by Kevin is implemented
>>
>>   block/qcow2.c | 24 +++++++++++++++++++-----
>>   1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 470734b..405d1da 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2417,15 +2417,29 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
>>       int ret;
>>       BDRVQcow2State *s = bs->opaque;
>>   
>> -    /* Emulate misaligned zero writes */
>> -    if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
>> -        return -ENOTSUP;
>> +    int head = sector_num % s->cluster_sectors;
>> +    int tail = nb_sectors % s->cluster_sectors;
> If you want to get the number of sectors between the end of the request
> and the end of its cluster, this is not the right calculation.
>
> We need to count the sectors after the end of the request rather than
> those before it, and we also need to consider requests where both start
> and end of the request are unaligned.
>
>> +    if (head != 0 || tail != 0) {
>> +        int nr;
>> +        BlockDriverState *file;
>> +
>> +        sector_num -= head;
>> +        nb_sectors += head + tail;
>> +
>> +        /* check the the request extended to the entire cluster is read
>> +           as all zeroes at the moment. If so we can mark entire cluster
>> +           as zeroed in the metadata */
>> +        ret = bdrv_get_block_status_above(bs, NULL, sector_num, nb_sectors,
>> +                                          &nr, &file);
>> +        if (ret < 0 || !(ret & BDRV_BLOCK_ZERO) || sector_num == nr) {
> sector_num is an offset and nr is a length, so the comparison doesn't
> look very meaningful. Did you mean nb_sectors != nr?
>
>> +            /* Emulate misaligned zero writes */
>> +            return -ENOTSUP;
>> +        }
>>       }
>>   
>>       /* Whatever is left can use real zero clusters */
>>       qemu_co_mutex_lock(&s->lock);
>> -    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
>> -        nb_sectors);
>> +    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
>>       qemu_co_mutex_unlock(&s->lock);
> Kevin
you are perfectly correct in both places.. Will fix.

Thank you very much.

Den

diff --git a/block/qcow2.c b/block/qcow2.c
index 470734b..405d1da 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2417,15 +2417,29 @@  static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
     int ret;
     BDRVQcow2State *s = bs->opaque;
 
-    /* Emulate misaligned zero writes */
-    if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
-        return -ENOTSUP;
+    int head = sector_num % s->cluster_sectors;
+    int tail = nb_sectors % s->cluster_sectors;
+    if (head != 0 || tail != 0) {
+        int nr;
+        BlockDriverState *file;
+
+        sector_num -= head;
+        nb_sectors += head + tail;
+
+        /* check the the request extended to the entire cluster is read
+           as all zeroes at the moment. If so we can mark entire cluster
+           as zeroed in the metadata */
+        ret = bdrv_get_block_status_above(bs, NULL, sector_num, nb_sectors,
+                                          &nr, &file);
+        if (ret < 0 || !(ret & BDRV_BLOCK_ZERO) || sector_num == nr) {
+            /* Emulate misaligned zero writes */
+            return -ENOTSUP;
+        }
     }
 
     /* Whatever is left can use real zero clusters */
     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
-        nb_sectors);
+    ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
     qemu_co_mutex_unlock(&s->lock);
 
     return ret;

[for,2.7,v2,1/1] qcow2: improve qcow2_co_write_zeroes()

Commit Message

Comments

Patch