diff mbox series

[v7,04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Message ID 20200424125448.63318-5-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Fix resize (extending) of short overlays | expand

Commit Message

Kevin Wolf April 24, 2020, 12:54 p.m. UTC
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

Comments

Max Reitz April 24, 2020, 2:07 p.m. UTC | #1
On 24.04.20 14:54, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c |  2 +-
>  block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..98065d7808 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -4214,6 +4215,39 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

[...]

> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint64_t len = zero_start - old_length;
> +            uint8_t *buf = qemu_blockalign0(bs, len);

I wonder whether I should raise the question of why this should be
block-aligned when we make no effort to align the offset its written to
(and we know it isn’t aligned to qcow2 clusters at least).

I probably should not.

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, len);
> +
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = qcow2_co_pwritev_part(bs, old_length, len, &qiov, 0, 0);
> +            qemu_co_mutex_lock(&s->lock);
> +
> +            qemu_vfree(buf);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> +                goto fail;
> +            }
> +        }
Eric Blake April 24, 2020, 2:39 p.m. UTC | #2
On 4/24/20 7:54 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
Eric Blake April 28, 2020, 4:28 p.m. UTC | #3
On 4/24/20 7:54 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+), 1 deletion(-)
> 

> +++ b/block/qcow2.c
> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>   
>       bs->supported_zero_flags = header.version >= 3 ?
>                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;

Is this really what we want for encrypted files, or would it be better as:

     if (bs->encrypted) {
         bs->supported_truncate_flags = 0;
     } else {
         bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
     }

At the qcow2 level, we can guarantee a read of 0 even for an encrypted 
image, but is that really what we want?  Is setting the qcow2 zero flag 
on the cluster done at the decrypted level (at which point we may be 
leaking information about guest contents via anyone that can read the 
qcow2 metadata) or at the encrypted level (at which point it's useless 
information, because knowing the underlying file reads as zero still 
decrypts into garbage)?
Kevin Wolf April 28, 2020, 6:45 p.m. UTC | #4
Am 28.04.2020 um 18:28 hat Eric Blake geschrieben:
> On 4/24/20 7:54 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
> >   2 files changed, 35 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >       bs->supported_zero_flags = header.version >= 3 ?
> >                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> 
> Is this really what we want for encrypted files, or would it be better as:
> 
>     if (bs->encrypted) {
>         bs->supported_truncate_flags = 0;
>     } else {
>         bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>     }
> 
> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
> image, but is that really what we want?  Is setting the qcow2 zero flag on
> the cluster done at the decrypted level (at which point we may be leaking
> information about guest contents via anyone that can read the qcow2
> metadata) or at the encrypted level (at which point it's useless
> information, because knowing the underlying file reads as zero still
> decrypts into garbage)?

The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.

If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).

Kevin
Eric Blake April 28, 2020, 6:58 p.m. UTC | #5
On 4/28/20 1:45 PM, Kevin Wolf wrote:
> Am 28.04.2020 um 18:28 hat Eric Blake geschrieben:
>> On 4/24/20 7:54 AM, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>    block/qcow2-cluster.c |  2 +-
>>>    block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>>>    2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>
>>> +++ b/block/qcow2.c
>>> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>>        bs->supported_zero_flags = header.version >= 3 ?
>>>                                   BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
>>> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>
>> Is this really what we want for encrypted files, or would it be better as:
>>
>>      if (bs->encrypted) {
>>          bs->supported_truncate_flags = 0;
>>      } else {
>>          bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>      }
>>
>> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
>> image, but is that really what we want?  Is setting the qcow2 zero flag on
>> the cluster done at the decrypted level (at which point we may be leaking
>> information about guest contents via anyone that can read the qcow2
>> metadata) or at the encrypted level (at which point it's useless
>> information, because knowing the underlying file reads as zero still
>> decrypts into garbage)?
> 
> The zero flag means that the guest reads zeros, even with encrypted
> files. I'm not sure if it's worse than exposing the information which
> clusters are allocated and which are unallocated, which we have always
> been doing and which is hard to avoid without encrypting all the
> metadata, too. But it does reveal some information.
> 
> If we think that exposing zero flags is worse than exposing the
> allocation status, I would still not use your solution above. In that
> case, the full fix would be returning -ENOTSUP from
> .bdrv_co_pwrite_zeroes() to cover all other callers, too.

Indeed, it also makes me wonder if we should support 
truncate(BDRV_REQ_ZERO_WRITE|BDRV_REQ_NO_FALLBACK), to differentiate 
whether a truncation request is aiming more to be fast (NO_FALLBACK set, 
fail immediately with -ENOTSUP on encryption) or complete (NO_FALLBACK 
clear, go ahead and write guest-visible zeroes, which populates the 
format layer).  In other words, maybe we want a knob that the user can 
set on encrypted volumes on whether to allow zero flags in the qcow2 image.

> 
> If we think that allocation status and zero flags are of comparable
> importance, then we need to fix either both or nothing. Hiding all of
> this information probably means encrypting at least the L2 tables and
> potentially all of the metadata apart from the header. This would
> obviously require an incompatible feature flag (and some effort to
> implement it).

Indeed, my question is broad enough that it does not hold up _this_ 
series, so much as providing food for thought on what else we may need 
to add for encrypted qcow2 images as a future series, to make it easier 
to adjust the slider between the extremes of performance vs. minimal 
data leaks when using encryption.
diff mbox series

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..4b5fc8c4a7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1795,7 +1795,7 @@  int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
-           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..98065d7808 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@  static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 
     bs->supported_zero_flags = header.version >= 3 ?
                                BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
     /* Repair image if dirty */
     if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,39 @@  static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         g_assert_not_reached();
     }
 
+    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+
+        /*
+         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
+         * requires a cluster-aligned start. The end may be unaligned if it is
+         * at the end of the image (which it is here).
+         */
+        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
+            goto fail;
+        }
+
+        /* Write explicit zeros for the unaligned head */
+        if (zero_start > old_length) {
+            uint64_t len = zero_start - old_length;
+            uint8_t *buf = qemu_blockalign0(bs, len);
+            QEMUIOVector qiov;
+            qemu_iovec_init_buf(&qiov, buf, len);
+
+            qemu_co_mutex_unlock(&s->lock);
+            ret = qcow2_co_pwritev_part(bs, old_length, len, &qiov, 0, 0);
+            qemu_co_mutex_lock(&s->lock);
+
+            qemu_vfree(buf);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to zero out the new area");
+                goto fail;
+            }
+        }
+    }
+
     if (prealloc != PREALLOC_MODE_OFF) {
         /* Flush metadata before actually changing the image size */
         ret = qcow2_write_caches(bs);