[v7,04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Message ID	20200424125448.63318-5-kwolf@redhat.com (mailing list archive)
State	New, archived
Series	block: Fix resize (extending) of short overlays

Kevin Wolf April 24, 2020, 12:54 p.m. UTC

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

Max Reitz April 24, 2020, 2:07 p.m. UTC | #1

On 24.04.20 14:54, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c |  2 +-
>  block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..98065d7808 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -4214,6 +4215,39 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

[...]

> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint64_t len = zero_start - old_length;
> +            uint8_t *buf = qemu_blockalign0(bs, len);

I wonder whether I should raise the question of why this should be
block-aligned when we make no effort to align the offset it's written to
(and we know it isn’t aligned to qcow2 clusters at least).

I probably should not.

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, len);
> +
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = qcow2_co_pwritev_part(bs, old_length, len, &qiov, 0, 0);
> +            qemu_co_mutex_lock(&s->lock);
> +
> +            qemu_vfree(buf);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> +                goto fail;
> +            }
> +        }
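
The split this hunk relies on can be illustrated with a small standalone
sketch. It assumes (the computation itself is elided above) that zero_start
is old_length rounded up to the cluster size, so that the unaligned head is
written as explicit zeros and everything from zero_start onwards can be
covered by the zero flag. The names plan_zero_extend(), struct zero_plan and
align_up() are hypothetical and are not QEMU functions; clamping the head for
growth that stays within one cluster is a choice of this sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* How a zero-extended range might be split; all names are illustrative. */
struct zero_plan {
    uint64_t head_start;  /* start of the explicitly written zeros */
    uint64_t head_len;    /* bytes up to the next cluster boundary or new end */
    uint64_t flag_start;  /* first cluster-aligned offset for the zero flag */
    uint64_t flag_len;    /* bytes covered by zero-flagged clusters */
};

/* Round n up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t n, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

static struct zero_plan plan_zero_extend(uint64_t old_length,
                                         uint64_t new_length,
                                         uint64_t cluster_size)
{
    struct zero_plan p;
    uint64_t zero_start = align_up(old_length, cluster_size);
    /* Sketch-only clamp: the head never extends past the new image end. */
    uint64_t head_end = zero_start < new_length ? zero_start : new_length;

    assert(new_length > old_length);

    p.head_start = old_length;
    p.head_len = head_end - old_length;
    p.flag_start = zero_start;
    p.flag_len = new_length > zero_start ? new_length - zero_start : 0;
    return p;
}
```

With 64 KiB clusters, growing an image from 100000 to 300000 bytes would give
a head of 31072 bytes starting at offset 100000 and a zero-flagged range of
168928 bytes starting at offset 131072. If old_length is already
cluster-aligned, the head is empty.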

Eric Blake April 24, 2020, 2:39 p.m. UTC | #2

On 4/24/20 7:54 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

Eric Blake April 28, 2020, 4:28 p.m. UTC | #3

On 4/24/20 7:54 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+), 1 deletion(-)
> 

> +++ b/block/qcow2.c
> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>   
>       bs->supported_zero_flags = header.version >= 3 ?
>                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;

Is this really what we want for encrypted files, or would it be better as:

     if (bs->encrypted) {
         bs->supported_truncate_flags = 0;
     } else {
         bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
     }

At the qcow2 level, we can guarantee a read of 0 even for an encrypted 
image, but is that really what we want?  Is setting the qcow2 zero flag 
on the cluster done at the decrypted level (at which point we may be 
leaking information about guest contents via anyone that can read the 
qcow2 metadata) or at the encrypted level (at which point it's useless 
information, because knowing the underlying file reads as zero still 
decrypts into garbage)?

Kevin Wolf April 28, 2020, 6:45 p.m. UTC | #4

On 28.04.2020 at 18:28, Eric Blake wrote:
> On 4/24/20 7:54 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
> >   2 files changed, 35 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >       bs->supported_zero_flags = header.version >= 3 ?
> >                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> 
> Is this really what we want for encrypted files, or would it be better as:
> 
>     if (bs->encrypted) {
>         bs->supported_truncate_flags = 0;
>     } else {
>         bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>     }
> 
> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
> image, but is that really what we want?  Is setting the qcow2 zero flag on
> the cluster done at the decrypted level (at which point we may be leaking
> information about guest contents via anyone that can read the qcow2
> metadata) or at the encrypted level (at which point it's useless
> information, because knowing the underlying file reads as zero still
> decrypts into garbage)?

The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.

If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).

Kevin
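
As a rough illustration of the "full fix" option described above, the toy
model below refuses the zero-flag shortcut for encrypted images at the
write-zeroes entry point, so that every caller is covered, and lets a generic
caller fall back to writing real zero buffers. Everything here is a
hypothetical stand-in: struct toy_image, toy_pwrite_zeroes() and
toy_zero_range() are not QEMU code, and the counters only record which path
was taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a qcow2 image; this is not QEMU's BlockDriverState. */
struct toy_image {
    bool encrypted;
    unsigned zero_flagged;  /* clusters marked with the zero flag */
    unsigned data_written;  /* clusters filled with explicit zero data */
};

/*
 * Hypothetical analogue of a driver's write-zeroes callback: on an encrypted
 * image it declines to record "reads as zero" in the metadata.
 */
static int toy_pwrite_zeroes(struct toy_image *img, unsigned nb_clusters)
{
    assert(img != NULL);
    if (img->encrypted) {
        return -ENOTSUP;
    }
    img->zero_flagged += nb_clusters;
    return 0;
}

/*
 * Generic caller: try the efficient path first and, on -ENOTSUP, write
 * ordinary zero buffers, which would be encrypted like any other data.
 */
static int toy_zero_range(struct toy_image *img, unsigned nb_clusters)
{
    int ret = toy_pwrite_zeroes(img, nb_clusters);

    if (ret == -ENOTSUP) {
        img->data_written += nb_clusters;
        ret = 0;
    }
    return ret;
}
```

In this model an unencrypted image ends up with zero-flagged clusters only,
while an encrypted one ends up with written data only. Allocation status
remains visible in both cases, which is the trade-off weighed in the reply
above.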

Eric Blake April 28, 2020, 6:58 p.m. UTC | #5

On 4/28/20 1:45 PM, Kevin Wolf wrote:
> On 28.04.2020 at 18:28, Eric Blake wrote:
>> On 4/24/20 7:54 AM, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>    block/qcow2-cluster.c |  2 +-
>>>    block/qcow2.c         | 34 ++++++++++++++++++++++++++++++++++
>>>    2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>
>>> +++ b/block/qcow2.c
>>> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>>        bs->supported_zero_flags = header.version >= 3 ?
>>>                                   BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
>>> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>
>> Is this really what we want for encrypted files, or would it be better as:
>>
>>      if (bs->encrypted) {
>>          bs->supported_truncate_flags = 0;
>>      } else {
>>          bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>      }
>>
>> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
>> image, but is that really what we want?  Is setting the qcow2 zero flag on
>> the cluster done at the decrypted level (at which point we may be leaking
>> information about guest contents via anyone that can read the qcow2
>> metadata) or at the encrypted level (at which point it's useless
>> information, because knowing the underlying file reads as zero still
>> decrypts into garbage)?
> 
> The zero flag means that the guest reads zeros, even with encrypted
> files. I'm not sure if it's worse than exposing the information which
> clusters are allocated and which are unallocated, which we have always
> been doing and which is hard to avoid without encrypting all the
> metadata, too. But it does reveal some information.
> 
> If we think that exposing zero flags is worse than exposing the
> allocation status, I would still not use your solution above. In that
> case, the full fix would be returning -ENOTSUP from
> .bdrv_co_pwrite_zeroes() to cover all other callers, too.

Indeed, it also makes me wonder if we should support 
truncate(BDRV_REQ_ZERO_WRITE|BDRV_REQ_NO_FALLBACK), to differentiate 
whether a truncation request is aiming more to be fast (NO_FALLBACK set, 
fail immediately with -ENOTSUP on encryption) or complete (NO_FALLBACK 
clear, go ahead and write guest-visible zeroes, which populates the 
format layer).  In other words, maybe we want a knob that the user can 
set on encrypted volumes on whether to allow zero flags in the qcow2 image.

> 
> If we think that allocation status and zero flags are of comparable
> importance, then we need to fix either both or nothing. Hiding all of
> this information probably means encrypting at least the L2 tables and
> potentially all of the metadata apart from the header. This would
> obviously require an incompatible feature flag (and some effort to
> implement it).

Indeed, my question is broad enough that it does not hold up _this_ 
series, so much as providing food for thought on what else we may need 
to add for encrypted qcow2 images as a future series, to make it easier 
to adjust the slider between the extremes of performance vs. minimal 
data leaks when using encryption.
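
One way to picture the NO_FALLBACK idea and the user-settable knob floated in
this message is a small decision function. This is only a sketch under stated
assumptions: the TOY_REQ_* constants are placeholders and do not use QEMU's
actual flag values, allow_zero_flags is a hypothetical per-image option, and
toy_choose_path() does not exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder request flags; the values are NOT QEMU's BDRV_REQ_* values. */
enum {
    TOY_REQ_ZERO_WRITE  = 1u << 0,
    TOY_REQ_NO_FALLBACK = 1u << 1,
};

enum toy_path {
    TOY_PATH_ZERO_FLAG,        /* fast: mark clusters as zero in metadata */
    TOY_PATH_EXPLICIT_ZEROES,  /* complete: write guest-visible zeros */
};

/*
 * Decide how a zero-extending truncate would be served.
 * allow_zero_flags models a hypothetical user knob for encrypted images.
 * Returns 0 and sets *path, or -ENOTSUP if only the fast path was acceptable
 * and it is not permitted.
 */
static int toy_choose_path(bool encrypted, bool allow_zero_flags,
                           unsigned flags, enum toy_path *path)
{
    assert(path != NULL);
    assert(flags & TOY_REQ_ZERO_WRITE);

    if (!encrypted || allow_zero_flags) {
        *path = TOY_PATH_ZERO_FLAG;
        return 0;
    }
    if (flags & TOY_REQ_NO_FALLBACK) {
        return -ENOTSUP;  /* caller wanted "fast or nothing" */
    }
    *path = TOY_PATH_EXPLICIT_ZEROES;
    return 0;
}
```

A caller that prefers speed would pass both flags and handle -ENOTSUP itself.
A caller that prefers completeness would pass only the zero-write flag and
accept the slower explicit write on encrypted images.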
