[v5,4/9] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Message ID	20200422152129.167074-5-kwolf@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=qLmg=6G=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B3FDA2074B From: Kevin Wolf <kwolf@redhat.com> To: qemu-block@nongnu.org Subject: [PATCH v5 4/9] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate Date: Wed, 22 Apr 2020 17:21:24 +0200 Message-Id: <20200422152129.167074-5-kwolf@redhat.com> In-Reply-To: <20200422152129.167074-1-kwolf@redhat.com> References: <20200422152129.167074-1-kwolf@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=207.211.31.120; envelope-from=kwolf@redhat.com; helo=us-smtp-1.mimecast.com Precedence: list Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, berto@igalia.com, qemu-devel@nongnu.org, mreitz@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	block: Fix resize (extending) of short overlays \| expand [v5,0/9] block: Fix resize (extending) of short overlays [v5,1/9] block: Add flags to BlockDriver.bdrv_co_truncate() [v5,2/9] block: Add flags to bdrv(_co)_truncate() [v5,3/9] block-backend: Add flags to blk_truncate() [v5,4/9] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate [v5,5/9] raw-format: Support BDRV_REQ_ZERO_WRITE for truncate [v5,6/9] file-posix: Support BDRV_REQ_ZERO_WRITE for truncate [v5,7/9] block: truncate: Don't make backing file data visible [v5,8/9] iotests: Filter testfiles out in filter_img_info() [v5,9/9] iotests: Test committing to short backing file

Kevin Wolf April 22, 2020, 3:21 p.m. UTC

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Eric Blake April 22, 2020, 3:33 p.m. UTC | #1

On 4/22/20 10:21 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 

> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>           g_assert_not_reached();
>       }
>   
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);

This rounds up beyond the new size...

> +
> +        /* Use zero clusters as much as we can */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);

and then requests that the extra be zeroed.  Does that always work, even 
when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? 
  If so,

Reviewed-by: Eric Blake <eblake@redhat.com>

otherwise, you may have to treat the tail specially, the same way you 
treated an unaligned head.

Kevin Wolf April 22, 2020, 3:58 p.m. UTC | #2

Am 22.04.2020 um 17:33 hat Eric Blake geschrieben:
> On 4/22/20 10:21 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> >   1 file changed, 30 insertions(+)
> > 
> 
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           g_assert_not_reached();
> >       }
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> 
> This rounds up beyond the new size...
> 
> > +
> > +        /* Use zero clusters as much as we can */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> 
> and then requests that the extra be zeroed.  Does that always work, even
> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?

You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
currently not a code path that is run because we only set
BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
data_file_is_raw() doesn't work with backing files.

But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
for such a file, I think it would fail.

> If so,
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> otherwise, you may have to treat the tail specially, the same way you
> treated an unaligned head.

Actually, do I even need to round the tail?

    /* Caller must pass aligned values, except at image end */
    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
still set the zero flag for the partial last cluster and for the
external data file, bdrv_co_pwrite_zeroes() would have the correct size.

Kevin

Eric Blake April 22, 2020, 4:14 p.m. UTC | #3

On 4/22/20 10:58 AM, Kevin Wolf wrote:

>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>            g_assert_not_reached();
>>>        }
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>
>> This rounds up beyond the new size...
>>
>>> +
>>> +        /* Use zero clusters as much as we can */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> and then requests that the extra be zeroed.  Does that always work, even
>> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
> 
> You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> currently not a code path that is run because we only set
> BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> data_file_is_raw() doesn't work with backing files.

Good point.

> 
> But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> for such a file, I think it would fail.
> 
>> If so,
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> otherwise, you may have to treat the tail specially, the same way you
>> treated an unaligned head.
> 
> Actually, do I even need to round the tail?
> 
>      /* Caller must pass aligned values, except at image end */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>             end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> 
> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> still set the zero flag for the partial last cluster and for the
> external data file, bdrv_co_pwrite_zeroes() would have the correct size.

Then I'm in favor of NOT rounding the tail.  That's an easy enough 
change and we've now justified that it does what we want, so R-b stands 
with that one-line tweak.

Max Reitz April 23, 2020, 10:53 a.m. UTC | #4

On 22.04.20 17:21, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..bd632405d1 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>          g_assert_not_reached();
>      }
>  
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> +
> +        /* Use zero clusters as much as we can */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);

It’s kind of a pity that this changes the cluster mappings that were
established when using falloc/full preallocation already (i.e., they
become preallocated zero clusters then, so when writing to them, we need
COW again).

But falloc/full preallocation do not guarantee that the new data is
zero, so I suppose this is the only thing we can reasonably do.

I personally don’t really care about whether zero_end is aligned or not,
so either way:

Reviewed-by: Max Reitz <mreitz@redhat.com>

Kevin Wolf April 23, 2020, 1:23 p.m. UTC | #5

Am 22.04.2020 um 18:14 hat Eric Blake geschrieben:
> On 4/22/20 10:58 AM, Kevin Wolf wrote:
> 
> > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > > >            g_assert_not_reached();
> > > >        }
> > > > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > > > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > > > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > > 
> > > This rounds up beyond the new size...
> > > 
> > > > +
> > > > +        /* Use zero clusters as much as we can */
> > > > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> > > 
> > > and then requests that the extra be zeroed.  Does that always work, even
> > > when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
> > 
> > You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> > currently not a code path that is run because we only set
> > BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> > data_file_is_raw() doesn't work with backing files.
> 
> Good point.
> 
> > 
> > But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> > for such a file, I think it would fail.
> > 
> > > If so,
> > > 
> > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > 
> > > otherwise, you may have to treat the tail specially, the same way you
> > > treated an unaligned head.
> > 
> > Actually, do I even need to round the tail?
> > 
> >      /* Caller must pass aligned values, except at image end */
> >      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> >      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> >             end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> > 
> > So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> > still set the zero flag for the partial last cluster and for the
> > external data file, bdrv_co_pwrite_zeroes() would have the correct size.
> 
> Then I'm in favor of NOT rounding the tail.  That's an easy enough change
> and we've now justified that it does what we want, so R-b stands with that
> one-line tweak.

Would have been too easy... bs->total_sectors isn't updated yet, so the
assertion does fail.

I can make the assertion check end_offset >= ... instead. That should
still check what we wanted to check here and allow the unaligned
extension.

This feels like the better option to me compared to updating
bs->total_sectors earlier and then undoing that change in every error
path.

Kevin

Kevin Wolf April 23, 2020, 1:25 p.m. UTC | #6

Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
> On 22.04.20 17:21, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..bd632405d1 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> 
> [...]
> 
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >          g_assert_not_reached();
> >      }
> >  
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > +
> > +        /* Use zero clusters as much as we can */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> 
> It’s kind of a pity that this changes the cluster mappings that were
> established when using falloc/full preallocation already (i.e., they
> become preallocated zero clusters then, so when writing to them, we need
> COW again).
> 
> But falloc/full preallocation do not guarantee that the new data is
> zero, so I suppose this is the only thing we can reasonably do.

If we really want, I guess we could make full preallocation first try
passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
we could skip setting the zero cluster flag at the qcow2 level.

Feels like a separate patch, though.

Kevin

Max Reitz April 23, 2020, 1:56 p.m. UTC | #7

On 23.04.20 15:25, Kevin Wolf wrote:
> Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
>> On 22.04.20 17:21, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>>>  1 file changed, 30 insertions(+)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 9cfbdfc939..bd632405d1 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>
>> [...]
>>
>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>          g_assert_not_reached();
>>>      }
>>>  
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>> +
>>> +        /* Use zero clusters as much as we can */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> It’s kind of a pity that this changes the cluster mappings that were
>> established when using falloc/full preallocation already (i.e., they
>> become preallocated zero clusters then, so when writing to them, we need
>> COW again).
>>
>> But falloc/full preallocation do not guarantee that the new data is
>> zero, so I suppose this is the only thing we can reasonably do.
> 
> If we really want, I guess we could make full preallocation first try
> passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
> we could skip setting the zero cluster flag at the qcow2 level.

That might be nice.

> Feels like a separate patch, though.

Definitely, yes.

Max

Eric Blake April 23, 2020, 1:59 p.m. UTC | #8

On 4/23/20 8:23 AM, Kevin Wolf wrote:

>>>
>>> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
>>> still set the zero flag for the partial last cluster and for the
>>> external data file, bdrv_co_pwrite_zeroes() would have the correct size.
>>
>> Then I'm in favor of NOT rounding the tail.  That's an easy enough change
>> and we've now justified that it does what we want, so R-b stands with that
>> one-line tweak.
> 
> Would have been too easy... bs->total_sectors isn't updated yet, so the
> assertion does fail.
> 
> I can make the assertion check end_offset >= ... instead. That should
> still check what we wanted to check here and allow the unaligned
> extension.

Yes, that works for me.

> 
> This feels like the better option to me compared to updating
> bs->total_sectors earlier and then undoing that change in every error
> path.

Indeed.

[v5,4/9] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Commit Message

Comments

Patch