diff mbox series

[v5,4/9] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Message ID 20200422152129.167074-5-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Fix resize (extending) of short overlays | expand

Commit Message

Kevin Wolf April 22, 2020, 3:21 p.m. UTC
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Eric Blake April 22, 2020, 3:33 p.m. UTC | #1
On 4/22/20 10:21 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 

> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>           g_assert_not_reached();
>       }
>   
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);

This rounds up beyond the new size...

> +
> +        /* Use zero clusters as much as we can */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);

and then requests that the extra be zeroed.  Does that always work, even 
when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? 
  If so,

Reviewed-by: Eric Blake <eblake@redhat.com>

otherwise, you may have to treat the tail specially, the same way you 
treated an unaligned head.
Kevin Wolf April 22, 2020, 3:58 p.m. UTC | #2
Am 22.04.2020 um 17:33 hat Eric Blake geschrieben:
> On 4/22/20 10:21 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> >   1 file changed, 30 insertions(+)
> > 
> 
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           g_assert_not_reached();
> >       }
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> 
> This rounds up beyond the new size...
> 
> > +
> > +        /* Use zero clusters as much as we can */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> 
> and then requests that the extra be zeroed.  Does that always work, even
> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?

You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
currently not a code path that is run because we only set
BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
data_file_is_raw() doesn't work with backing files.

But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
for such a file, I think it would fail.

> If so,
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> otherwise, you may have to treat the tail specially, the same way you
> treated an unaligned head.

Actually, do I even need to round the tail?

    /* Caller must pass aligned values, except at image end */
    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
still set the zero flag for the partial last cluster and for the
external data file, bdrv_co_pwrite_zeroes() would have the correct size.

Kevin
Eric Blake April 22, 2020, 4:14 p.m. UTC | #3
On 4/22/20 10:58 AM, Kevin Wolf wrote:

>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>            g_assert_not_reached();
>>>        }
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>
>> This rounds up beyond the new size...
>>
>>> +
>>> +        /* Use zero clusters as much as we can */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> and then requests that the extra be zeroed.  Does that always work, even
>> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
> 
> You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> currently not a code path that is run because we only set
> BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> data_file_is_raw() doesn't work with backing files.

Good point.

> 
> But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> for such a file, I think it would fail.
> 
>> If so,
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> otherwise, you may have to treat the tail specially, the same way you
>> treated an unaligned head.
> 
> Actually, do I even need to round the tail?
> 
>      /* Caller must pass aligned values, except at image end */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>             end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> 
> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> still set the zero flag for the partial last cluster and for the
> external data file, bdrv_co_pwrite_zeroes() would have the correct size.

Then I'm in favor of NOT rounding the tail.  That's an easy enough 
change and we've now justified that it does what we want, so R-b stands 
with that one-line tweak.
Max Reitz April 23, 2020, 10:53 a.m. UTC | #4
On 22.04.20 17:21, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..bd632405d1 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>          g_assert_not_reached();
>      }
>  
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> +
> +        /* Use zero clusters as much as we can */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);

It’s kind of a pity that this changes the cluster mappings that were
established when using falloc/full preallocation already (i.e., they
become preallocated zero clusters then, so when writing to them, we need
COW again).

But falloc/full preallocation do not guarantee that the new data is
zero, so I suppose this is the only thing we can reasonably do.

I personally don’t really care about whether zero_end is aligned or not,
so either way:

Reviewed-by: Max Reitz <mreitz@redhat.com>
Kevin Wolf April 23, 2020, 1:23 p.m. UTC | #5
Am 22.04.2020 um 18:14 hat Eric Blake geschrieben:
> On 4/22/20 10:58 AM, Kevin Wolf wrote:
> 
> > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > > >            g_assert_not_reached();
> > > >        }
> > > > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > > > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > > > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > > 
> > > This rounds up beyond the new size...
> > > 
> > > > +
> > > > +        /* Use zero clusters as much as we can */
> > > > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> > > 
> > > and then requests that the extra be zeroed.  Does that always work, even
> > > when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
> > 
> > You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> > currently not a code path that is run because we only set
> > BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> > data_file_is_raw() doesn't work with backing files.
> 
> Good point.
> 
> > 
> > But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> > for such a file, I think it would fail.
> > 
> > > If so,
> > > 
> > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > 
> > > otherwise, you may have to treat the tail specially, the same way you
> > > treated an unaligned head.
> > 
> > Actually, do I even need to round the tail?
> > 
> >      /* Caller must pass aligned values, except at image end */
> >      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> >      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> >             end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> > 
> > So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> > still set the zero flag for the partial last cluster and for the
> > external data file, bdrv_co_pwrite_zeroes() would have the correct size.
> 
> Then I'm in favor of NOT rounding the tail.  That's an easy enough change
> and we've now justified that it does what we want, so R-b stands with that
> one-line tweak.

Would have been too easy... bs->total_sectors isn't updated yet, so the
assertion does fail.

I can make the assertion check end_offset >= ... instead. That should
still check what we wanted to check here and allow the unaligned
extension.

This feels like the better option to me compared to updating
bs->total_sectors earlier and then undoing that change in every error
path.

Kevin
Kevin Wolf April 23, 2020, 1:25 p.m. UTC | #6
Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
> On 22.04.20 17:21, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..bd632405d1 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> 
> [...]
> 
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >          g_assert_not_reached();
> >      }
> >  
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > +
> > +        /* Use zero clusters as much as we can */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> 
> It’s kind of a pity that this changes the cluster mappings that were
> established when using falloc/full preallocation already (i.e., they
> become preallocated zero clusters then, so when writing to them, we need
> COW again).
> 
> But falloc/full preallocation do not guarantee that the new data is
> zero, so I suppose this is the only thing we can reasonably do.

If we really want, I guess we could make full preallocation first try
passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
we could skip setting the zero cluster flag at the qcow2 level.

Feels like a separate patch, though.

Kevin
Max Reitz April 23, 2020, 1:56 p.m. UTC | #7
On 23.04.20 15:25, Kevin Wolf wrote:
> Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
>> On 22.04.20 17:21, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>>>  1 file changed, 30 insertions(+)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 9cfbdfc939..bd632405d1 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>
>> [...]
>>
>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>          g_assert_not_reached();
>>>      }
>>>  
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>> +
>>> +        /* Use zero clusters as much as we can */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> It’s kind of a pity that this changes the cluster mappings that were
>> established when using falloc/full preallocation already (i.e., they
>> become preallocated zero clusters then, so when writing to them, we need
>> COW again).
>>
>> But falloc/full preallocation do not guarantee that the new data is
>> zero, so I suppose this is the only thing we can reasonably do.
> 
> If we really want, I guess we could make full preallocation first try
> passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
> we could skip setting the zero cluster flag at the qcow2 level.

That might be nice.

> Feels like a separate patch, though.

Definitely, yes.

Max
Eric Blake April 23, 2020, 1:59 p.m. UTC | #8
On 4/23/20 8:23 AM, Kevin Wolf wrote:

>>>
>>> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
>>> still set the zero flag for the partial last cluster and for the
>>> external data file, bdrv_co_pwrite_zeroes() would have the correct size.
>>
>> Then I'm in favor of NOT rounding the tail.  That's an easy enough change
>> and we've now justified that it does what we want, so R-b stands with that
>> one-line tweak.
> 
> Would have been too easy... bs->total_sectors isn't updated yet, so the
> assertion does fail.
> 
> I can make the assertion check end_offset >= ... instead. That should
> still check what we wanted to check here and allow the unaligned
> extension.

Yes, that works for me.

> 
> This feels like the better option to me compared to updating
> bs->total_sectors earlier and then undoing that change in every error
> path.

Indeed.
diff mbox series

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..bd632405d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@  static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 
     bs->supported_zero_flags = header.version >= 3 ?
                                BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
     /* Repair image if dirty */
     if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,35 @@  static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         g_assert_not_reached();
     }
 
+    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+        uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
+
+        /* Use zero clusters as much as we can */
+        ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to zero out the new area");
+            goto fail;
+        }
+
+        /* Write explicit zeros for the unaligned head */
+        if (zero_start > old_length) {
+            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
+            QEMUIOVector qiov;
+            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
+
+            qemu_co_mutex_unlock(&s->lock);
+            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
+            qemu_co_mutex_lock(&s->lock);
+
+            qemu_vfree(buf);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to zero out the new area");
+                goto fail;
+            }
+        }
+    }
+
     if (prealloc != PREALLOC_MODE_OFF) {
         /* Flush metadata before actually changing the image size */
         ret = qcow2_write_caches(bs);