diff mbox series

[v6,04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

Message ID 20200423150127.142609-5-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Fix resize (extending) of short overlays | expand

Commit Message

Kevin Wolf April 23, 2020, 3:01 p.m. UTC
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)

Comments

Eric Blake April 23, 2020, 3:18 p.m. UTC | #1
On 4/23/20 10:01 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+), 1 deletion(-)
> 

> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +
> +        /*
> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> +         * requires a cluster-aligned start. The end may be unaligned if it is
> +         * at the end of the image (which it is here).
> +         */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> +            goto fail;
> +        }
> +
> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> +
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);

This works, but would it be any more efficient to use 
qcow2_co_pwrite_zeroes?  If the head of the cluster is already zero, 
then qcow2_co_pwrite_zeroes can turn into qcow2_cluster_zeroize for this 
cluster, while qcow2_co_pwritev_part cannot.

Because what you have works, and because we can use 
qcow2_co_pwrite_zeroes as an optimization in a later patch,

Reviewed-by: Eric Blake <eblake@redhat.com>
Kevin Wolf April 23, 2020, 3:48 p.m. UTC | #2
Am 23.04.2020 um 17:18 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Max Reitz <mreitz@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> 
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +
> > +        /*
> > +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> > +         * requires a cluster-aligned start. The end may be unaligned if it is
> > +         * at the end of the image (which it is here).
> > +         */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> > +        if (ret < 0) {
> > +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> > +            goto fail;
> > +        }
> > +
> > +        /* Write explicit zeros for the unaligned head */
> > +        if (zero_start > old_length) {
> > +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> > +            QEMUIOVector qiov;
> > +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> > +
> > +            qemu_co_mutex_unlock(&s->lock);
> > +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
> 
> This works, but would it be any more efficient to use
> qcow2_co_pwrite_zeroes?  If the head of the cluster is already zero, then
> qcow2_co_pwrite_zeroes can turn into qcow2_cluster_zeroize for this cluster,
> while qcow2_co_pwritev_part cannot.

The problem is that qcow2_co_pwrite_zeroes() will return -ENOTSUP if the
request is still unaligned after this optimisation. I would have to go
through the generic bdrv_co_pwrite_zeroes() to get the fallback, but I
can't do that because bs->total_sectors isn't updated yet.

Kevin
Vladimir Sementsov-Ogievskiy April 24, 2020, 6:16 a.m. UTC | #3
23.04.2020 18:01, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c |  2 +-
>   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 17f1363279..4b5fc8c4a7 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>       /* Caller must pass aligned values, except at image end */
>       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>   
>       /* The zero flag is only supported by version 3 and newer */
>       if (s->qcow_version < 3) {
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..ad621fe404 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>   
>       bs->supported_zero_flags = header.version >= 3 ?
>                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>   
>       /* Repair image if dirty */
>       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
> @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>           g_assert_not_reached();
>       }
>   
> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> +
> +        /*
> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> +         * requires a cluster-aligned start. The end may be unaligned if it is
> +         * at the end of the image (which it is here).
> +         */
> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> +            goto fail;
> +        }
> +
> +        /* Write explicit zeros for the unaligned head */
> +        if (zero_start > old_length) {
> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);

Why not s/s->cluster_size/zero_start - old_length/? We may save a lot of pages if cluster_size is large.

> +            QEMUIOVector qiov;
> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> +
> +            qemu_co_mutex_unlock(&s->lock);

We are in intermediate state here. Is it safe to unlock? Anything may happen, up to another truncate..

> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);

Better not do it if this cluster is already ZERO.. But it may be a later patch to improve that.

> +            qemu_co_mutex_lock(&s->lock);
> +
> +            qemu_vfree(buf);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> +                goto fail;
> +            }
> +        }
> +    }
> +
>       if (prealloc != PREALLOC_MODE_OFF) {
>           /* Flush metadata before actually changing the image size */
>           ret = qcow2_write_caches(bs);
>
Kevin Wolf April 24, 2020, 12:17 p.m. UTC | #4
Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 23.04.2020 18:01, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Max Reitz <mreitz@redhat.com>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 17f1363279..4b5fc8c4a7 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> >       /* Caller must pass aligned values, except at image end */
> >       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> >       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> > -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> > +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
> >       /* The zero flag is only supported by version 3 and newer */
> >       if (s->qcow_version < 3) {
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..ad621fe404 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >       bs->supported_zero_flags = header.version >= 3 ?
> >                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> >       /* Repair image if dirty */
> >       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
> > @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           g_assert_not_reached();
> >       }
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +
> > +        /*
> > +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> > +         * requires a cluster-aligned start. The end may be unaligned if it is
> > +         * at the end of the image (which it is here).
> > +         */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> > +        if (ret < 0) {
> > +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> > +            goto fail;
> > +        }
> > +
> > +        /* Write explicit zeros for the unaligned head */
> > +        if (zero_start > old_length) {
> > +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> 
> Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
> of pages if cluster_size is large.

Ok.

> > +            QEMUIOVector qiov;
> > +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> > +
> > +            qemu_co_mutex_unlock(&s->lock);
> 
> We are in intermediate state here. Is it safe to unlock? Anything may
> happen, up to another truncate..

I don't think it's worse than unlocking during normal writes, which we
have been doing for a long time. I don't see anything that would corrupt
any internal state.

Of course, doing truncate in parallel with other operations around EOF
has somewhat undefined semantics because the order could be anything.
But if you don't want to get undefined results, you just shouldn't make
such requests. It's similar to reading and writing the same area at the
same time.

> > +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
> 
> Better not do it if this cluster is already ZERO.. But it may be a
> later patch to improve that.

I doubt it's worth writing code to optimise a corner case inside a
corner case.

> > +            qemu_co_mutex_lock(&s->lock);
> > +
> > +            qemu_vfree(buf);
> > +            if (ret < 0) {
> > +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> > +                goto fail;
> > +            }
> > +        }
> > +    }
> > +
> >       if (prealloc != PREALLOC_MODE_OFF) {
> >           /* Flush metadata before actually changing the image size */
> >           ret = qcow2_write_caches(bs);

Kevin
Eric Blake April 24, 2020, 2:16 p.m. UTC | #5
On 4/24/20 7:17 AM, Kevin Wolf wrote:

> 
>>> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
>>
>> Better not do it if this cluster is already ZERO.. But it may be a
>> later patch to improve that.
> 
> I doubt it's worth writing code to optimise a corner case inside a
> corner case.

I spotted the same issue, and my suggestion was to use 
qcow2_co_pwrite_zero instead of qcow2_co_pwritev_part, but Kevin pointed 
out that my idea would probably not work the way I thought.  Then again, 
checking if the page is already zero is all the more benefit that 
qcow2_co_pwrite_zero would have given for me to have raised the 
question.  The following example illustrates why it might be worthwhile:

Suppose we have an image with 1M clusters, which is an unaligned 100.5M 
in length but started life with all clusters sparse.  We then resize it 
to another unaligned 200.5M.  The call to qcow2_co_pwritev_part will 
cause the image to be non-sparse for one cluster out of 201, whereas 
adding a check to skip the pwritev because the original unaligned tail 
cluster already read as zero will let us keep all clusters sparse.

At any rate, I argued that any such optimization would be a followup 
patch, and Kevin is right that it is a corner case optimization unlikely 
to affect many real-life images.
Vladimir Sementsov-Ogievskiy April 24, 2020, 2:27 p.m. UTC | #6
24.04.2020 15:17, Kevin Wolf wrote:
> Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.04.2020 18:01, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    block/qcow2-cluster.c |  2 +-
>>>    block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
>>>    2 files changed, 34 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 17f1363279..4b5fc8c4a7 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>>>        /* Caller must pass aligned values, except at image end */
>>>        assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>>>        assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>>> -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
>>> +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>>>        /* The zero flag is only supported by version 3 and newer */
>>>        if (s->qcow_version < 3) {
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 9cfbdfc939..ad621fe404 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>>        bs->supported_zero_flags = header.version >= 3 ?
>>>                                   BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
>>> +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>>>        /* Repair image if dirty */
>>>        if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
>>> @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>>            g_assert_not_reached();
>>>        }
>>> +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> +
>>> +        /*
>>> +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
>>> +         * requires a cluster-aligned start. The end may be unaligned if it is
>>> +         * at the end of the image (which it is here).
>>> +         */
>>> +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
>>> +        if (ret < 0) {
>>> +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
>>> +            goto fail;
>>> +        }
>>> +
>>> +        /* Write explicit zeros for the unaligned head */
>>> +        if (zero_start > old_length) {
>>> +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
>>
>> Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
>> of pages if cluster_size is large.
> 
> Ok.
> 
>>> +            QEMUIOVector qiov;
>>> +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
>>> +
>>> +            qemu_co_mutex_unlock(&s->lock);
>>
>> We are in intermediate state here. Is it safe to unlock? Anything may
>> happen, up to another truncate..
> 
> I don't think it's worse than unlocking during normal writes, which we
> have been doing for a long time. I don't see anything that would corrupt
> any internal state.
> 
> Of course, doing truncate in parallel with other operations around EOF
> has somewhat undefined semantics because the order could be anything.
> But if you don't want to get undefined results, you just shouldn't make
> such requests. It's similar to reading and writing the same area at the
> same time.

If there would be two truncate operations in parallel, we may finish up with s->l1_vm_state_index bs->total_sectors not corresponding to other metadata. Not sure how much is it bad..

> 
>>> +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
>>
>> Better not do it if this cluster is already ZERO.. But it may be a
>> later patch to improve that.
> 
> I doubt it's worth writing code to optimise a corner case inside a
> corner case.
> 
>>> +            qemu_co_mutex_lock(&s->lock);
>>> +
>>> +            qemu_vfree(buf);
>>> +            if (ret < 0) {
>>> +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
>>> +                goto fail;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>        if (prealloc != PREALLOC_MODE_OFF) {
>>>            /* Flush metadata before actually changing the image size */
>>>            ret = qcow2_write_caches(bs);
> 
> Kevin
>
diff mbox series

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..4b5fc8c4a7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1795,7 +1795,7 @@  int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
-           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..ad621fe404 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@  static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 
     bs->supported_zero_flags = header.version >= 3 ?
                                BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
     /* Repair image if dirty */
     if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,38 @@  static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         g_assert_not_reached();
     }
 
+    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+
+        /*
+         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
+         * requires a cluster-aligned start. The end may be unaligned if it is
+         * at the end of the image (which it is here).
+         */
+        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
+            goto fail;
+        }
+
+        /* Write explicit zeros for the unaligned head */
+        if (zero_start > old_length) {
+            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
+            QEMUIOVector qiov;
+            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
+
+            qemu_co_mutex_unlock(&s->lock);
+            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
+            qemu_co_mutex_lock(&s->lock);
+
+            qemu_vfree(buf);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to zero out the new area");
+                goto fail;
+            }
+        }
+    }
+
     if (prealloc != PREALLOC_MODE_OFF) {
         /* Flush metadata before actually changing the image size */
         ret = qcow2_write_caches(bs);