[for-5.1,1/2] qcow2: Implement v2 zero writes with discard if possible

Message ID 20200720131810.177978-2-kwolf@redhat.com
State New, archived
Series qemu-img convert -n: Keep qcow2 v2 target sparse

Commit Message

Kevin Wolf July 20, 2020, 1:18 p.m. UTC
qcow2 version 2 images don't support the zero flag for clusters, so for
write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
writes. If the image doesn't have a backing file, we can do better: Just
discard the respective clusters.

This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
to assume that the existing target image may contain any data, so it has
to write zeroes. Without this patch, this results in a fully allocated
target image, even if the source image was empty.

Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Nir Soffer July 20, 2020, 2:50 p.m. UTC | #1
On Mon, Jul 20, 2020 at 4:18 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> qcow2 version 2 images don't support the zero flag for clusters, so for
> write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
> writes. If the image doesn't have a backing file, we can do better: Just
> discard the respective clusters.
>
> This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
> to assume that the existing target image may contain any data, so it has
> to write zeroes. Without this patch, this results in a fully allocated
> target image, even if the source image was empty.
>
> Reported-by: Nir Soffer <nsoffer@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 4b5fc8c4a7..a677ba9f5c 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1797,8 +1797,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>             end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>
> -    /* The zero flag is only supported by version 3 and newer */
> +    /*
> +     * The zero flag is only supported by version 3 and newer. However, if we
> +     * have no backing file, we can resort to discard in version 2.
> +     */
>      if (s->qcow_version < 3) {
> +        if (!bs->backing) {
> +            return qcow2_cluster_discard(bs, offset, bytes,
> +                                         QCOW2_DISCARD_REQUEST, false);
> +        }
>          return -ENOTSUP;
>      }

Looks good to me.

>
> --
> 2.25.4
>
Max Reitz July 21, 2020, 10:07 a.m. UTC | #2
On 20.07.20 15:18, Kevin Wolf wrote:
> qcow2 version 2 images don't support the zero flag for clusters, so for
> write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
> writes. If the image doesn't have a backing file, we can do better: Just
> discard the respective clusters.
> 
> This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
> to assume that the existing target image may contain any data, so it has
> to write zeroes. Without this patch, this results in a fully allocated
> target image, even if the source image was empty.
> 
> Reported-by: Nir Soffer <nsoffer@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
Maxim Levitsky July 22, 2020, 5:01 p.m. UTC | #3
On Mon, 2020-07-20 at 15:18 +0200, Kevin Wolf wrote:
> qcow2 version 2 images don't support the zero flag for clusters, so for
> write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
> writes. If the image doesn't have a backing file, we can do better: Just
> discard the respective clusters.
> 
> This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
> to assume that the existing target image may contain any data, so it has
> to write zeroes. Without this patch, this results in a fully allocated
> target image, even if the source image was empty.
> 
> Reported-by: Nir Soffer <nsoffer@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 4b5fc8c4a7..a677ba9f5c 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1797,8 +1797,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>             end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
>  
> -    /* The zero flag is only supported by version 3 and newer */
> +    /*
> +     * The zero flag is only supported by version 3 and newer. However, if we
> +     * have no backing file, we can resort to discard in version 2.
> +     */
>      if (s->qcow_version < 3) {
> +        if (!bs->backing) {
> +            return qcow2_cluster_discard(bs, offset, bytes,
> +                                         QCOW2_DISCARD_REQUEST, false);
> +        }
>          return -ENOTSUP;
>      }
>  

From my knowledge of NVMe, I remember that discard doesn't have to zero the blocks.
There is a special namespace capability that indicates the contents of a discarded block
(Deallocate Logical Block Features).

If and only if the discard behavior flag indicates that discarded areas are zero,
then the write-zero command can have a special 'deallocate' flag that hints the controller
to discard the sectors.

So wouldn't discarding the clusters carry a theoretical risk of introducing garbage there?

Best regards,
	Maxim Levitsky
Kevin Wolf July 22, 2020, 5:14 p.m. UTC | #4
Am 22.07.2020 um 19:01 hat Maxim Levitsky geschrieben:
> On Mon, 2020-07-20 at 15:18 +0200, Kevin Wolf wrote:
> > qcow2 version 2 images don't support the zero flag for clusters, so for
> > write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
> > writes. If the image doesn't have a backing file, we can do better: Just
> > discard the respective clusters.
> > 
> > This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
> > to assume that the existing target image may contain any data, so it has
> > to write zeroes. Without this patch, this results in a fully allocated
> > target image, even if the source image was empty.
> > 
> > Reported-by: Nir Soffer <nsoffer@redhat.com>
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/qcow2-cluster.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 4b5fc8c4a7..a677ba9f5c 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -1797,8 +1797,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> >      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> >             end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
> >  
> > -    /* The zero flag is only supported by version 3 and newer */
> > +    /*
> > +     * The zero flag is only supported by version 3 and newer. However, if we
> > +     * have no backing file, we can resort to discard in version 2.
> > +     */
> >      if (s->qcow_version < 3) {
> > +        if (!bs->backing) {
> > +            return qcow2_cluster_discard(bs, offset, bytes,
> > +                                         QCOW2_DISCARD_REQUEST, false);
> > +        }
> >          return -ENOTSUP;
> >      }
> >  
> 
> From my knowledge of NVMe, I remember that discard doesn't have to zero the blocks.
> There is a special namespace capability that indicates the contents of a discarded block
> (Deallocate Logical Block Features).
> 
> If and only if the discard behavior flag indicates that discarded areas are zero,
> then the write-zero command can have a special 'deallocate' flag that hints the controller
> to discard the sectors.
> 
> So wouldn't discarding the clusters carry a theoretical risk of introducing garbage there?

No, qcow2_cluster_discard() has a defined behaviour. For v2 images, it
unallocates the cluster in the L2 table (this is only safe without a
backing file); for v3 images, it converts them to zero clusters.
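
To illustrate the distinction drawn above, here is a minimal toy model in C.
It is not QEMU code: ToyImage, toy_discard() and toy_read() are hypothetical
names, a single byte stands in for a whole cluster, and the three-state enum
only approximates what an L2 entry can express. It assumes, as stated above,
that discard unallocates the cluster in v2 and turns it into a zero cluster
in v3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLUSTERS 4

/* Hypothetical, simplified cluster states; not QEMU's real L2 entry layout. */
typedef enum {
    TOY_UNALLOCATED, /* no mapping: reads fall through to the backing file */
    TOY_ZERO,        /* explicit zero cluster (only expressible in v3) */
    TOY_DATA         /* mapped to payload stored in the image itself */
} ToyClusterState;

typedef struct {
    int version;                      /* 2 or 3 */
    bool has_backing;                 /* stands in for bs->backing != NULL */
    ToyClusterState l2[TOY_CLUSTERS]; /* toy L2 table */
    uint8_t data[TOY_CLUSTERS];       /* one byte per cluster of image data */
    uint8_t backing[TOY_CLUSTERS];    /* one byte per cluster of backing data */
} ToyImage;

/* Model of the discard behaviour described above (an assumption of this
 * sketch): v2 drops the mapping, v3 marks the cluster as zero. */
static void toy_discard(ToyImage *img, int idx)
{
    assert(idx >= 0 && idx < TOY_CLUSTERS);
    img->l2[idx] = (img->version >= 3) ? TOY_ZERO : TOY_UNALLOCATED;
}

/* What a guest read of one cluster would return in this model. */
static uint8_t toy_read(const ToyImage *img, int idx)
{
    assert(idx >= 0 && idx < TOY_CLUSTERS);
    switch (img->l2[idx]) {
    case TOY_DATA:
        return img->data[idx];
    case TOY_ZERO:
        return 0;
    default: /* TOY_UNALLOCATED */
        return img->has_backing ? img->backing[idx] : 0;
    }
}
```

In this model, a v2 image without a backing file reads zero from a discarded
cluster, so nothing undefined can appear. With has_backing set, the same v2
discard would expose the backing data instead, which is why the patch only
takes this path when bs->backing is NULL.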

Kevin
Maxim Levitsky July 22, 2020, 5:15 p.m. UTC | #5
On Wed, 2020-07-22 at 19:14 +0200, Kevin Wolf wrote:
> Am 22.07.2020 um 19:01 hat Maxim Levitsky geschrieben:
> > On Mon, 2020-07-20 at 15:18 +0200, Kevin Wolf wrote:
> > > qcow2 version 2 images don't support the zero flag for clusters, so for
> > > write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
> > > writes. If the image doesn't have a backing file, we can do better: Just
> > > discard the respective clusters.
> > > 
> > > This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
> > > to assume that the existing target image may contain any data, so it has
> > > to write zeroes. Without this patch, this results in a fully allocated
> > > target image, even if the source image was empty.
> > > 
> > > Reported-by: Nir Soffer <nsoffer@redhat.com>
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  block/qcow2-cluster.c | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > > index 4b5fc8c4a7..a677ba9f5c 100644
> > > --- a/block/qcow2-cluster.c
> > > +++ b/block/qcow2-cluster.c
> > > @@ -1797,8 +1797,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> > >      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> > >             end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
> > >  
> > > -    /* The zero flag is only supported by version 3 and newer */
> > > +    /*
> > > +     * The zero flag is only supported by version 3 and newer. However, if we
> > > +     * have no backing file, we can resort to discard in version 2.
> > > +     */
> > >      if (s->qcow_version < 3) {
> > > +        if (!bs->backing) {
> > > +            return qcow2_cluster_discard(bs, offset, bytes,
> > > +                                         QCOW2_DISCARD_REQUEST, false);
> > > +        }
> > >          return -ENOTSUP;
> > >      }
> > >  
> > 
> > From my knowledge of NVMe, I remember that discard doesn't have to zero the blocks.
> > There is a special namespace capability that indicates the contents of a discarded block
> > (Deallocate Logical Block Features).
> > 
> > If and only if the discard behavior flag indicates that discarded areas are zero,
> > then the write-zero command can have a special 'deallocate' flag that hints the controller
> > to discard the sectors.
> > 
> > So wouldn't discarding the clusters carry a theoretical risk of introducing garbage there?
> 
> No, qcow2_cluster_discard() has a defined behaviour. For v2 images, it
> unallocates the cluster in the L2 table (this is only safe without a
> backing file); for v3 images, it converts them to zero clusters.

All right then!

Best regards,
	Maxim Levitsky
> 
> Kevin

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4b5fc8c4a7..a677ba9f5c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1797,8 +1797,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
            end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
 
-    /* The zero flag is only supported by version 3 and newer */
+    /*
+     * The zero flag is only supported by version 3 and newer. However, if we
+     * have no backing file, we can resort to discard in version 2.
+     */
     if (s->qcow_version < 3) {
+        if (!bs->backing) {
+            return qcow2_cluster_discard(bs, offset, bytes,
+                                         QCOW2_DISCARD_REQUEST, false);
+        }
         return -ENOTSUP;
     }
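
The control flow added by this hunk can be sketched as a standalone decision
function. This is a simplified model, not the QEMU implementation:
ToyZeroizeAction, toy_zeroize_action() and toy_zeroize_status() are
hypothetical names, the real function works on a BlockDriverState rather than
two plain parameters, and the sketch assumes the discard itself succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of a write_zeroes request in this toy model. */
typedef enum {
    TOY_SET_ZERO_FLAG, /* v3: mark the clusters with the zero flag */
    TOY_DISCARD,       /* v2 without backing file: discard the clusters */
    TOY_NOT_SUPPORTED  /* v2 with backing file: caller writes zero buffers */
} ToyZeroizeAction;

/* Mirrors the branch structure of the patched version check. */
static ToyZeroizeAction toy_zeroize_action(int qcow_version, bool has_backing)
{
    assert(qcow_version == 2 || qcow_version == 3);

    if (qcow_version < 3) {
        if (!has_backing) {
            return TOY_DISCARD;
        }
        return TOY_NOT_SUPPORTED;
    }
    return TOY_SET_ZERO_FLAG;
}

/* Status the caller would see, assuming the chosen action cannot fail. */
static int toy_zeroize_status(ToyZeroizeAction action)
{
    return action == TOY_NOT_SUPPORTED ? -ENOTSUP : 0;
}
```

Only the combination of version 2 and a backing file still yields -ENOTSUP in
this model, which matches the fallback to explicit zero buffer writes
described in the commit message.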