diff mbox

[v10,02/17] qcow2: Correctly report status of preallocated zero clusters

Message ID 20170427014626.11553-3-eblake@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Blake April 27, 2017, 1:46 a.m. UTC
We were throwing away the preallocation information associated with
zero clusters.  But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json.  The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch
---
 block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

Comments

Max Reitz April 28, 2017, 5:35 p.m. UTC | #1
On 27.04.2017 03:46, Eric Blake wrote:
> We were throwing away the preallocation information associated with
> zero clusters.  But we should be matching the well-defined semantics
> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
> while still reminding the user that reading from that offset is
> likely to read garbage.
> 
> Making this change lets us see which portions of an image are zero
> but preallocated, when using qemu-img map --output=json.  The
> --output=human side intentionally ignores all zero clusters, whether
> or not they are preallocated.
> 
> The fact that there is no change to qemu-iotests './check -qcow2'
> merely means that we aren't yet testing this aspect of qemu-img;
> a later patch will add a test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch
> ---
>  block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 5 deletions(-)

I'd propose you split the qcow2 changes off of this series.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 100398c..d1063df 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -328,6 +328,10 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
>  	return i;
>  }
> 
> +/*
> + * Checks how many consecutive clusters in a given L2 table have the same
> + * cluster type with no corresponding allocation.
> + */
>  static int count_contiguous_clusters_by_type(int nb_clusters,
>                                               uint64_t *l2_table,
>                                               int wanted_type)
> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int nb_clusters,
>      int i;
> 
>      for (i = 0; i < nb_clusters; i++) {
> -        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
> +        uint64_t entry = be64_to_cpu(l2_table[i]);
> +        int type = qcow2_get_cluster_type(entry);
> 
> -        if (type != wanted_type) {
> +        if (type != wanted_type || entry & L2E_OFFSET_MASK) {

This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
what the comment you added describes, but this may still warrant a
renaming of this function (count_contiguous_unallocated_clusters?) and
probably an assertion that wanted_type is ZERO or UNALLOCATED.

>              break;
>          }
>      }
> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>              ret = -EIO;
>              goto fail;
>          }
> -        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
> -                                              QCOW2_CLUSTER_ZERO);
> -        *cluster_offset = 0;
> +        /* Distinguish between pure zero clusters and pre-allocated ones */
> +        if (*cluster_offset & L2E_OFFSET_MASK) {
> +            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> +                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);

You should probably switch this patch and the next one -- or I just send
my patches myself and you base your series on top of it...

Because before the next patch, this happens:

$ ./qemu-img create -f qcow2 foo.qcow2 64M
Formatting 'foo.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-io -c 'write 0 64M' -c 'write -z 0 64M' foo.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.1846 sec (346.590 MiB/sec and 5.4155 ops/sec)
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.0004 sec (127.551 GiB/sec and 2040.8163 ops/sec)
$ ./qemu-img map foo.qcow2
Offset          Length          Mapped to       File
qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
failed.
[1]    4970 abort (core dumped)  ./qemu-img map foo.qcow2

Max

> +            *cluster_offset &= L2E_OFFSET_MASK;
> +            if (offset_into_cluster(s, *cluster_offset)) {
> +                qcow2_signal_corruption(bs, true, -1, -1,
> +                                        "Preallocated zero cluster offset %#"
> +                                        PRIx64 " unaligned (L2 offset: %#"
> +                                        PRIx64 ", L2 index: %#x)",
> +                                        *cluster_offset, l2_offset, l2_index);
> +                ret = -EIO;
> +                goto fail;
> +            }
> +        } else {
> +            c = count_contiguous_clusters_by_type(nb_clusters,
> +                                                  &l2_table[l2_index],
> +                                                  QCOW2_CLUSTER_ZERO);
> +            *cluster_offset = 0;
> +        }
>          break;
>      case QCOW2_CLUSTER_UNALLOCATED:
>          /* how many empty clusters ? */
>
Eric Blake April 28, 2017, 7:04 p.m. UTC | #2
On 04/28/2017 12:35 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We were throwing away the preallocation information associated with
>> zero clusters.  But we should be matching the well-defined semantics
>> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
>> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
>> while still reminding the user that reading from that offset is
>> likely to read garbage.
>>
>> Making this change lets us see which portions of an image are zero
>> but preallocated, when using qemu-img map --output=json.  The
>> --output=human side intentionally ignores all zero clusters, whether
>> or not they are preallocated.
>>
>> The fact that there is no change to qemu-iotests './check -qcow2'
>> merely means that we aren't yet testing this aspect of qemu-img;
>> a later patch will add a test.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> v10: new patch
>> ---
>>  block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
>>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> I'd propose you split the qcow2 changes off of this series.

Yeah, it's sort of morphed into a well-tested later part and an earlier
part that is still undergoing heavy review.  I don't know how much churn
there would be to rebase things to do the blkdebug changes first (it may
invalidate portions of my test 177; but I could always tag such spots as
FIXME while waiting on the qcow2 part to settle and land).  I'll see
what I can do, as I've now got multiple series queued up, and should at
least get SOMETHING to land to start clearing up the logjam.


>> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int nb_clusters,
>>      int i;
>>
>>      for (i = 0; i < nb_clusters; i++) {
>> -        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
>> +        uint64_t entry = be64_to_cpu(l2_table[i]);
>> +        int type = qcow2_get_cluster_type(entry);
>>
>> -        if (type != wanted_type) {
>> +        if (type != wanted_type || entry & L2E_OFFSET_MASK) {
> 
> This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
> what the comment you added describes, but this may still warrant a
> renaming of this function (count_contiguous_unallocated_clusters?) and
> probably an assertion that wanted_type is ZERO or UNALLOCATED.

Good idea.


>> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>              ret = -EIO;
>>              goto fail;
>>          }
>> -        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
>> -                                              QCOW2_CLUSTER_ZERO);
>> -        *cluster_offset = 0;
>> +        /* Distinguish between pure zero clusters and pre-allocated ones */
>> +        if (*cluster_offset & L2E_OFFSET_MASK) {
>> +            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>> +                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
> 
> You should probably switch this patch and the next one -- or I just send
> my patches myself and you base your series on top of it...

I tested with your patches in, then rebased with your patches out to see
what broke, and...

> 
> Because before the next patch, this happens:
> 

> $ ./qemu-img map foo.qcow2
> Offset          Length          Mapped to       File
> qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
> Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
> failed.

got that same assert. So I realized that your patch was necessary and
rebased it back in, but obviously forgot to re-test at all points in the
series. Yes, your should go first, and I'd be more than happy if you
post yours formally and I rebase my qcow2 portions on top of yours (all
the more reason for me to split this series into two, and get the
blkdebug portions in now while we still hammer on the qcow2 stuff).
diff mbox

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 100398c..d1063df 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -328,6 +328,10 @@  static int count_contiguous_clusters(int nb_clusters, int cluster_size,
 	return i;
 }

+/*
+ * Checks how many consecutive clusters in a given L2 table have the same
+ * cluster type with no corresponding allocation.
+ */
 static int count_contiguous_clusters_by_type(int nb_clusters,
                                              uint64_t *l2_table,
                                              int wanted_type)
@@ -335,9 +339,10 @@  static int count_contiguous_clusters_by_type(int nb_clusters,
     int i;

     for (i = 0; i < nb_clusters; i++) {
-        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
+        uint64_t entry = be64_to_cpu(l2_table[i]);
+        int type = qcow2_get_cluster_type(entry);

-        if (type != wanted_type) {
+        if (type != wanted_type || entry & L2E_OFFSET_MASK) {
             break;
         }
     }
@@ -559,9 +564,26 @@  int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
             ret = -EIO;
             goto fail;
         }
-        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
-                                              QCOW2_CLUSTER_ZERO);
-        *cluster_offset = 0;
+        /* Distinguish between pure zero clusters and pre-allocated ones */
+        if (*cluster_offset & L2E_OFFSET_MASK) {
+            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
+                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
+            *cluster_offset &= L2E_OFFSET_MASK;
+            if (offset_into_cluster(s, *cluster_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1,
+                                        "Preallocated zero cluster offset %#"
+                                        PRIx64 " unaligned (L2 offset: %#"
+                                        PRIx64 ", L2 index: %#x)",
+                                        *cluster_offset, l2_offset, l2_index);
+                ret = -EIO;
+                goto fail;
+            }
+        } else {
+            c = count_contiguous_clusters_by_type(nb_clusters,
+                                                  &l2_table[l2_index],
+                                                  QCOW2_CLUSTER_ZERO);
+            *cluster_offset = 0;
+        }
         break;
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */