diff mbox

[v4,2/7] qcow2: Discard/zero clusters by byte count

Message ID 20161220191523.5779-3-eblake@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Blake Dec. 20, 2016, 7:15 p.m. UTC
Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the interfaces to take both offset and count as bytes, while
still keeping the assertion added previously that the caller
must align the values to a cluster.  Then rename things to
make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard and qcow2_cluster_zeroize().

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h          |  9 +++++----
 block/qcow2-cluster.c  | 17 ++++++++---------
 block/qcow2-snapshot.c |  7 +++----
 block/qcow2.c          | 21 +++++++++------------
 4 files changed, 25 insertions(+), 29 deletions(-)

Comments

Max Reitz Jan. 28, 2017, 8:21 p.m. UTC | #1
On 20.12.2016 20:15, Eric Blake wrote:
> Passing a byte offset, but sector count, when we ultimately
> want to operate on cluster granularity, is madness.  Clean up
> the interfaces to take both offset and count as bytes, while
> still keeping the assertion added previously that the caller
> must align the values to a cluster.  Then rename things to
> make sure backports don't get confused by changed units:
> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
> we now have qcow2_cluster_discard and qcow2_cluster_zeroize().
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v4: improve function names, split assertion additions into earlier patch
> [no v3 or v2]
> v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
> ---
>  block/qcow2.h          |  9 +++++----
>  block/qcow2-cluster.c  | 17 ++++++++---------
>  block/qcow2-snapshot.c |  7 +++----
>  block/qcow2.c          | 21 +++++++++------------
>  4 files changed, 25 insertions(+), 29 deletions(-)

Nothing functionally bad, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>


Some notes, though:

> diff --git a/block/qcow2.h b/block/qcow2.h
> index 1823414..a131b72 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -543,10 +543,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>                                           int compressed_size);
> 
>  int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
> -int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
> -                        int flags);
> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t count, enum qcow2_discard_type type,
> +                          bool full_discard);
> +int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t count, int flags);

I know byte count parameters are often called "count", but I find it a
bit problematic if the function name contains a word that can be a unit.
In these cases, it's not entirely clear whether "count" refers to bytes
or clusters. Maybe "length" or "size" would be better?

Purely subjective and thus optional, of course.

Another thing: I'm not quite positive whether qcow2_cluster_zeroize()
can actually handle a byte count greater than INT_MAX. If you could pass
such a number to it, the multiplication "ret * s->cluster_size" might
overflow. It's only caller qcow2_co_pwrite_zeroes() will never exceed
that limit, but maybe this should be made clear somewhere -- either by
making the parameter an int instead of a uint64_t or by asserting it in
the function.

> 
>  int qcow2_expand_zero_clusters(BlockDriverState *bs,
>                                 BlockDriverAmendStatusCB *status_cb,
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 3304a15..aad5183 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1511,16 +1511,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>      return nb_clusters;
>  }
> 
> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t count, enum qcow2_discard_type type,
> +                          bool full_discard)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    uint64_t end_offset;
> +    uint64_t end_offset = offset + count;
>      uint64_t nb_clusters;
>      int ret;
> 
> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> -
>      /* Caller must pass aligned values */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));

Directly below this we have "offset - end_offset" which could be
shortened to "count" (or "length" or "size" or...).

Max
Eric Blake Jan. 30, 2017, 4:52 p.m. UTC | #2
On 01/28/2017 02:21 PM, Max Reitz wrote:
> On 20.12.2016 20:15, Eric Blake wrote:
>> Passing a byte offset, but sector count, when we ultimately
>> want to operate on cluster granularity, is madness.  Clean up
>> the interfaces to take both offset and count as bytes, while
>> still keeping the assertion added previously that the caller
>> must align the values to a cluster.  Then rename things to
>> make sure backports don't get confused by changed units:
>> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
>> we now have qcow2_cluster_discard and qcow2_cluster_zeroize().
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> +int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>> +                          uint64_t count, int flags);
> 
> I know byte count parameters are often called "count", but I find it a
> bit problematic if the function name contains a word that can be a unit.
> In these cases, it's not entirely clear whether "count" refers to bytes
> or clusters. Maybe "length" or "size" would be better?

Now that you mention it, 'cluster' and 'count' is indeed confusing.  I'm
leaning towards 'size'.  Do I need to respin, or is that something that
the maintainer is comfortable tweaking?

> 
> Purely subjective and thus optional, of course.
> 
> Another thing: I'm not quite positive whether qcow2_cluster_zeroize()
> can actually handle a byte count greater than INT_MAX. If you could pass
> such a number to it, the multiplication "ret * s->cluster_size" might
> overflow. It's only caller qcow2_co_pwrite_zeroes() will never exceed
> that limit, but maybe this should be made clear somewhere -- either by
> making the parameter an int instead of a uint64_t or by asserting it in
> the function.

I'd lean towards an assertion, especially since it would be nice to
someday unify all the internal block interfaces to get to a 64-bit
interface wherever possible (limiting ourselves to 32-bit is necessary
for some drivers, like NBD, but that limitation should be enforced via
proper BlockLimits, rather than by inherent limits in our API).  An
assertion with 64-bit types is better than yet one more place to audit
for missing assertions if we use a 32-bit type now and then widen to
64-bit later.


>> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
>> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
>> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
>> +                          uint64_t count, enum qcow2_discard_type type,
>> +                          bool full_discard)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>> -    uint64_t end_offset;
>> +    uint64_t end_offset = offset + count;
>>      uint64_t nb_clusters;
>>      int ret;
>>
>> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
>> -
>>      /* Caller must pass aligned values */
>>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));
> 
> Directly below this we have "offset - end_offset" which could be
> shortened to "count" (or "length" or "size" or...).

Okay, there's now enough comments that it's worth me respinning a v5.
Max Reitz Feb. 1, 2017, 1:54 a.m. UTC | #3
On 30.01.2017 17:52, Eric Blake wrote:
> On 01/28/2017 02:21 PM, Max Reitz wrote:
>> On 20.12.2016 20:15, Eric Blake wrote:
>>> Passing a byte offset, but sector count, when we ultimately
>>> want to operate on cluster granularity, is madness.  Clean up
>>> the interfaces to take both offset and count as bytes, while
>>> still keeping the assertion added previously that the caller
>>> must align the values to a cluster.  Then rename things to
>>> make sure backports don't get confused by changed units:
>>> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
>>> we now have qcow2_cluster_discard and qcow2_cluster_zeroize().
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
> 
>>> +int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
>>> +                          uint64_t count, int flags);
>>
>> I know byte count parameters are often called "count", but I find it a
>> bit problematic if the function name contains a word that can be a unit.
>> In these cases, it's not entirely clear whether "count" refers to bytes
>> or clusters. Maybe "length" or "size" would be better?
> 
> Now that you mention it, 'cluster' and 'count' is indeed confusing.  I'm
> leaning towards 'size'.  Do I need to respin, or is that something that
> the maintainer is comfortable tweaking?

There's probably not too much that could go wrong, but I wouldn't be
exactly comfortable with it.

>> Purely subjective and thus optional, of course.
>>
>> Another thing: I'm not quite positive whether qcow2_cluster_zeroize()
>> can actually handle a byte count greater than INT_MAX. If you could pass
>> such a number to it, the multiplication "ret * s->cluster_size" might
>> overflow. It's only caller qcow2_co_pwrite_zeroes() will never exceed
>> that limit, but maybe this should be made clear somewhere -- either by
>> making the parameter an int instead of a uint64_t or by asserting it in
>> the function.
> 
> I'd lean towards an assertion, especially since it would be nice to
> someday unify all the internal block interfaces to get to a 64-bit
> interface wherever possible

Kevin once said "We should write code that makes sense today, not code
that will make sense some day in the future." Or something like that. :-)

>                             (limiting ourselves to 32-bit is necessary
> for some drivers, like NBD, but that limitation should be enforced via
> proper BlockLimits, rather than by inherent limits in our API).  An
> assertion with 64-bit types is better than yet one more place to audit
> for missing assertions if we use a 32-bit type now and then widen to
> 64-bit later.

Depends on the viewpoint. You're right, it does prevent us from
accidentally implicitly narrowing a 64 bit size, but on the other hand,
it won't be clear just from looking at the function signature that this
function actually only supports a 32 bit parameter.

Not sure what to do here. Maybe switch to a language that would warn us
before implicitly narrowing values. */me ducks*

As long as the assertion is visible enough (i.e. immediately after the
variable declaration block), it should be fine.

Max

>>> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
>>> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
>>> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
>>> +                          uint64_t count, enum qcow2_discard_type type,
>>> +                          bool full_discard)
>>>  {
>>>      BDRVQcow2State *s = bs->opaque;
>>> -    uint64_t end_offset;
>>> +    uint64_t end_offset = offset + count;
>>>      uint64_t nb_clusters;
>>>      int ret;
>>>
>>> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
>>> -
>>>      /* Caller must pass aligned values */
>>>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>>>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));
>>
>> Directly below this we have "offset - end_offset" which could be
>> shortened to "count" (or "length" or "size" or...).
> 
> Okay, there's now enough comments that it's worth me respinning a v5.
Eric Blake Feb. 9, 2017, 9:05 p.m. UTC | #4
On 01/31/2017 07:54 PM, Max Reitz wrote:

>>> Another thing: I'm not quite positive whether qcow2_cluster_zeroize()
>>> can actually handle a byte count greater than INT_MAX. If you could pass
>>> such a number to it, the multiplication "ret * s->cluster_size" might
>>> overflow. It's only caller qcow2_co_pwrite_zeroes() will never exceed
>>> that limit, but maybe this should be made clear somewhere -- either by
>>> making the parameter an int instead of a uint64_t or by asserting it in
>>> the function.
>>
>> I'd lean towards an assertion, especially since it would be nice to
>> someday unify all the internal block interfaces to get to a 64-bit
>> interface wherever possible
> 
> Kevin once said "We should write code that makes sense today, not code
> that will make sense some day in the future." Or something like that. :-)

Hmm. I looked a bit further.  The calculation of "ret * s->cluster_size"
is bounded by the maximum value of the earlier "ret = zero_single_l2()",
which is limited to the number of clusters in a single L2 table.  But
when you have an image with 2M clusters, a single L2 page thus covers
2M/8 (or 262144) clusters, but that is in turn much larger than INT_MAX.
 My v5 will fix things to track the int return value separately from the
64-bit crawl through zero_single_l2() (in other words, quit overloading
'ret' to serve two purposes).
diff mbox

Patch

diff --git a/block/qcow2.h b/block/qcow2.h
index 1823414..a131b72 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -543,10 +543,11 @@  uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t count, enum qcow2_discard_type type,
+                          bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t count, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3304a15..aad5183 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1511,16 +1511,15 @@  static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t count, enum qcow2_discard_type type,
+                          bool full_discard)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + count;
     uint64_t nb_clusters;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size));
@@ -1591,8 +1590,8 @@  static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t count, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t nb_clusters;
@@ -1600,7 +1599,7 @@  int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,

     /* Caller must pass aligned values */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-    assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_size >> BDRV_SECTOR_BITS));
+    assert(QEMU_IS_ALIGNED(count, s->cluster_size));

     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
@@ -1608,7 +1607,7 @@  int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     }

     /* Each L2 table is handled by its own loop iteration */
-    nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+    nb_clusters = size_to_clusters(s, count);

     s->cache_discards = true;

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@  int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)

     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
-                           align_offset(sn->vm_state_size, s->cluster_size)
-                                >> BDRV_SECTOR_BITS,
-                           QCOW2_DISCARD_NEVER, false);
+    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+                          align_offset(sn->vm_state_size, s->cluster_size),
+                          QCOW2_DISCARD_NEVER, false);

 #ifdef DEBUG_ALLOC
     {
diff --git a/block/qcow2.c b/block/qcow2.c
index 96fb8a8..324e474 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2487,7 +2487,7 @@  static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+    ret = qcow2_cluster_zeroize(bs, offset, count, flags);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -2505,8 +2505,8 @@  static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     }

     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
-                                 QCOW2_DISCARD_REQUEST, false);
+    ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+                                false);
     qemu_co_mutex_unlock(&s->lock);
     return ret;
 }
@@ -2807,9 +2807,8 @@  fail:
 static int qcow2_make_empty(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t start_sector;
-    int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
-                       BDRV_SECTOR_SIZE);
+    uint64_t offset, end_offset;
+    int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
     int l1_clusters, ret = 0;

     l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2826,18 +2825,16 @@  static int qcow2_make_empty(BlockDriverState *bs)

     /* This fallback code simply discards every active cluster; this is slow,
      * but works in all cases */
-    for (start_sector = 0; start_sector < bs->total_sectors;
-         start_sector += sector_step)
+    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+    for (offset = 0; offset < end_offset; offset += step)
     {
         /* As this function is generally used after committing an external
          * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
          * default action for this kind of discard is to pass the discard,
          * which will ideally result in an actually smaller image file, as
          * is probably desired. */
-        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
-                                     MIN(sector_step,
-                                         bs->total_sectors - start_sector),
-                                     QCOW2_DISCARD_SNAPSHOT, true);
+        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+                                    QCOW2_DISCARD_SNAPSHOT, true);
         if (ret < 0) {
             break;
         }