[v2,for-2.10,13/16] block/qcow2: qcow2_calc_size_usage() for truncate

Message ID 20170403160936.28293-14-mreitz@redhat.com
State New

Commit Message

Max Reitz April 3, 2017, 4:09 p.m. UTC
This patch extends qcow2_calc_size_usage() so it can calculate the
additional space needed for preallocating image growth.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 137 +++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 98 insertions(+), 39 deletions(-)

Comments

Stefan Hajnoczi April 6, 2017, 1:04 p.m. UTC | #1
On Mon, Apr 03, 2017 at 06:09:33PM +0200, Max Reitz wrote:
> +        BDRVQcow2State *s = bs->opaque;
> +        uint64_t aligned_cur_size = align_offset(current_size, cluster_size);
> +        uint64_t creftable_length;
> +        uint64_t i;
> +
> +        /* new total size of L2 tables */
> +        nl2e = aligned_total_size / cluster_size;
> +        nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
> +        meta_size += nl2e * sizeof(uint64_t);
> +
> +        /* Subtract L2 tables which are already present */
> +        for (i = 0; i < s->l1_size; i++) {
> +            if (s->l1_table[i] & L1E_OFFSET_MASK) {
> +                meta_size -= cluster_size;
> +            }
> +        }

Allocated L1 table entries are 'A', unallocated L1 table entries in
the existing image are '-', and new L1 table entries after growth are
'+':

  |-A-AAA--+++++|

This for loop includes '-' entries.  Should we count them or just the
'+' entries?

>  
> -    /* total size of refcount tables */
> -    nreftablee = nrefblocke / refblock_size;
> -    nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
> -    meta_size += nreftablee * sizeof(uint64_t);
> +        /* Do not add L1 table size because the only caller of this path
> +         * (qcow2_truncate) has increased its size already. */
>  
> -    return aligned_total_size + meta_size;
> +        /* Calculate size of the additional refblocks (this assumes that all of
> +         * the existing image is covered by refblocks, which is extremely
> +         * likely); this may result in overallocation because parts of the newly
> +         * added space may be covered by existing refblocks, but that is fine.
> +         *
> +         * This only considers the newly added space. Since we cannot update the
> +         * reftable in-place, we will have to be able to store both the old and the
> +         * new one at the same time, though. Therefore, we need to add the size
> +         * of the old reftable here.
> +         */
> +        creftable_length = ROUND_UP(s->refcount_table_size * sizeof(uint64_t),
> +                                    cluster_size);
> +        nrefblocke = ((aligned_total_size - aligned_cur_size) + meta_size +
> +                      creftable_length + cluster_size)
> +                   / (cluster_size - rces -
> +                      rces * sizeof(uint64_t) / cluster_size);
> +        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
> +
> +        /* total size of the new refcount table (again, may be too much because
> +         * it assumes that the new area is not covered by any refcount blocks
> +         * yet) */
> +        nreftablee = s->max_refcount_table_index + 1 +
> +                     nrefblocke / refblock_size;
> +        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
> +        meta_size += nreftablee * sizeof(uint64_t);
> +
> +        return (aligned_total_size - aligned_cur_size) + meta_size;

This calculation is an approximation.  Here is a simpler approximation:

  current_usage = qcow2_calc_size_usage(current_size, ...);
  new_usage = qcow2_calc_size_usage(new_size, ...);
  delta = new_usage - current_usage;

(Perhaps the new refcount_table clusters need to be added to new_size
too.)

Is there an advantage of the more elaborate calculation implemented in
this patch?

Stefan
Max Reitz April 7, 2017, 3:42 p.m. UTC | #2
On 06.04.2017 15:04, Stefan Hajnoczi wrote:
> On Mon, Apr 03, 2017 at 06:09:33PM +0200, Max Reitz wrote:
>> +        BDRVQcow2State *s = bs->opaque;
>> +        uint64_t aligned_cur_size = align_offset(current_size, cluster_size);
>> +        uint64_t creftable_length;
>> +        uint64_t i;
>> +
>> +        /* new total size of L2 tables */
>> +        nl2e = aligned_total_size / cluster_size;
>> +        nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
>> +        meta_size += nl2e * sizeof(uint64_t);
>> +
>> +        /* Subtract L2 tables which are already present */
>> +        for (i = 0; i < s->l1_size; i++) {
>> +            if (s->l1_table[i] & L1E_OFFSET_MASK) {
>> +                meta_size -= cluster_size;
>> +            }
>> +        }
> 
> Allocated L1 table entries are 'A', unallocated L1 table entries in
> the existing image are '-', and new L1 table entries after growth are
> '+':
> 
>   |-A-AAA--+++++|
> 
> This for loop includes '-' entries.  Should we count them or just the
> '+' entries?

Hm, good question, thanks. I suppose we should count them if we wanted
to fully allocate the image now. But that's not the intention, we just
want to fully allocate the new area.

While you can make the calculation faster if you take that into account,
it also gets a bit more complicated: We can subtract all L2 tables that
do not point to the new area. But there may be L2 tables already which
do point there which we therefore do not have to allocate. So we'd still
need to do this loop, but the starting index would be the first L2 table
index to point into the area.

So the above loop may calculate more required space than is actually
the case; I could argue that's fine because this function may
overcommit (especially if the image wasn't fully allocated before). But
modifying it to ignore all L2 tables before the new area doesn't seem to
be too complicated, so I'll do it.

>> -    /* total size of refcount tables */
>> -    nreftablee = nrefblocke / refblock_size;
>> -    nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
>> -    meta_size += nreftablee * sizeof(uint64_t);
>> +        /* Do not add L1 table size because the only caller of this path
>> +         * (qcow2_truncate) has increased its size already. */
>>  
>> -    return aligned_total_size + meta_size;
>> +        /* Calculate size of the additional refblocks (this assumes that all of
>> +         * the existing image is covered by refblocks, which is extremely
>> +         * likely); this may result in overallocation because parts of the newly
>> +         * added space may be covered by existing refblocks, but that is fine.
>> +         *
>> +         * This only considers the newly added space. Since we cannot update the
>> +         * reftable in-place, we will have to be able to store both the old and the
>> +         * new one at the same time, though. Therefore, we need to add the size
>> +         * of the old reftable here.
>> +         */
>> +        creftable_length = ROUND_UP(s->refcount_table_size * sizeof(uint64_t),
>> +                                    cluster_size);
>> +        nrefblocke = ((aligned_total_size - aligned_cur_size) + meta_size +
>> +                      creftable_length + cluster_size)
>> +                   / (cluster_size - rces -
>> +                      rces * sizeof(uint64_t) / cluster_size);
>> +        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
>> +
>> +        /* total size of the new refcount table (again, may be too much because
>> +         * it assumes that the new area is not covered by any refcount blocks
>> +         * yet) */
>> +        nreftablee = s->max_refcount_table_index + 1 +
>> +                     nrefblocke / refblock_size;
>> +        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
>> +        meta_size += nreftablee * sizeof(uint64_t);
>> +
>> +        return (aligned_total_size - aligned_cur_size) + meta_size;
> 
> This calculation is an approximation.  Here is a simpler approximation:
> 
>   current_usage = qcow2_calc_size_usage(current_size, ...);
>   new_usage = qcow2_calc_size_usage(new_size, ...);
>   delta = new_usage - current_usage;
> 
> (Perhaps the new refcount_table clusters need to be added to new_size
> too.)
> 
> Is there an advantage of the more elaborate calculation implemented in
> this patch?

Now that you mention it... Well, the important thing is it's a
conservative approximation. It's supposed to never underestimate the
correct result.

Now the existing image doesn't need to be fully allocated. However, your
snippet assumes that it is. Often, this actually wouldn't be an issue.
But it is for clusters that are "shared" between the existing image and
the new area:

1. The old final L2 table may point into the new area. Your code assumes
it is already present but it may not be.

2. current_size need not be aligned to clusters. If it isn't, the new
area will reuse a data cluster from the existing image that now needs to
be allocated.

3. Theoretically: The L1 table may not actually be allocated. We have
to make sure it is.

In practice: We call qcow2_grow_l1_table() before doing any
preallocation, and that always fully allocates the (new) L1 table. So
we're good.

4. Of course, just as always, it gets *very* funny with refcount
information. Maybe the existing image is sparsely allocated in a way
that its allocated cluster count is aligned to refblocks so that it
completely fills up all the refblocks it has (and those completely fill
up, say, one reftable cluster). Now the calculation above might assume
that we have more allocated clusters and therefore enough free entries
in the last refblock to put everything there. But when actually doing
the allocation... Surprise, you need to allocate one new refblock and a
whole new reftable because that new refblock doesn't fit into the old one.

So if I implemented your snippet and wanted to keep conservative, I'd
have to add the following cluster counts for each of these:

1. 1
2. 1
3. --
4. 1 (refblock) + number of existing reftable clusters + 1 (resized
reftable)

So this is not really good. We could add checks so we keep the count lower:

1. Check whether current_size is aligned to L2 boundaries. If it isn't,
check whether the final L2 table is allocated. If it isn't, add 1.
2. Check whether current_size is aligned to clusters. If it isn't, check
whether the final cluster is allocated. If it isn't, add 1.
4. Errr... This seems hard (like all refcount stuff). Maybe check
whether the super-conservative estimate above would fit into the last
refblock; if it does, add 0; otherwise, add
$number_of_refblocks_required plus the number of reftable clusters
required for that, however we can calculate that[1].

[1]: This brings me to another issue. When we have to resize the
reftable, we create a copy, so we have to be able to store both the old
and the new reftable at the same time. Your above snippet doesn't take
this into account, so we'll have to add the number of existing reftable
clusters to it; unless we don't have to resize the reftable...


So all in all we could either use your snippet in a super-conservative
approach, or we get probably the same complexity as I already have if we
add all of the checks proposed above.

The issue with the "super-conservative approach" is that I'm not even
sure I found all possible corner cases.

Max
Stefan Hajnoczi April 10, 2017, 9:59 a.m. UTC | #3
On Fri, Apr 07, 2017 at 05:42:16PM +0200, Max Reitz wrote:
> On 06.04.2017 15:04, Stefan Hajnoczi wrote:
> > On Mon, Apr 03, 2017 at 06:09:33PM +0200, Max Reitz wrote:
> >> -    /* total size of refcount tables */
> >> -    nreftablee = nrefblocke / refblock_size;
> >> -    nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
> >> -    meta_size += nreftablee * sizeof(uint64_t);
> >> +        /* Do not add L1 table size because the only caller of this path
> >> +         * (qcow2_truncate) has increased its size already. */
> >>  
> >> -    return aligned_total_size + meta_size;
> >> +        /* Calculate size of the additional refblocks (this assumes that all of
> >> +         * the existing image is covered by refblocks, which is extremely
> >> +         * likely); this may result in overallocation because parts of the newly
> >> +         * added space may be covered by existing refblocks, but that is fine.
> >> +         *
> >> +         * This only considers the newly added space. Since we cannot update the
> >> +         * reftable in-place, we will have to be able to store both the old and the
> >> +         * new one at the same time, though. Therefore, we need to add the size
> >> +         * of the old reftable here.
> >> +         */
> >> +        creftable_length = ROUND_UP(s->refcount_table_size * sizeof(uint64_t),
> >> +                                    cluster_size);
> >> +        nrefblocke = ((aligned_total_size - aligned_cur_size) + meta_size +
> >> +                      creftable_length + cluster_size)
> >> +                   / (cluster_size - rces -
> >> +                      rces * sizeof(uint64_t) / cluster_size);
> >> +        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
> >> +
> >> +        /* total size of the new refcount table (again, may be too much because
> >> +         * it assumes that the new area is not covered by any refcount blocks
> >> +         * yet) */
> >> +        nreftablee = s->max_refcount_table_index + 1 +
> >> +                     nrefblocke / refblock_size;
> >> +        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
> >> +        meta_size += nreftablee * sizeof(uint64_t);
> >> +
> >> +        return (aligned_total_size - aligned_cur_size) + meta_size;
> > 
> > This calculation is an approximation.  Here is a simpler approximation:
> > 
> >   current_usage = qcow2_calc_size_usage(current_size, ...);
> >   new_usage = qcow2_calc_size_usage(new_size, ...);
> >   delta = new_usage - current_usage;
> > 
> > (Perhaps the new refcount_table clusters need to be added to new_size
> > too.)
> > 
> > Is there an advantage of the more elaborate calculation implemented in
> > this patch?
> 
> Now that you mention it... Well, the important thing is it's a
> conservative approximation. It's supposed to never underestimate the
> correct result.
> 
> Now the existing image doesn't need to be fully allocated. However, your
> snippet assumes that it is. Often, this actually wouldn't be an issue.
> But it is for clusters that are "shared" between the existing image and
> the new area:
> 
> 1. The old final L2 table may point into the new area. Your code assumes
> it is already present but it may not be.
> 
> 2. current_size need not be aligned to clusters. If it isn't, the new
> area will reuse a data cluster from the existing image that now needs to
> be allocated.
> 
> 3. Theoretically: The L1 table may not actually be allocated. We have
> to make sure it is.
> 
> In practice: We call qcow2_grow_l1_table() before doing any
> preallocation, and that always fully allocates the (new) L1 table. So
> we're good.
> 
> 4. Of course, just as always, it gets *very* funny with refcount
> information. Maybe the existing image is sparsely allocated in a way
> that its allocated cluster count is aligned to refblocks so that it
> completely fills up all the refblocks it has (and those completely fill
> up, say, one reftable cluster). Now the calculation above might assume
> that we have more allocated clusters and therefore enough free entries
> in the last refblock to put everything there. But when actually doing
> the allocation... Surprise, you need to allocate one new refblock and a
> whole new reftable because that new refblock doesn't fit into the old one.
> 
> So if I implemented your snippet and wanted to keep conservative, I'd
> have to add the following cluster counts for each of these:
> 
> 1. 1
> 2. 1
> 3. --
> 4. 1 (refblock) + number of existing reftable clusters + 1 (resized
> reftable)
> 
> So this is not really good. We could add checks so we keep the count lower:
> 
> 1. Check whether current_size is aligned to L2 boundaries. If it isn't,
> check whether the final L2 table is allocated. If it isn't, add 1.
> 2. Check whether current_size is aligned to clusters. If it isn't, check
> whether the final cluster is allocated. If it isn't, add 1.
> 4. Errr... This seems hard (like all refcount stuff). Maybe check
> whether the super-conservative estimate above would fit into the last
> refblock; if it does, add 0; otherwise, add
> $number_of_refblocks_required plus the number of reftable clusters
> required for that, however we can calculate that[1].
> 
> [1]: This brings me to another issue. When we have to resize the
> reftable, we create a copy, so we have to be able to store both the old
> and the new reftable at the same time. Your above snippet doesn't take
> this into account, so we'll have to add the number of existing reftable
> clusters to it; unless we don't have to resize the reftable...
> 
> 
> So all in all we could either use your snippet in a super-conservative
> approach, or we get probably the same complexity as I already have if we
> add all of the checks proposed above.
> 
> The issue with the "super-conservative approach" is that I'm not even
> sure I found all possible corner cases.

Good points.  Calculating a conservative number really does require
detailed knowledge of the metadata state.

Stefan

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index aafbc8dbed..12dafcc570 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2108,7 +2108,15 @@  done:
     return ret;
 }
 
-static uint64_t qcow2_calc_size_usage(uint64_t new_size,
+/**
+ * Returns the number of bytes that must be allocated in the underlying file
+ * to accommodate an image growth from @current_size to @new_size.
+ *
+ * @current_size must be 0 when creating a new image. In that case, @bs is
+ * ignored; otherwise it must be valid.
+ */
+static uint64_t qcow2_calc_size_usage(BlockDriverState *bs,
+                                      uint64_t current_size, uint64_t new_size,
                                       int cluster_bits, int refcount_order)
 {
     size_t cluster_size = 1u << cluster_bits;
@@ -2129,47 +2137,97 @@  static uint64_t qcow2_calc_size_usage(uint64_t new_size,
     refblock_bits = cluster_bits - (refcount_order - 3);
     refblock_size = 1 << refblock_bits;
 
-    /* header: 1 cluster */
-    meta_size += cluster_size;
-
-    /* total size of L2 tables */
-    nl2e = aligned_total_size / cluster_size;
-    nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
-    meta_size += nl2e * sizeof(uint64_t);
+    if (!current_size) {
+        /* header: 1 cluster */
+        meta_size += cluster_size;
+
+        /* total size of L2 tables */
+        nl2e = aligned_total_size / cluster_size;
+        nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
+        meta_size += nl2e * sizeof(uint64_t);
+
+        /* total size of L1 tables */
+        nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+        nl1e = align_offset(nl1e, cluster_size / sizeof(uint64_t));
+        meta_size += nl1e * sizeof(uint64_t);
+
+        /* total size of refcount blocks
+         *
+         * note: every host cluster is reference-counted, including metadata
+         * (even refcount blocks are recursively included).
+         * Let:
+         *   a = total_size (this is the guest disk size)
+         *   m = meta size not including refcount blocks and refcount tables
+         *   c = cluster size
+         *   y1 = number of refcount blocks entries
+         *   y2 = meta size including everything
+         *   rces = refcount entry size in bytes
+         * then,
+         *   y1 = (y2 + a)/c
+         *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
+         * we can get y1:
+         *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
+         */
+        nrefblocke = (aligned_total_size + meta_size + cluster_size)
+                   / (cluster_size - rces - rces * sizeof(uint64_t)
+                                                 / cluster_size);
+        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
-    /* total size of L1 tables */
-    nl1e = nl2e * sizeof(uint64_t) / cluster_size;
-    nl1e = align_offset(nl1e, cluster_size / sizeof(uint64_t));
-    meta_size += nl1e * sizeof(uint64_t);
+        /* total size of refcount tables */
+        nreftablee = nrefblocke / refblock_size;
+        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
+        meta_size += nreftablee * sizeof(uint64_t);
 
-    /* total size of refcount blocks
-     *
-     * note: every host cluster is reference-counted, including metadata
-     * (even refcount blocks are recursively included).
-     * Let:
-     *   a = total_size (this is the guest disk size)
-     *   m = meta size not including refcount blocks and refcount tables
-     *   c = cluster size
-     *   y1 = number of refcount blocks entries
-     *   y2 = meta size including everything
-     *   rces = refcount entry size in bytes
-     * then,
-     *   y1 = (y2 + a)/c
-     *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
-     * we can get y1:
-     *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
-     */
-    nrefblocke = (aligned_total_size + meta_size + cluster_size)
-               / (cluster_size - rces - rces * sizeof(uint64_t)
-                                             / cluster_size);
-    meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
+        return aligned_total_size + meta_size;
+    } else {
+        BDRVQcow2State *s = bs->opaque;
+        uint64_t aligned_cur_size = align_offset(current_size, cluster_size);
+        uint64_t creftable_length;
+        uint64_t i;
+
+        /* new total size of L2 tables */
+        nl2e = aligned_total_size / cluster_size;
+        nl2e = align_offset(nl2e, cluster_size / sizeof(uint64_t));
+        meta_size += nl2e * sizeof(uint64_t);
+
+        /* Subtract L2 tables which are already present */
+        for (i = 0; i < s->l1_size; i++) {
+            if (s->l1_table[i] & L1E_OFFSET_MASK) {
+                meta_size -= cluster_size;
+            }
+        }
 
-    /* total size of refcount tables */
-    nreftablee = nrefblocke / refblock_size;
-    nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
-    meta_size += nreftablee * sizeof(uint64_t);
+        /* Do not add L1 table size because the only caller of this path
+         * (qcow2_truncate) has increased its size already. */
 
-    return aligned_total_size + meta_size;
+        /* Calculate size of the additional refblocks (this assumes that all of
+         * the existing image is covered by refblocks, which is extremely
+         * likely); this may result in overallocation because parts of the newly
+         * added space may be covered by existing refblocks, but that is fine.
+         *
+         * This only considers the newly added space. Since we cannot update the
+         * reftable in-place, we will have to be able to store both the old and the
+         * new one at the same time, though. Therefore, we need to add the size
+         * of the old reftable here.
+         */
+        creftable_length = ROUND_UP(s->refcount_table_size * sizeof(uint64_t),
+                                    cluster_size);
+        nrefblocke = ((aligned_total_size - aligned_cur_size) + meta_size +
+                      creftable_length + cluster_size)
+                   / (cluster_size - rces -
+                      rces * sizeof(uint64_t) / cluster_size);
+        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
+
+        /* total size of the new refcount table (again, may be too much because
+         * it assumes that the new area is not covered by any refcount blocks
+         * yet) */
+        nreftablee = s->max_refcount_table_index + 1 +
+                     nrefblocke / refblock_size;
+        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
+        meta_size += nreftablee * sizeof(uint64_t);
+
+        return (aligned_total_size - aligned_cur_size) + meta_size;
+    }
 }
 
 static int qcow2_create2(const char *filename, int64_t total_size,
@@ -2210,7 +2268,8 @@  static int qcow2_create2(const char *filename, int64_t total_size,
     int ret;
 
     if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-        uint64_t file_size = qcow2_calc_size_usage(total_size, cluster_bits,
+        uint64_t file_size = qcow2_calc_size_usage(NULL, 0, total_size,
+                                                   cluster_bits,
                                                    refcount_order);
 
         qemu_opt_set_number(opts, BLOCK_OPT_SIZE, file_size, &error_abort);