[for-4.2,09/13] qcow2: Fix overly long snapshot tables

Message ID 20190730172508.19911-10-mreitz@redhat.com (mailing list archive)
State New, archived
Series qcow2: Let check -r all repair some snapshot bits

Commit Message

Max Reitz July 30, 2019, 5:25 p.m. UTC
We currently refuse to open qcow2 images with overly long snapshot
tables.  This patch makes qemu-img check -r all drop all offending
entries past what we deem acceptable.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 10 deletions(-)
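
For illustration, the repair path added here is reached through the
ordinary check invocation; a minimal example (the image name is
hypothetical):

    $ qemu-img check -r all overfull.qcow2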

Comments

Eric Blake July 30, 2019, 7:08 p.m. UTC | #1
On 7/30/19 12:25 PM, Max Reitz wrote:
> We currently refuse to open qcow2 images with overly long snapshot
> tables.  This patch makes qemu-img check -r all drop all offending
> entries past what we deem acceptable.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 79 insertions(+), 10 deletions(-)

I'm less sure about this one.  8/13 should have no semantic effect (if
the user _depended_ on that much extra data, they should have set an
incompatible feature flag bit, at which point we'd leave their data
alone because we don't recognize the feature bit; so it is safe to
assume the user did not depend on the data and that we can thus nuke it
with impunity).  But here, we are throwing away the user's internal
snapshots, and not even giving them a say in which ones to throw away
(more likely, by trimming from the end, we are destroying the most
recent snapshots in favor of the older ones - but I could argue that
throwing away the oldest also has its uses).


> @@ -417,7 +461,32 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>  
>          return ret;
>      }
> -    result->corruptions += extra_data_dropped;
> +    result->corruptions += nb_clusters_reduced + extra_data_dropped;
> +
> +    if (nb_clusters_reduced) {
> +        /*
> +         * Update image header now, because:
> +         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
> +         *     the same as what the image header says,
> +         * (2) this leaks clusters, but qcow2_check_refcounts() will
> +         *     fix that.
> +         */
> +        assert(fix & BDRV_FIX_ERRORS);
> +
> +        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
> +        ret = bdrv_pwrite_sync(bs->file, 60,

That '60' needs a name; it keeps popping up.
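
One possible spelling, as a sketch (the offsetof() form below is a
suggestion, not something from this series): QCowHeader is
QEMU_PACKED, so nb_snapshots sits at a fixed byte offset, 60, in the
version-2/3 header, and naming that offset documents the intent:

    /* Sketch: replace the magic 60 with the field's position in the
     * qcow2 image header; offsetof(QCowHeader, nb_snapshots) == 60. */
    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
                           &snapshot_table_pointer.nb_snapshots,
                           sizeof(snapshot_table_pointer.nb_snapshots));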

If we like the patch, I didn't spot major coding problems.  But because
I'm not sure we want this patch, I'll skip R-b for now.
Max Reitz July 31, 2019, 9:22 a.m. UTC | #2
On 30.07.19 21:08, Eric Blake wrote:
> On 7/30/19 12:25 PM, Max Reitz wrote:
>> We currently refuse to open qcow2 images with overly long snapshot
>> tables.  This patch makes qemu-img check -r all drop all offending
>> entries past what we deem acceptable.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 79 insertions(+), 10 deletions(-)
> 
> I'm less sure about this one.  8/13 should have no semantic effect (if
> the user _depended_ on that much extra data, they should have set an
> incompatible feature flag bit, at which point we'd leave their data
> alone because we don't recognize the feature bit; so it is safe to
> assume the user did not depend on the data and that we can thus nuke it
> with impunity).  But here, we are throwing away the user's internal
> snapshots, and not even giving them a say in which ones to throw away
> (more likely, by trimming from the end, we are destroying the most
> recent snapshots in favor of the older ones - but I could argue that
> throwing away the oldest also has its uses).

First, I don’t think there really is a legitimate use case for having an
overly long snapshot table.  In fact, I think our limit is too high as
it is and we just introduced it this way because we didn’t have any
repair functionality, and so just had to pick some limit that nobody
could ever reasonably reach.

(As the test shows, you need more than 500 snapshots with 64 kB names
and ID strings, and 1 kB of extra data to reach this limit.)
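
(Rough arithmetic, assuming the 64 MiB QCOW_MAX_SNAPSHOTS_SIZE limit
and the on-disk layout of one table entry, i.e. a fixed header of
about 40 bytes plus extra data, ID string, and name:

    per entry: ~40 + 1024 + 65535 + 65535 + alignment ≈ 129 kB
    64 MiB / ~129 kB per entry                        ≈ 508 entries

which is where the "more than 500" figure comes from.)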

So the only likely way to reach this number of snapshots is
corruption.  OK, so maybe we don’t need to be able to fix it, then,
because the image is corrupted anyway.

But I think we do want to be able to fix it, because otherwise you
just can’t open the image at all and thus cannot even read the active
layer.


This brings me to my second point: it doesn’t make things worse.
Right now, we just refuse to open such images in all cases.  I’d
personally prefer discarding some data on my image over losing it
all.


And third, I wonder what interface you have in mind.  I think adding an
interface to qemu-img check to properly address this problem (letting
the user discard individual snapshots) is hard.  I could imagine two things:

(A) Making qemu-img snapshot sometimes set BDRV_O_CHECK, too, or
something.  For qemu-img snapshot -d, you don’t need to read the whole
table into memory, and thus we don’t need to impose any limit.  But that
seems pretty hackish to me.

(B) Maybe the proper solution would be to add an interactive interface
to bdrv_check().  I can imagine that in the future, we may get more
cases where we want interaction with the user on what data to delete and
so on.  But that's hard...  (I’ll try.  Good thing stdio is already the
standard interface in bdrv_check(), so I won’t have to feel bad if I go
down that route even further.)
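
A purely hypothetical sketch of the interaction in (B); none of the
names below exist in QEMU, this only illustrates the kind of
stdio-based yes/no question a repair path could ask:

    /* Hypothetical helper, not QEMU API (wants <stdio.h> and
     * <stdbool.h>): prompt on stderr, read the answer from stdin,
     * and default to "no" on EOF or anything but 'y'. */
    static bool check_ask_user(const char *prompt)
    {
        char reply[8];

        fprintf(stderr, "%s [y/N] ", prompt);
        if (!fgets(reply, sizeof(reply), stdin)) {
            return false;
        }
        return reply[0] == 'y' || reply[0] == 'Y';
    }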

Max

>> @@ -417,7 +461,32 @@ int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
>>  
>>          return ret;
>>      }
>> -    result->corruptions += extra_data_dropped;
>> +    result->corruptions += nb_clusters_reduced + extra_data_dropped;
>> +
>> +    if (nb_clusters_reduced) {
>> +        /*
>> +         * Update image header now, because:
>> +         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
>> +         *     the same as what the image header says,
>> +         * (2) this leaks clusters, but qcow2_check_refcounts() will
>> +         *     fix that.
>> +         */
>> +        assert(fix & BDRV_FIX_ERRORS);
>> +
>> +        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
>> +        ret = bdrv_pwrite_sync(bs->file, 60,
> 
> That '60' needs a name; it keeps popping up.
> 
> If we like the patch, I didn't spot major coding problems.  But because
> I'm not sure we want this patch, I'll skip R-b for now.
>
Max Reitz Aug. 16, 2019, 6:06 p.m. UTC | #3
On 31.07.19 11:22, Max Reitz wrote:
> On 30.07.19 21:08, Eric Blake wrote:
>> On 7/30/19 12:25 PM, Max Reitz wrote:
>>> We currently refuse to open qcow2 images with overly long snapshot
>>> tables.  This patch makes qemu-img check -r all drop all offending
>>> entries past what we deem acceptable.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  block/qcow2-snapshot.c | 89 +++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 79 insertions(+), 10 deletions(-)
>>
>> I'm less sure about this one.  8/13 should have no semantic effect (if
>> the user _depended_ on that much extra data, they should have set an
>> incompatible feature flag bit, at which point we'd leave their data
>> alone because we don't recognize the feature bit; so it is safe to
>> assume the user did not depend on the data and that we can thus nuke it
>> with impunity).  But here, we are throwing away the user's internal
>> snapshots, and not even giving them a say in which ones to throw away
>> (more likely, by trimming from the end, we are destroying the most
>> recent snapshots in favor of the older ones - but I could argue that
>> throwing away the oldest also has its uses).
> 
> First, I don’t think there really is a legitimate use case for having an
> overly long snapshot table.  In fact, I think our limit is too high as
> it is and we just introduced it this way because we didn’t have any
> repair functionality, and so just had to pick some limit that nobody
> could ever reasonably reach.
> 
> (As the test shows, you need more than 500 snapshots with 64 kB names
> and ID strings, and 1 kB of extra data to reach this limit.)
> 
> So the only likely way to reach this number of snapshots is
> corruption.  OK, so maybe we don’t need to be able to fix it, then,
> because the image is corrupted anyway.
> 
> But I think we do want to be able to fix it, because otherwise you
> just can’t open the image at all and thus cannot even read the
> active layer.
> 
> 
> This brings me to my second point: it doesn’t make things worse.
> Right now, we just refuse to open such images in all cases.  I’d
> personally prefer discarding some data on my image over losing it
> all.
> 
> 
> And third, I wonder what interface you have in mind.  I think adding an
> interface to qemu-img check to properly address this problem (letting
> the user discard individual snapshots) is hard.  I could imagine two things:
> 
> (A) Making qemu-img snapshot sometimes set BDRV_O_CHECK, too, or
> something.  For qemu-img snapshot -d, you don’t need to read the whole
> table into memory, and thus we don’t need to impose any limit.  But that
> seems pretty hackish to me.
> 
> (B) Maybe the proper solution would be to add an interactive interface
> to bdrv_check().  I can imagine that in the future, we may get more
> cases where we want interaction with the user on what data to delete and
> so on.  But that's hard...  (I’ll try.  Good thing stdio is already the
> standard interface in bdrv_check(), so I won’t have to feel bad if I go
> down that route even further.)

After some fiddling around, I don’t think this is worth it.  As I
said, this is an extremely rare case anyway, so the main goal should
be just to be able to access the active layer and copy at least that
data off the image.

The other side is that this would introduce quite complex code that
basically cannot be tested reasonably.  I’d rather not do that.

Max

Patch

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 9956c32964..bd8e56a99e 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -29,15 +29,24 @@ 
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
 
+static void qcow2_free_single_snapshot(BlockDriverState *bs, int i)
+{
+    BDRVQcow2State *s = bs->opaque;
+
+    assert(i >= 0 && i < s->nb_snapshots);
+    g_free(s->snapshots[i].name);
+    g_free(s->snapshots[i].id_str);
+    g_free(s->snapshots[i].unknown_extra_data);
+    memset(&s->snapshots[i], 0, sizeof(s->snapshots[i]));
+}
+
 void qcow2_free_snapshots(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
     int i;
 
     for(i = 0; i < s->nb_snapshots; i++) {
-        g_free(s->snapshots[i].name);
-        g_free(s->snapshots[i].id_str);
-        g_free(s->snapshots[i].unknown_extra_data);
+        qcow2_free_single_snapshot(bs, i);
     }
     g_free(s->snapshots);
     s->snapshots = NULL;
@@ -48,6 +57,14 @@  void qcow2_free_snapshots(BlockDriverState *bs)
  * If @repair is true, try to repair a broken snapshot table instead
  * of just returning an error:
  *
+ * - If the snapshot table was too long, set *nb_clusters_reduced to
+ *   the number of snapshots removed off the end.
+ *   The caller will update the on-disk nb_snapshots accordingly;
+ *   this leaks clusters, but is safe.
+ *   (The on-disk information must be updated before
+ *   qcow2_check_refcounts(), because that function relies on
+ *   s->nb_snapshots to reflect the on-disk value.)
+ *
  * - If there were snapshots with too much extra metadata, increment
  *   *extra_data_dropped for each.
  *   This requires the caller to eventually rewrite the whole snapshot
@@ -59,6 +76,7 @@  void qcow2_free_snapshots(BlockDriverState *bs)
  *   extra data.)
  */
 static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
+                                   int *nb_clusters_reduced,
                                    int *extra_data_dropped,
                                    Error **errp)
 {
@@ -67,7 +85,7 @@  static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
     QCowSnapshotExtraData extra;
     QCowSnapshot *sn;
     int i, id_str_size, name_size;
-    int64_t offset;
+    int64_t offset, pre_sn_offset;
     int ret;
 
     if (!s->nb_snapshots) {
@@ -82,6 +100,8 @@  static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
     for(i = 0; i < s->nb_snapshots; i++) {
         bool discard_unknown_extra_data = false;
 
+        pre_sn_offset = offset;
+
         /* Read statically sized part of the snapshot header */
         offset = ROUND_UP(offset, 8);
         ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
@@ -182,9 +202,31 @@  static int qcow2_do_read_snapshots(BlockDriverState *bs, bool repair,
         sn->name[name_size] = '\0';
 
         if (offset - s->snapshots_offset > QCOW_MAX_SNAPSHOTS_SIZE) {
-            ret = -EFBIG;
-            error_setg(errp, "Snapshot table is too big");
-            goto fail;
+            if (!repair) {
+                ret = -EFBIG;
+                error_setg(errp, "Snapshot table is too big");
+                error_append_hint(errp, "You can force-remove all %u "
+                                  "overhanging snapshots with qemu-img check "
+                                  "-r all\n", s->nb_snapshots - i);
+                goto fail;
+            }
+
+            fprintf(stderr, "Discarding %u overhanging snapshots (snapshot "
+                    "table is too big)\n", s->nb_snapshots - i);
+
+            *nb_clusters_reduced += (s->nb_snapshots - i);
+
+            /* Discard current snapshot also */
+            qcow2_free_single_snapshot(bs, i);
+
+            /*
+             * This leaks all the rest of the snapshot table and the
+             * snapshots' clusters, but we run in check -r all mode,
+             * so qcow2_check_refcounts() will take care of it.
+             */
+            s->nb_snapshots = i;
+            offset = pre_sn_offset;
+            break;
         }
     }
 
@@ -199,7 +241,7 @@  fail:
 
 int qcow2_read_snapshots(BlockDriverState *bs, Error **errp)
 {
-    return qcow2_do_read_snapshots(bs, false, NULL, errp);
+    return qcow2_do_read_snapshots(bs, false, NULL, NULL, errp);
 }
 
 /* add at the end of the file a new list of snapshots */
@@ -367,6 +409,7 @@  int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 {
     BDRVQcow2State *s = bs->opaque;
     Error *local_err = NULL;
+    int nb_clusters_reduced = 0;
     int extra_data_dropped = 0;
     int ret;
     struct {
@@ -404,7 +447,8 @@  int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 
     qemu_co_mutex_unlock(&s->lock);
     ret = qcow2_do_read_snapshots(bs, fix & BDRV_FIX_ERRORS,
-                                  &extra_data_dropped, &local_err);
+                                  &nb_clusters_reduced, &extra_data_dropped,
+                                  &local_err);
     qemu_co_mutex_lock(&s->lock);
     if (ret < 0) {
         result->check_errors++;
@@ -417,7 +461,32 @@  int coroutine_fn qcow2_check_read_snapshot_table(BlockDriverState *bs,
 
         return ret;
     }
-    result->corruptions += extra_data_dropped;
+    result->corruptions += nb_clusters_reduced + extra_data_dropped;
+
+    if (nb_clusters_reduced) {
+        /*
+         * Update image header now, because:
+         * (1) qcow2_check_refcounts() relies on s->nb_snapshots to be
+         *     the same as what the image header says,
+         * (2) this leaks clusters, but qcow2_check_refcounts() will
+         *     fix that.
+         */
+        assert(fix & BDRV_FIX_ERRORS);
+
+        snapshot_table_pointer.nb_snapshots = cpu_to_be32(s->nb_snapshots);
+        ret = bdrv_pwrite_sync(bs->file, 60,
+                               &snapshot_table_pointer.nb_snapshots,
+                               sizeof(snapshot_table_pointer.nb_snapshots));
+        if (ret < 0) {
+            result->check_errors++;
+            fprintf(stderr, "ERROR failed to update the snapshot count in the "
+                    "image header: %s\n", strerror(-ret));
+            return ret;
+        }
+
+        result->corruptions_fixed += nb_clusters_reduced;
+        result->corruptions -= nb_clusters_reduced;
+    }
 
     return 0;
 }