[v4,2/5] qcow2-refcount: avoid eating RAM

Message ID	20181214134240.217870-3-vsementsov@virtuozzo.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> To: qemu-block@nongnu.org, qemu-devel@nongnu.org Date: Fri, 14 Dec 2018 16:42:37 +0300 Message-Id: <20181214134240.217870-3-vsementsov@virtuozzo.com> In-Reply-To: <20181214134240.217870-1-vsementsov@virtuozzo.com> References: <20181214134240.217870-1-vsementsov@virtuozzo.com> Subject: [Qemu-devel] [PATCH v4 2/5] qcow2-refcount: avoid eating RAM Precedence: list Cc: kwolf@redhat.com, den@openvz.org, vsementsov@virtuozzo.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	qcow2 check improvements part I \| expand [v4,0/5] qcow2 check improvements part I [v4,1/5] qcow2-refcount: fix check_oflag_copied [v4,2/5] qcow2-refcount: avoid eating RAM [v4,3/5] qcow2-refcount: check_refcounts_l2: reduce ignored overlaps [v4,4/5] qcow2-refcount: check_refcounts_l2: don't count fixed cluster as allocated [v4,5/5] qcow2-refcount: don't mask corruptions under internal errors

Message ID

20181214134240.217870-3-vsementsov@virtuozzo.com (mailing list archive)

State

New, archived

Headers

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org,
	qemu-devel@nongnu.org
Date: Fri, 14 Dec 2018 16:42:37 +0300
Message-Id: <20181214134240.217870-3-vsementsov@virtuozzo.com>
In-Reply-To: <20181214134240.217870-1-vsementsov@virtuozzo.com>
References: <20181214134240.217870-1-vsementsov@virtuozzo.com>
Subject: [Qemu-devel] [PATCH v4 2/5] qcow2-refcount: avoid eating RAM
Precedence: list
Cc: kwolf@redhat.com, den@openvz.org, vsementsov@virtuozzo.com,
	mreitz@redhat.com
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Series

qcow2 check improvements part I | expand

Commit Message

Vladimir Sementsov-Ogievskiy Dec. 14, 2018, 1:42 p.m. UTC

qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
an unpredictable amount of memory on corrupted table entries, which are
referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Interesting that iotest 138 checks exactly the behavior which we fix
here. So, change the test appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-refcount.c     | 18 ++++++++++++++++++
 tests/qemu-iotests/138     | 12 +++++-------
 tests/qemu-iotests/138.out |  5 ++++-
 3 files changed, 27 insertions(+), 8 deletions(-)

Comments

Max Reitz Feb. 27, 2019, 12:26 p.m. UTC | #1

On 14.12.18 14:42, Vladimir Sementsov-Ogievskiy wrote:
> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
> an unpredictable amount of memory on corrupted table entries, which are
> referencing regions far beyond the end of file.
> 
> Prevent this, by skipping such regions from further processing.

Is it actually wrong to reference clusters beyond the EOF?

Hmm...  The spec says everywhere "offset into the image file" (except
for L2 entries pointing to data clusters, but let's say that's just a
mistake).  This implies that offsets have to be within the confines of
the file, OK.

> Interesting that iotest 138 checks exactly the behavior which we fix
> here. So, change the test appropriately.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-refcount.c     | 18 ++++++++++++++++++
>  tests/qemu-iotests/138     | 12 +++++-------
>  tests/qemu-iotests/138.out |  5 ++++-
>  3 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index b717fdecc6..c41c1fbe00 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1505,12 +1505,30 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
>  {
>      BDRVQcow2State *s = bs->opaque;
>      uint64_t start, last, cluster_offset, k, refcount;
> +    int64_t file_len;
>      int ret;
>  
>      if (size <= 0) {
>          return 0;
>      }
>  
> +    file_len = bdrv_getlength(bs->file->bs);

I just had to delete a paragraph where I was questioning the use of
bdrv_getlength() here again, because I remembered I did that before and
decided to give in...

> +    if (file_len < 0) {
> +        return file_len;
> +    }
> +
> +    /* Last cluster of qcow2 image may be semi-allocated, so it's may be OK to

With s/it's/it/:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +     * reference some space after file end but it should be less than one
> +     * cluster.
> +     */
> +    if (offset + size - file_len >= s->cluster_size) {
> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
> +                "end of the file by one cluster or more: offset 0x%" PRIx64
> +                " size 0x%" PRIx64 "\n", offset, size);
> +        res->corruptions++;
> +        return 0;
> +    }
> +
>      start = start_of_cluster(s, offset);
>      last = start_of_cluster(s, offset + size - 1);
>      for(cluster_offset = start; cluster_offset <= last;
> diff --git a/tests/qemu-iotests/138 b/tests/qemu-iotests/138
> index eccbcae3a6..12e20762e5 100755
> --- a/tests/qemu-iotests/138
> +++ b/tests/qemu-iotests/138
> @@ -54,15 +54,13 @@ $QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
>  # Put the data cluster at a multiple of 2 TB, resulting in the image apparently
>  # having a multiple of 2^32 clusters
>  # (To be more specific: It is at 32 PB)
> -poke_file "$TEST_IMG" 2048 "\x80\x80\x00\x00\x00\x00\x00\x00"
> +poke_file "$TEST_IMG" $((2048 + 8)) "\x00\x80\x00\x00\x00\x00\x00\x00"
>  
>  # An offset of 32 PB results in qemu-img check having to allocate an in-memory
> -# refcount table of 128 TB (16 bit refcounts, 512 byte clusters).
> -# This should be generally too much for any system and thus fail.
> -# What this test is checking is that the qcow2 driver actually tries to allocate
> -# such a large amount of memory (and is consequently aborting) instead of having
> -# truncated the cluster count somewhere (which would result in much less memory
> -# being allocated and then a segfault occurring).
> +# refcount table of 128 TB (16 bit refcounts, 512 byte clusters), if qemu-img
> +# don't check that referenced data cluster is far beyond the end of file.
> +# But starting from 4.0, qemu-img does this check, and instead of "Cannot
> +# allocate memory", we have an error showing that l2 entry is invalid.
>  _check_test_img
>  
>  # success, all done
> diff --git a/tests/qemu-iotests/138.out b/tests/qemu-iotests/138.out
> index 3fe911f85a..aca7d47a80 100644
> --- a/tests/qemu-iotests/138.out
> +++ b/tests/qemu-iotests/138.out
> @@ -5,5 +5,8 @@ QA output created by 138
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=512
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -qemu-img: Check failed: Cannot allocate memory
> +ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x80000000000000 size 0x200
> +
> +1 errors were found on the image.
> +Data may be corrupted, or further writes to the image may corrupt it.
>  *** done
>

Eric Blake Feb. 27, 2019, 2:43 p.m. UTC | #2

On 12/14/18 7:42 AM, Vladimir Sementsov-Ogievskiy wrote:
> qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
> an unpredictable amount of memory on corrupted table entries, which are
> referencing regions far beyond the end of file.
> 
> Prevent this, by skipping such regions from further processing.
> 
> Interesting that iotest 138 checks exactly the behavior which we fix
> here. So, change the test appropriately.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---

> +    /* Last cluster of qcow2 image may be semi-allocated, so it's may be OK to
> +     * reference some space after file end but it should be less than one
> +     * cluster.

Are we guaranteed that this is always the case, even when incrementally
building up an image.  That is, if we get a power failure in the middle
of allocating both a new L2 table and a larger refcount table, can we
see an image which temporary references 2 or more clusters beyond the
end of the file that it was previously using?  Or are we guaranteed that
the way we update the refcount table will never see more than one
cluster beyond the end of the file for any other reference?  I guess
where I'm going with this is a question of whether we should permit a
finite overrun to account for multiple pieces of metadata all being in
flight at once, and only reject the obvious overruns that are definitely
beyond that sane limit, and whether 1 cluster is too small for that sane
limit.

> +     */
> +    if (offset + size - file_len >= s->cluster_size) {
> +        fprintf(stderr, "ERROR: counting reference for region exceeding the "
> +                "end of the file by one cluster or more: offset 0x%" PRIx64
> +                " size 0x%" PRIx64 "\n", offset, size);

Should we be using something smarter than fprintf() here?

Grammar suggestion:

ERROR: reference to a region too far beyond the end of the file:

(which is shorter than what you wrote, and also has the nice property of
allowing us to change our mind on whether it is 1 cluster, 2, 10, or
some other finite limit as our heuristic of when we consider the image
corrupt without having to reword it).

> +++ b/tests/qemu-iotests/138
> @@ -54,15 +54,13 @@ $QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
>  # Put the data cluster at a multiple of 2 TB, resulting in the image apparently
>  # having a multiple of 2^32 clusters
>  # (To be more specific: It is at 32 PB)
> -poke_file "$TEST_IMG" 2048 "\x80\x80\x00\x00\x00\x00\x00\x00"
> +poke_file "$TEST_IMG" $((2048 + 8)) "\x00\x80\x00\x00\x00\x00\x00\x00"
>  
>  # An offset of 32 PB results in qemu-img check having to allocate an in-memory
> -# refcount table of 128 TB (16 bit refcounts, 512 byte clusters).
> -# This should be generally too much for any system and thus fail.
> -# What this test is checking is that the qcow2 driver actually tries to allocate
> -# such a large amount of memory (and is consequently aborting) instead of having
> -# truncated the cluster count somewhere (which would result in much less memory
> -# being allocated and then a segfault occurring).
> +# refcount table of 128 TB (16 bit refcounts, 512 byte clusters), if qemu-img
> +# don't check that referenced data cluster is far beyond the end of file.
> +# But starting from 4.0, qemu-img does this check, and instead of "Cannot
> +# allocate memory", we have an error showing that l2 entry is invalid.
>  _check_test_img

In other words, we are still gracefully invalidating the file, but now
because of a heuristic rather than a memory allocation failure.

I might word the comment differently:

An offset of 32 PB would result in qemu-img check having to allocate an
in-memory refcount table of 128 TB (16 bit refcounts, 512 byte
clusters). It is unlikely that a system has this much memory, so the
test ensures that qemu-img either gracefully fails an allocation (prior
to 4.0) or flags the image as invalid (the reference points to a cluster
too far beyond the actual file), rather than silently allocating a
truncated memory amount or dumping core.

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b717fdecc6..c41c1fbe00 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1505,12 +1505,30 @@  int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t start, last, cluster_offset, k, refcount;
+    int64_t file_len;
     int ret;
 
     if (size <= 0) {
         return 0;
     }
 
+    file_len = bdrv_getlength(bs->file->bs);
+    if (file_len < 0) {
+        return file_len;
+    }
+
+    /* Last cluster of qcow2 image may be semi-allocated, so it's may be OK to
+     * reference some space after file end but it should be less than one
+     * cluster.
+     */
+    if (offset + size - file_len >= s->cluster_size) {
+        fprintf(stderr, "ERROR: counting reference for region exceeding the "
+                "end of the file by one cluster or more: offset 0x%" PRIx64
+                " size 0x%" PRIx64 "\n", offset, size);
+        res->corruptions++;
+        return 0;
+    }
+
     start = start_of_cluster(s, offset);
     last = start_of_cluster(s, offset + size - 1);
     for(cluster_offset = start; cluster_offset <= last;
diff --git a/tests/qemu-iotests/138 b/tests/qemu-iotests/138
index eccbcae3a6..12e20762e5 100755
--- a/tests/qemu-iotests/138
+++ b/tests/qemu-iotests/138
@@ -54,15 +54,13 @@  $QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
 # Put the data cluster at a multiple of 2 TB, resulting in the image apparently
 # having a multiple of 2^32 clusters
 # (To be more specific: It is at 32 PB)
-poke_file "$TEST_IMG" 2048 "\x80\x80\x00\x00\x00\x00\x00\x00"
+poke_file "$TEST_IMG" $((2048 + 8)) "\x00\x80\x00\x00\x00\x00\x00\x00"
 
 # An offset of 32 PB results in qemu-img check having to allocate an in-memory
-# refcount table of 128 TB (16 bit refcounts, 512 byte clusters).
-# This should be generally too much for any system and thus fail.
-# What this test is checking is that the qcow2 driver actually tries to allocate
-# such a large amount of memory (and is consequently aborting) instead of having
-# truncated the cluster count somewhere (which would result in much less memory
-# being allocated and then a segfault occurring).
+# refcount table of 128 TB (16 bit refcounts, 512 byte clusters), if qemu-img
+# don't check that referenced data cluster is far beyond the end of file.
+# But starting from 4.0, qemu-img does this check, and instead of "Cannot
+# allocate memory", we have an error showing that l2 entry is invalid.
 _check_test_img
 
 # success, all done
diff --git a/tests/qemu-iotests/138.out b/tests/qemu-iotests/138.out
index 3fe911f85a..aca7d47a80 100644
--- a/tests/qemu-iotests/138.out
+++ b/tests/qemu-iotests/138.out
@@ -5,5 +5,8 @@  QA output created by 138
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=512
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-qemu-img: Check failed: Cannot allocate memory
+ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x80000000000000 size 0x200
+
+1 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
 *** done

[v4,2/5] qcow2-refcount: avoid eating RAM

Commit Message

Comments

Patch