
[4/7] vmdk: Reject invalid compressed writes

Message ID 20190725155735.11872-5-mreitz@redhat.com (mailing list archive)
State New, archived
Series: vmdk: Misc fixes

Commit Message

Max Reitz July 25, 2019, 3:57 p.m. UTC
Compressed writes generally have to write full clusters, not just in
theory but also in practice when it comes to vmdk's streamOptimized
subformat.  It currently is just silently broken for writes with
non-zero in-cluster offsets:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 4096
4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
read failed: Invalid argument

(The technical reason is that vmdk_write_extent() just writes the
incomplete compressed data actually to offset 4k.  When reading the
data, vmdk_read_extent() looks at offset 0 and finds the compressed data
size to be 0, because that is what it reads from there.  This yields an
error.)

For incomplete writes with zero in-cluster offsets, the error path when
reading the rest of the cluster is a bit different, but the result is
the same:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
read failed: Invalid argument

(Here, vmdk_read_extent() finds the data and then sees that the
uncompressed data is short.)

It is better to reject invalid writes than to make the user believe they
might have succeeded and then fail when trying to read it back.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
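
With this check in place, the reproducer from the commit message should presumably be rejected at write time instead of producing an unreadable cluster. A sketch of the expected behavior (assuming the default 64 KiB grain size; qemu-io's throughput lines are omitted):

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 4k 4k' foo.vmdk
write failed: Invalid argument
$ qemu-io -c 'write 0 64k' -c 'read 0 64k' foo.vmdk
wrote 65536/65536 bytes at offset 0
read 65536/65536 bytes at offset 0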

Comments

John Snow Aug. 12, 2019, 8:26 p.m. UTC | #1
On 7/25/19 11:57 AM, Max Reitz wrote:
> Compressed writes generally have to write full clusters, not just in
> theory but also in practice when it comes to vmdk's streamOptimized
> subformat.  It currently is just silently broken for writes with
> non-zero in-cluster offsets:
> 
> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
> wrote 4096/4096 bytes at offset 4096
> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
> read failed: Invalid argument
> 
> (The technical reason is that vmdk_write_extent() just writes the
> incomplete compressed data actually to offset 4k.  When reading the
> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
> size to be 0, because that is what it reads from there.  This yields an
> error.)
> 
> For incomplete writes with zero in-cluster offsets, the error path when
> reading the rest of the cluster is a bit different, but the result is
> the same:
> 
> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
> read failed: Invalid argument
> 
> (Here, vmdk_read_extent() finds the data and then sees that the
> uncompressed data is short.)
> 
> It is better to reject invalid writes than to make the user believe they
> might have succeeded and then fail when trying to read it back.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/vmdk.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index db6acfc31e..641acacfe0 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>      if (extent->compressed) {
>          void *compressed_data;
>  
> +        /* Only whole clusters */
> +        if (offset_in_cluster ||
> +            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
> +            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
> +             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
> +        {
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +
>          if (!extent->has_marker) {
>              ret = -EINVAL;
>              goto out;
> 

What does this look like from a guest's perspective? Is there something
that enforces the alignment in the graph for us?

Or is it the case that indeed guests (or users via qemu-io) can request
invalid writes and we will halt the VM in those cases (in preference to
corrupting the disk)?
Max Reitz Aug. 12, 2019, 9:03 p.m. UTC | #2
On 12.08.19 22:26, John Snow wrote:
> 
> 
> On 7/25/19 11:57 AM, Max Reitz wrote:
>> Compressed writes generally have to write full clusters, not just in
>> theory but also in practice when it comes to vmdk's streamOptimized
>> subformat.  It currently is just silently broken for writes with
>> non-zero in-cluster offsets:
>>
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 4096
>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>> read failed: Invalid argument
>>
>> (The technical reason is that vmdk_write_extent() just writes the
>> incomplete compressed data actually to offset 4k.  When reading the
>> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
>> size to be 0, because that is what it reads from there.  This yields an
>> error.)
>>
>> For incomplete writes with zero in-cluster offsets, the error path when
>> reading the rest of the cluster is a bit different, but the result is
>> the same:
>>
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 0
>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>> read failed: Invalid argument
>>
>> (Here, vmdk_read_extent() finds the data and then sees that the
>> uncompressed data is short.)
>>
>> It is better to reject invalid writes than to make the user believe they
>> might have succeeded and then fail when trying to read it back.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/vmdk.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index db6acfc31e..641acacfe0 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>>      if (extent->compressed) {
>>          void *compressed_data;
>>  
>> +        /* Only whole clusters */
>> +        if (offset_in_cluster ||
>> +            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>> +            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>> +             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>> +        {
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +
>>          if (!extent->has_marker) {
>>              ret = -EINVAL;
>>              goto out;
>>
> 
> What does this look like from a guest's perspective? Is there something
> that enforces the alignment in the graph for us?
> 
> Or is it the case that indeed guests (or users via qemu-io) can request
> invalid writes and we will halt the VM in those cases (in preference to
> corrupting the disk)?

Have you ever tried using a streamOptimized VMDK disk with a guest?

I haven’t, but I know that it won’t work. O:-)  If you try to write to
an already allocated cluster, you’ll get an EIO and an error message via
error_report() (“Could not write to allocated cluster for
streamOptimized”).  So really, the only use of streamOptimized is as a
qemu-img convert source/target, or as a backup/mirror target.  (Just
like compressed clusters in qcow2 images.)
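
To make the write-once behavior concrete, a rough, untested sketch (success output omitted; the error line is the error_report() message quoted above, though qemu-io may prefix it differently):

$ qemu-img create -f vmdk -o subformat=streamOptimized out.vmdk 1M
$ qemu-io -c 'write 0 64k' out.vmdk    # first write to this cluster: fine
$ qemu-io -c 'write 0 64k' out.vmdk    # rewriting the same cluster
Could not write to allocated cluster for streamOptimized
write failed: Input/output error
$ qemu-img convert -O vmdk -o subformat=streamOptimized in.qcow2 out.vmdk   # the supported way to fill such an image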

I suppose if I introduced streamOptimized support today, I wouldn’t just
forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
make vmdk_co_pwritev_compressed() only work on streamOptimized images,
and vmdk_co_pwritev() only on everything else.  Then it would be more clear.

Hm.  In fact, that’s a bug, isn’t it?  vmdk will accept compressed
writes for any subformat, even if it doesn’t support compression.  So if
you use -c and convert to vmdk, it will succeed, but the result won’t be
compressed.
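
If that is indeed what happens, the case in question would presumably be something like this (hypothetical and unverified; monolithicSparse stands in for any subformat without compression support):

$ qemu-img convert -c -O vmdk -o subformat=monolithicSparse in.qcow2 out.vmdk

i.e. the convert succeeds, but the -c ends up having no effect on the result.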

It’s also a bit weird to accept normal writes for streamOptimized, but
I’m not sure whether that’s really a bug?  In any case, changing this
behavior would not be backwards-compatible...  Should we deprecate
normal writes to streamOptimized?

Max
John Snow Aug. 12, 2019, 9:16 p.m. UTC | #3
On 8/12/19 5:03 PM, Max Reitz wrote:
> On 12.08.19 22:26, John Snow wrote:
>>
>>
>> On 7/25/19 11:57 AM, Max Reitz wrote:
>>> Compressed writes generally have to write full clusters, not just in
>>> theory but also in practice when it comes to vmdk's streamOptimized
>>> subformat.  It currently is just silently broken for writes with
>>> non-zero in-cluster offsets:
>>>
>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>>> wrote 4096/4096 bytes at offset 4096
>>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>>> read failed: Invalid argument
>>>
>>> (The technical reason is that vmdk_write_extent() just writes the
>>> incomplete compressed data actually to offset 4k.  When reading the
>>> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
>>> size to be 0, because that is what it reads from there.  This yields an
>>> error.)
>>>
>>> For incomplete writes with zero in-cluster offsets, the error path when
>>> reading the rest of the cluster is a bit different, but the result is
>>> the same:
>>>
>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>>> wrote 4096/4096 bytes at offset 0
>>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>>> read failed: Invalid argument
>>>
>>> (Here, vmdk_read_extent() finds the data and then sees that the
>>> uncompressed data is short.)
>>>
>>> It is better to reject invalid writes than to make the user believe they
>>> might have succeeded and then fail when trying to read it back.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  block/vmdk.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/block/vmdk.c b/block/vmdk.c
>>> index db6acfc31e..641acacfe0 100644
>>> --- a/block/vmdk.c
>>> +++ b/block/vmdk.c
>>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>>>      if (extent->compressed) {
>>>          void *compressed_data;
>>>  
>>> +        /* Only whole clusters */
>>> +        if (offset_in_cluster ||
>>> +            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>>> +            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>>> +             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>>> +        {
>>> +            ret = -EINVAL;
>>> +            goto out;
>>> +        }
>>> +
>>>          if (!extent->has_marker) {
>>>              ret = -EINVAL;
>>>              goto out;
>>>
>>
>> What does this look like from a guest's perspective? Is there something
>> that enforces the alignment in the graph for us?
>>
>> Or is it the case that indeed guests (or users via qemu-io) can request
>> invalid writes and we will halt the VM in those cases (in preference to
>> corrupting the disk)?
> 
> Have you ever tried using a streamOptimized VMDK disk with a guest?
> 

Nope! It's why I'm asking. I have no idea what the whole picture before
and after is.

> I haven’t, but I know that it won’t work. O:-)  If you try to write to
> an already allocated cluster, you’ll get an EIO and an error message via
> error_report() (“Could not write to allocated cluster for
> streamOptimized”).  So really, the only use of streamOptimized is as a
> qemu-img convert source/target, or as a backup/mirror target.  (Just
> like compressed clusters in qcow2 images.)
> 

OK, makes sense. Someone's going to try to use it in cases where it
doesn't make sense though, for sure.

> I suppose if I introduced streamOptimized support today, I wouldn’t just
> forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
> make vmdk_co_pwritev_compressed() only work on streamOptimized images,
> and vmdk_co_pwritev() only on everything else.  Then it would be more clear.
> 
> Hm.  In fact, that’s a bug, isn’t it?  vmdk will accept compressed
> writes for any subformat, even if it doesn’t support compression.  So if
> you use -c and convert to vmdk, it will succeed, but the result won’t be
> compressed,
> 
> It’s also a bit weird to accept normal writes for streamOptimized, but
> I’m not sure whether that’s really a bug?  In any case, changing this
> behavior would not be backwards-compatible...  Should we deprecate
> normal writes to streamOptimized?
> 

If it's supposed to be the case that streamOptimized *only* gets
compressed, aligned writes -- it probably is a bug to do normal,
uncompressed writes, isn't it? Does that produce a usable image?

> Max
> 

Anyway, I'm fine with this patch because things aren't any worse, and
our support for non-native formats has always been kind of best-attempt.

Reviewed-by: John Snow <jsnow@redhat.com>
Max Reitz Aug. 13, 2019, 12:58 p.m. UTC | #4
On 12.08.19 23:16, John Snow wrote:
> 
> 
> On 8/12/19 5:03 PM, Max Reitz wrote:
>> On 12.08.19 22:26, John Snow wrote:
>>>
>>>
>>> On 7/25/19 11:57 AM, Max Reitz wrote:
>>>> Compressed writes generally have to write full clusters, not just in
>>>> theory but also in practice when it comes to vmdk's streamOptimized
>>>> subformat.  It currently is just silently broken for writes with
>>>> non-zero in-cluster offsets:
>>>>
>>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>>>> wrote 4096/4096 bytes at offset 4096
>>>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>>>> read failed: Invalid argument
>>>>
>>>> (The technical reason is that vmdk_write_extent() just writes the
>>>> incomplete compressed data actually to offset 4k.  When reading the
>>>> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
>>>> size to be 0, because that is what it reads from there.  This yields an
>>>> error.)
>>>>
>>>> For incomplete writes with zero in-cluster offsets, the error path when
>>>> reading the rest of the cluster is a bit different, but the result is
>>>> the same:
>>>>
>>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>>>> wrote 4096/4096 bytes at offset 0
>>>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>>>> read failed: Invalid argument
>>>>
>>>> (Here, vmdk_read_extent() finds the data and then sees that the
>>>> uncompressed data is short.)
>>>>
>>>> It is better to reject invalid writes than to make the user believe they
>>>> might have succeeded and then fail when trying to read it back.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/vmdk.c | 10 ++++++++++
>>>>  1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/block/vmdk.c b/block/vmdk.c
>>>> index db6acfc31e..641acacfe0 100644
>>>> --- a/block/vmdk.c
>>>> +++ b/block/vmdk.c
>>>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
>>>>      if (extent->compressed) {
>>>>          void *compressed_data;
>>>>  
>>>> +        /* Only whole clusters */
>>>> +        if (offset_in_cluster ||
>>>> +            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>>>> +            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>>>> +             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>>>> +        {
>>>> +            ret = -EINVAL;
>>>> +            goto out;
>>>> +        }
>>>> +
>>>>          if (!extent->has_marker) {
>>>>              ret = -EINVAL;
>>>>              goto out;
>>>>
>>>
>>> What does this look like from a guest's perspective? Is there something
>>> that enforces the alignment in the graph for us?
>>>
>>> Or is it the case that indeed guests (or users via qemu-io) can request
>>> invalid writes and we will halt the VM in those cases (in preference to
>>> corrupting the disk)?
>>
>> Have you ever tried using a streamOptimized VMDK disk with a guest?
>>
> 
> Nope! It's why I'm asking. I have no idea what the whole picture before
> and after is.
> 
>> I haven’t, but I know that it won’t work. O:-)  If you try to write to
>> an already allocated cluster, you’ll get an EIO and an error message via
>> error_report() (“Could not write to allocated cluster for
>> streamOptimized”).  So really, the only use of streamOptimized is as a
>> qemu-img convert source/target, or as a backup/mirror target.  (Just
>> like compressed clusters in qcow2 images.)
>>
> 
> OK, makes sense. Someone's going to try to use it in cases where it
> doesn't make sense though, for sure.
> 
>> I suppose if I introduced streamOptimized support today, I wouldn’t just
>> forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
>> make vmdk_co_pwritev_compressed() only work on streamOptimized images,
>> and vmdk_co_pwritev() only on everything else.  Then it would be more clear.
>>
>> Hm.  In fact, that’s a bug, isn’t it?  vmdk will accept compressed
>> writes for any subformat, even if it doesn’t support compression.  So if
>> you use -c and convert to vmdk, it will succeed, but the result won’t be
>> compressed,
>>
>> It’s also a bit weird to accept normal writes for streamOptimized, but
>> I’m not sure whether that’s really a bug?  In any case, changing this
>> behavior would not be backwards-compatible...  Should we deprecate
>> normal writes to streamOptimized?
>>
> 
> If it's supposed to be the case that streamOptimized *only* gets
> compressed, aligned writes -- it probably is a bug to do normal,
> uncompressed writes, isn't it? Does that produce a usable image?

Well, all writes are silently done as compressed writes.  (With the
alignment requirements added in this patch.)  The image is usable, as
long as you only write full clusters, and you only write to each cluster
once...

>> Max
>>
> 
> Anyway, I'm fine with this patch because things aren't any worse, and
> our support for non-native formats has always been kind of best-attempt.
> 
> Reviewed-by: John Snow <jsnow@redhat.com>

Thanks. :-)

Max

Patch

diff --git a/block/vmdk.c b/block/vmdk.c
index db6acfc31e..641acacfe0 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1731,6 +1731,16 @@  static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     if (extent->compressed) {
         void *compressed_data;
 
+        /* Only whole clusters */
+        if (offset_in_cluster ||
+            n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
+            (n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
+             offset + n_bytes != extent->end_sector * SECTOR_SIZE))
+        {
+            ret = -EINVAL;
+            goto out;
+        }
+
         if (!extent->has_marker) {
             ret = -EINVAL;
             goto out;
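
For the default 64 KiB grain (cluster_sectors * SECTOR_SIZE == 65536), the new condition works out roughly as follows for the 1M image from the commit message (a reading of the check, not a captured run):

$ qemu-io -c 'write 4k 4k' foo.vmdk    # offset_in_cluster != 0            -> rejected, -EINVAL
$ qemu-io -c 'write 0 4k' foo.vmdk     # short write not at the image end  -> rejected, -EINVAL
$ qemu-io -c 'write 0 64k' foo.vmdk    # exactly one whole cluster         -> passes the new check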