[for-5.1,1/2] block: Require aligned image size to avoid assertion failure

Message ID 20200710142149.40962-2-kwolf@redhat.com
State New
Series
  • qemu-img convert: Fix abort with unaligned image size

Commit Message

Kevin Wolf July 10, 2020, 2:21 p.m. UTC
Unaligned requests will automatically be aligned to bl.request_alignment
and we don't want to extend requests to access space beyond the end of
the image, so it's required that the image size is aligned.

With write requests, this could cause assertion failures like this if
RESIZE permissions weren't requested:

qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.

This was e.g. triggered by qemu-img converting to a target image with 4k
request alignment when the image was only aligned to 512 bytes, but not
to 4k.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Eric Blake July 10, 2020, 2:37 p.m. UTC | #1
On 7/10/20 9:21 AM, Kevin Wolf wrote:
> Unaligned requests will automatically be aligned to bl.request_alignment
> and we don't want to extend requests to access space beyond the end of
> the image, so it's required that the image size is aligned.

Yep, that's what I've already done on nbd images.

nbdkit has '--filter=truncate' which rounds an image size up to 
alignment by reading the absent tail as zeros, and permitting writes 
that rewrite zero but failing with EIO any write that would attempt to 
change the tail.  We may eventually want that complexity in qemu's block 
layer for ALL drivers (as part of switching the block layer to 
byte-accurate sizing everywhere), but that's a LOT more effort.  The 
short term of just mandating alignment is much easier and still defensible.
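
As a rough illustration of the rounding described above (this is not
nbdkit's or QEMU's actual code), the aligned size and the length of the
zero-reading tail could be computed as below. The helper names
round_up_pow2 and padded_tail are made up for this sketch, and the
alignment is assumed to be a non-zero power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round size up to the next multiple of align.
 * Assumes align is a non-zero power of two. */
static inline uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of bytes between the real end of the
 * image and the aligned end, i.e. the tail that would read as zeros. */
static inline uint64_t padded_tail(uint64_t size, uint64_t align)
{
    return round_up_pow2(size, align) - size;
}
```

For a 197120-byte image (a multiple of 512, but not of 4096) and a
4096-byte alignment, this gives an aligned size of 200704 bytes and a
3584-byte tail.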

> 
> With write requests, this could cause assertion failures like this if
> RESIZE permissions weren't requested:
> 
> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> 
> This was e.g. triggered by qemu-img converting to a target image with 4k
> request alignment when the image was only aligned to 512 bytes, but not
> to 4k.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block.c b/block.c
> index cc377d7ef3..c635777911 100644
> --- a/block.c
> +++ b/block.c
> @@ -1489,6 +1489,16 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
>           return -EINVAL;
>       }
>   
> +    /*
> +     * Unaligned requests will automatically be aligned to bl.request_alignment
> +     * and we don't want to extend requests to access space beyond the end of
> +     * the image, so it's required that the image size is aligned.
> +     */
> +    if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
> +        error_setg(errp, "Image size is not a multiple of request alignment");
> +        return -EINVAL;
> +    }
> +

Do we have any iotest coverage of this new message?  (If none of our 
existing tests broke, then you should add one...)
Max Reitz July 13, 2020, 11:19 a.m. UTC | #2
On 10.07.20 16:21, Kevin Wolf wrote:
> Unaligned requests will automatically be aligned to bl.request_alignment
> and we don't want to extend requests to access space beyond the end of
> the image, so it's required that the image size is aligned.
> 
> With write requests, this could cause assertion failures like this if
> RESIZE permissions weren't requested:
> 
> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> 
> This was e.g. triggered by qemu-img converting to a target image with 4k
> request alignment when the image was only aligned to 512 bytes, but not
> to 4k.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

(I think we had some proposal like this before, but I can’t find it,
unfortunately...)

I can’t see how with this patch you could create qcow2 images and then
use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
specifying caching options, so AFAIU you’re stuck with:

$ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16

$ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
multiple of request alignment

(/mnt/tmp is a filesystem on a “losetup -b 4096” device.)

Or you use blockdev-create, that seems to work (because of course you
can set the cache mode on the protocol node when you open it for
formatting).  But, well, I think there should be a working qemu-img
create case.

Also, I’m afraid of breaking existing use cases with this patch (just
qemu-img create + using the image with cache=none).

Max
Max Reitz July 13, 2020, 11:52 a.m. UTC | #3
On 13.07.20 13:19, Max Reitz wrote:
> On 10.07.20 16:21, Kevin Wolf wrote:
>> Unaligned requests will automatically be aligned to bl.request_alignment
>> and we don't want to extend requests to access space beyond the end of
>> the image, so it's required that the image size is aligned.
>>
>> With write requests, this could cause assertion failures like this if
>> RESIZE permissions weren't requested:
>>
>> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
>>
>> This was e.g. triggered by qemu-img converting to a target image with 4k
>> request alignment when the image was only aligned to 512 bytes, but not
>> to 4k.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
> 
> (I think we had some proposal like this before, but I can’t find it,
> unfortunately...)

(Ah, here it is:

https://lists.nongnu.org/archive/html/qemu-devel/2020-03/msg03077.html

(Which interestingly teases yet another mysterious “we had a discussion
on this before”...))

> I can’t see how with this patch you could create qcow2 images and then
> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
> specifying caching options, so AFAIU you’re stuck with:
> 
> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
> 
> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
> multiple of request alignment
> 
> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)
> 
> Or you use blockdev-create, that seems to work (because of course you
> can set the cache mode on the protocol node when you open it for
> formatting).  But, well, I think there should be a working qemu-img
> create case.
> 
> Also, I’m afraid of breaking existing use cases with this patch (just
> qemu-img create + using the image with cache=none).
> 
> Max
>
Kevin Wolf July 13, 2020, 2:29 p.m. UTC | #4
Am 13.07.2020 um 13:19 hat Max Reitz geschrieben:
> On 10.07.20 16:21, Kevin Wolf wrote:
> > Unaligned requests will automatically be aligned to bl.request_alignment
> > and we don't want to extend requests to access space beyond the end of
> > the image, so it's required that the image size is aligned.
> > 
> > With write requests, this could cause assertion failures like this if
> > RESIZE permissions weren't requested:
> > 
> > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> > 
> > This was e.g. triggered by qemu-img converting to a target image with 4k
> > request alignment when the image was only aligned to 512 bytes, but not
> > to 4k.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> 
> (I think we had some proposal like this before, but I can’t find it,
> unfortunately...)
> 
> I can’t see how with this patch you could create qcow2 images and then
> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
> specifying caching options, so AFAIU you’re stuck with:
> 
> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
> 
> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
> multiple of request alignment
> 
> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)

Hm, that looks like some regrettable collateral damage...

Well, you could argue that we should be writing full L1 tables with zero
padding instead of just the used part. I thought we had fixed this long
ago. But looks like we haven't.

But we should still avoid crashing in other cases, so what is the
difference between both? Is it just that qcow2 has the RESIZE permission
anyway so it doesn't matter?

If so, maybe attaching to a block node with WRITE, but not RESIZE is
what needs to fail when the image size is unaligned?
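
A minimal sketch of that idea, only to make the condition concrete. The
SKETCH_PERM_* bits and sketch_attach_allowed() are hypothetical names
for this example; they are not QEMU's BLK_PERM_* values or any existing
block layer function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in permission bits for this sketch only; these are not the
 * values of QEMU's BLK_PERM_* constants. */
enum {
    SKETCH_PERM_WRITE  = 1u << 0,
    SKETCH_PERM_RESIZE = 1u << 1,
};

/* Hypothetical check: refuse WRITE without RESIZE on an image whose
 * size is not a multiple of the request alignment; allow the rest. */
static inline bool sketch_attach_allowed(unsigned perm, uint64_t image_size,
                                         uint32_t request_alignment)
{
    assert(request_alignment != 0);
    bool unaligned = (image_size % request_alignment) != 0;
    bool write_without_resize = (perm & SKETCH_PERM_WRITE) &&
                                !(perm & SKETCH_PERM_RESIZE);
    return !(unaligned && write_without_resize);
}
```

With this condition, read-only users and users holding both WRITE and
RESIZE would still be able to attach to an unaligned image.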

> Or you use blockdev-create, that seems to work (because of course you
> can set the cache mode on the protocol node when you open it for
> formatting).  But, well, I think there should be a working qemu-img
> create case.
> 
> Also, I’m afraid of breaking existing use cases with this patch (just
> qemu-img create + using the image with cache=none).

I think for raw images, failure on start is better than crashing when
the VM is running. The qcow2 case needs to be fixed, of course.

Either case, I guess patch 2 can already be merged and would solve at
least the immediate bug report.

Kevin
Nir Soffer July 13, 2020, 4:33 p.m. UTC | #5
On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Unaligned requests will automatically be aligned to bl.request_alignment
> and we don't want to extend requests to access space beyond the end of
> the image, so it's required that the image size is aligned.
>
> With write requests, this could cause assertion failures like this if
> RESIZE permissions weren't requested:
>
> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
>
> This was e.g. triggered by qemu-img converting to a target image with 4k
> request alignment when the image was only aligned to 512 bytes, but not
> to 4k.

Was it on NFS? Shouldn't this be fixed by the next patch then?

>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/block.c b/block.c
> index cc377d7ef3..c635777911 100644
> --- a/block.c
> +++ b/block.c
> @@ -1489,6 +1489,16 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
>          return -EINVAL;
>      }
>
> +    /*
> +     * Unaligned requests will automatically be aligned to bl.request_alignment
> +     * and we don't want to extend requests to access space beyond the end of
> +     * the image, so it's required that the image size is aligned.
> +     */
> +    if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
> +        error_setg(errp, "Image size is not a multiple of request alignment");
> +        return -EINVAL;
> +    }
> +
>      assert(bdrv_opt_mem_align(bs) != 0);
>      assert(bdrv_min_mem_align(bs) != 0);
>      assert(is_power_of_2(bs->bl.request_alignment));
> --
> 2.25.4
>
Kevin Wolf July 13, 2020, 4:56 p.m. UTC | #6
Am 13.07.2020 um 18:33 hat Nir Soffer geschrieben:
> On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Unaligned requests will automatically be aligned to bl.request_alignment
> > and we don't want to extend requests to access space beyond the end of
> > the image, so it's required that the image size is aligned.
> >
> > With write requests, this could cause assertion failures like this if
> > RESIZE permissions weren't requested:
> >
> > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> >
> > This was e.g. triggered by qemu-img converting to a target image with 4k
> > request alignment when the image was only aligned to 512 bytes, but not
> > to 4k.
> 
> Was it on NFS? Shouldn't this be fixed by the next patch then?

Patch 2 makes the problem go away for NFS because NFS doesn't even
require the 4k alignment. But on storage that legitimately needs 4k
alignment (or possibly other filesystems that are misdetected), you
would still hit the same problem.

Kevin
Max Reitz July 14, 2020, 9:56 a.m. UTC | #7
On 13.07.20 16:29, Kevin Wolf wrote:
> Am 13.07.2020 um 13:19 hat Max Reitz geschrieben:
>> On 10.07.20 16:21, Kevin Wolf wrote:
>>> Unaligned requests will automatically be aligned to bl.request_alignment
>>> and we don't want to extend requests to access space beyond the end of
>>> the image, so it's required that the image size is aligned.
>>>
>>> With write requests, this could cause assertion failures like this if
>>> RESIZE permissions weren't requested:
>>>
>>> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
>>>
>>> This was e.g. triggered by qemu-img converting to a target image with 4k
>>> request alignment when the image was only aligned to 512 bytes, but not
>>> to 4k.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>
>> (I think we had some proposal like this before, but I can’t find it,
>> unfortunately...)
>>
>> I can’t see how with this patch you could create qcow2 images and then
>> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
>> specifying caching options, so AFAIU you’re stuck with:
>>
>> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
>> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
>> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
>>
>> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
>> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
>> multiple of request alignment
>>
>> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)
> 
> Hm, that looks like some regrettable collateral damage...
> 
> Well, you could argue that we should be writing full L1 tables with zero
> padding instead of just the used part. I thought we had fixed this long
> ago. But looks like we haven't.

That would help for the standard case.  It wouldn’t when the cluster
size is smaller than the request alignment, which, while maybe not
important, would still be a shame.

> But we should still avoid crashing in other cases, so what is the
> difference between both? Is it just that qcow2 has the RESIZE permission
> anyway so it doesn't matter?

I assume so.

> If so, maybe attaching to a block node with WRITE, but not RESIZE is
> what needs to fail when the image size is unaligned?

That sounds reasonable.

The obvious question is what happens when the RESIZE capability is
removed.  Dropping capabilities may never fail – I suppose we could
force-keep the RESIZE capability for such nodes?

Or we could immediately align such files to the block size once they are
opened (with the RESIZE capability).

>> Or you use blockdev-create, that seems to work (because of course you
>> can set the cache mode on the protocol node when you open it for
>> formatting).  But, well, I think there should be a working qemu-img
>> create case.
>>
>> Also, I’m afraid of breaking existing use cases with this patch (just
>> qemu-img create + using the image with cache=none).
> 
> I think for raw images, failure on start is better than crashing when
> the VM is running.

Agreed.

> The qcow2 case needs to be fixed, of course.
> 
> Either case, I guess patch 2 can already be merged and would solve at
> least the immediate bug report.

Also true.

Max
Kevin Wolf July 14, 2020, 11:08 a.m. UTC | #8
Am 14.07.2020 um 11:56 hat Max Reitz geschrieben:
> On 13.07.20 16:29, Kevin Wolf wrote:
> > Am 13.07.2020 um 13:19 hat Max Reitz geschrieben:
> >> On 10.07.20 16:21, Kevin Wolf wrote:
> >>> Unaligned requests will automatically be aligned to bl.request_alignment
> >>> and we don't want to extend requests to access space beyond the end of
> >>> the image, so it's required that the image size is aligned.
> >>>
> >>> With write requests, this could cause assertion failures like this if
> >>> RESIZE permissions weren't requested:
> >>>
> >>> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> >>>
> >>> This was e.g. triggered by qemu-img converting to a target image with 4k
> >>> request alignment when the image was only aligned to 512 bytes, but not
> >>> to 4k.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>  block.c | 10 ++++++++++
> >>>  1 file changed, 10 insertions(+)
> >>
> >> (I think we had some proposal like this before, but I can’t find it,
> >> unfortunately...)
> >>
> >> I can’t see how with this patch you could create qcow2 images and then
> >> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
> >> specifying caching options, so AFAIU you’re stuck with:
> >>
> >> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
> >> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
> >> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
> >>
> >> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
> >> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
> >> multiple of request alignment
> >>
> >> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)
> > 
> > Hm, that looks like some regrettable collateral damage...
> > 
> > Well, you could argue that we should be writing full L1 tables with zero
> > padding instead of just the used part. I thought we had fixed this long
> > ago. But looks like we haven't.
> 
> That would help for the standard case.  It wouldn’t when the cluster
> size is smaller than the request alignment, which, while maybe not
> important, would still be a shame.

I don't think it would be unreasonable to require a cluster size that is
a multiple of the logical block size of your host storage if you want to
use O_DIRECT.

But we have unaligned images in practice, so this is pure theory anyway.

> > But we should still avoid crashing in other cases, so what is the
> > difference between both? Is it just that qcow2 has the RESIZE permission
> > anyway so it doesn't matter?
> 
> I assume so.
> 
> > If so, maybe attaching to a block node with WRITE, but not RESIZE is
> > what needs to fail when the image size is unaligned?
> 
> That sounds reasonable.
> 
> The obvious question is what happens when the RESIZE capability is
> removed.  Dropping capabilities may never fail – I suppose we could
> force-keep the RESIZE capability for such nodes?

It's not nice, but I think we already have this kind of behaviour for
unlocking failures. So yes, that sounds like an option.

> Or we could immediately align such files to the block size once they
> are opened (with the RESIZE capability).

Automatically resizing the image file is obviously harmless for qcow2
images, but it would be a guest-visible change for raw images. It might
be better to avoid this.

Kevin
Max Reitz July 14, 2020, 4:22 p.m. UTC | #9
On 14.07.20 13:08, Kevin Wolf wrote:
> Am 14.07.2020 um 11:56 hat Max Reitz geschrieben:
>> On 13.07.20 16:29, Kevin Wolf wrote:
>>> Am 13.07.2020 um 13:19 hat Max Reitz geschrieben:
>>>> On 10.07.20 16:21, Kevin Wolf wrote:
>>>>> Unaligned requests will automatically be aligned to bl.request_alignment
>>>>> and we don't want to extend requests to access space beyond the end of
>>>>> the image, so it's required that the image size is aligned.
>>>>>
>>>>> With write requests, this could cause assertion failures like this if
>>>>> RESIZE permissions weren't requested:
>>>>>
>>>>> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
>>>>>
>>>>> This was e.g. triggered by qemu-img converting to a target image with 4k
>>>>> request alignment when the image was only aligned to 512 bytes, but not
>>>>> to 4k.
>>>>>
>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>> ---
>>>>>  block.c | 10 ++++++++++
>>>>>  1 file changed, 10 insertions(+)
>>>>
>>>> (I think we had some proposal like this before, but I can’t find it,
>>>> unfortunately...)
>>>>
>>>> I can’t see how with this patch you could create qcow2 images and then
>>>> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
>>>> specifying caching options, so AFAIU you’re stuck with:
>>>>
>>>> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
>>>> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
>>>> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
>>>>
>>>> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
>>>> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
>>>> multiple of request alignment
>>>>
>>>> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)
>>>
>>> Hm, that looks like some regrettable collateral damage...
>>>
>>> Well, you could argue that we should be writing full L1 tables with zero
>>> padding instead of just the used part. I thought we had fixed this long
>>> ago. But looks like we haven't.
>>
>> That would help for the standard case.  It wouldn’t when the cluster
>> size is smaller than the request alignment, which, while maybe not
>> important, would still be a shame.
> 
> I don't think it would be unreasonable to require a cluster size that is
> a multiple of the logical block size of your host storage if you want to
> use O_DIRECT.

True.

> But we have unaligned images in practice, so this is pure theory anyway.

Hm.  Maybe it would help to just adjust the error message to instruct
the user to resize the image to fit the request alignment?  (e.g. “is
not a multiple of the request alignment %u (try resizing the image to
%llu bytes)”)
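
Roughly what such a message could look like, as a standalone sketch.
sketch_suggested_size() and sketch_alignment_hint() are hypothetical
helpers, not existing QEMU functions, and the sketch uses snprintf()
with PRIu64 instead of error_setg() with %llu so that it compiles on
its own:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: smallest multiple of align that is >= size.
 * Assumes align is non-zero. */
static inline uint64_t sketch_suggested_size(uint64_t size, uint32_t align)
{
    assert(align != 0);
    uint64_t rem = size % align;
    return rem ? size + (align - rem) : size;
}

/* Hypothetical helper: returns 0 if size is already aligned; otherwise
 * writes a hint with the suggested size into buf and returns -1. */
static inline int sketch_alignment_hint(char *buf, size_t buflen,
                                        uint64_t size, uint32_t align)
{
    if (sketch_suggested_size(size, align) == size) {
        return 0;
    }
    snprintf(buf, buflen,
             "Image size is not a multiple of the request alignment %" PRIu32
             " (try resizing the image to %" PRIu64 " bytes)",
             align, sketch_suggested_size(size, align));
    return -1;
}
```

For a 197120-byte image and a 4096-byte request alignment, the
suggested size would be 200704 bytes.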

>>> But we should still avoid crashing in other cases, so what is the
>>> difference between both? Is it just that qcow2 has the RESIZE permission
>>> anyway so it doesn't matter?
>>
>> I assume so.
>>
>>> If so, maybe attaching to a block node with WRITE, but not RESIZE is
>>> what needs to fail when the image size is unaligned?
>>
>> That sounds reasonable.
>>
>> The obvious question is what happens when the RESIZE capability is
>> removed.  Dropping capabilities may never fail – I suppose we could
>> force-keep the RESIZE capability for such nodes?
> 
> It's not nice, but I think we already have this kind of behaviour for
> unlocking failures. So yes, that sounds like an option.
> 
>> Or we could immediately align such files to the block size once they
>> are opened (with the RESIZE capability).
> 
> Automatically resizing the image file is obviously harmless for qcow2
> images, but it would be a guest-visible change for raw images. It might
> be better to avoid this.

Well, it seems to be what already happens if the guest device has taken
the RESIZE capability (i.e., whenever there’s no failing assertion).
The only difference that appears to me is just that it happens only when
writing to the end of the image instead of unconditionally when opening it.

Max
Kevin Wolf July 15, 2020, 9:20 a.m. UTC | #10
Am 14.07.2020 um 18:22 hat Max Reitz geschrieben:
> On 14.07.20 13:08, Kevin Wolf wrote:
> > Am 14.07.2020 um 11:56 hat Max Reitz geschrieben:
> >> On 13.07.20 16:29, Kevin Wolf wrote:
> >>> Am 13.07.2020 um 13:19 hat Max Reitz geschrieben:
> >>>> On 10.07.20 16:21, Kevin Wolf wrote:
> >>>>> Unaligned requests will automatically be aligned to bl.request_alignment
> >>>>> and we don't want to extend requests to access space beyond the end of
> >>>>> the image, so it's required that the image size is aligned.
> >>>>>
> >>>>> With write requests, this could cause assertion failures like this if
> >>>>> RESIZE permissions weren't requested:
> >>>>>
> >>>>> qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> >>>>>
> >>>>> This was e.g. triggered by qemu-img converting to a target image with 4k
> >>>>> request alignment when the image was only aligned to 512 bytes, but not
> >>>>> to 4k.
> >>>>>
> >>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>>>> ---
> >>>>>  block.c | 10 ++++++++++
> >>>>>  1 file changed, 10 insertions(+)
> >>>>
> >>>> (I think we had some proposal like this before, but I can’t find it,
> >>>> unfortunately...)
> >>>>
> >>>> I can’t see how with this patch you could create qcow2 images and then
> >>>> use them with direct I/O, because AFAICS, qemu-img create doesn’t allow
> >>>> specifying caching options, so AFAIU you’re stuck with:
> >>>>
> >>>> $ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 1M
> >>>> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 cluster_size=65536
> >>>> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
> >>>>
> >>>> $ sudo ./qemu-io -t none /mnt/tmp/foo.qcow2
> >>>> qemu-io: can't open device /mnt/tmp/foo.qcow2: Image size is not a
> >>>> multiple of request alignment
> >>>>
> >>>> (/mnt/tmp is a filesystem on a “losetup -b 4096” device.)
> >>>
> >>> Hm, that looks like some regrettable collateral damage...
> >>>
> >>> Well, you could argue that we should be writing full L1 tables with zero
> >>> padding instead of just the used part. I thought we had fixed this long
> >>> ago. But looks like we haven't.
> >>
> >> That would help for the standard case.  It wouldn’t when the cluster
> >> size is smaller than the request alignment, which, while maybe not
> >> important, would still be a shame.
> > 
> > I don't think it would be unreasonable to require a cluster size that is
> > a multiple of the logical block size of your host storage if you want to
> > use O_DIRECT.
> 
> True.
> 
> > But we have unaligned images in practice, so this is pure theory anyway.
> 
> Hm.  Maybe it would help to just adjust the error message to instruct
> the user to resize the image to fit the request alignment?  (e.g. “is
> not a multiple of the request alignment %u (try resizing the image to
> %llu bytes)”)

This would require management tools to automatically do this or we would
break any users that don't manually invoke QEMU. I don't think this is a
realistic option, especially since "management tools" must probably
include all those one-off shell scripts that people use.

> >>> But we should still avoid crashing in other cases, so what is the
> >>> difference between both? Is it just that qcow2 has the RESIZE permission
> >>> anyway so it doesn't matter?
> >>
> >> I assume so.
> >>
> >>> If so, maybe attaching to a block node with WRITE, but not RESIZE is
> >>> what needs to fail when the image size is unaligned?
> >>
> >> That sounds reasonable.
> >>
> >> The obvious question is what happens when the RESIZE capability is
> >> removed.  Dropping capabilities may never fail – I suppose we could
> >> force-keep the RESIZE capability for such nodes?
> > 
> > It's not nice, but I think we already have this kind of behaviour for
> > unlocking failures. So yes, that sounds like an option.
> > 
> >> Or we could immediately align such files to the block size once they
> >> are opened (with the RESIZE capability).
> > 
> > Automatically resizing the image file is obviously harmless for qcow2
> > images, but it would be a guest-visible change for raw images. It might
> > be better to avoid this.
> 
> Well, it seems to be what already happens if the guest device has taken
> the RESIZE capability (i.e., whenever there’s no failing assertion).
> The only difference that appears to me is just that it happens only when
> writing to the end of the image instead of unconditionally when opening it.

I would have considered this as part of the bug rather than a desirable
future behaviour. blk_check_byte_request() tries to catch any request
going past EOF, it just doesn't know anything about request_alignment.
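
To illustrate the gap with made-up helpers (this is not the actual
blk_check_byte_request() code, only a sketch of the two steps): a
byte-granularity bounds check can pass while the same request, once
widened to the request alignment, ends past EOF. The alignment is
assumed to be a non-zero power of two:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte-granularity bounds check in the spirit of what is described
 * above; a hypothetical stand-in, not QEMU's implementation. */
static inline bool sketch_within_image(uint64_t offset, uint64_t bytes,
                                       uint64_t image_size)
{
    return offset <= image_size && bytes <= image_size - offset;
}

/* End offset of the request after widening it to a multiple of align.
 * Assumes align is a non-zero power of two. */
static inline uint64_t sketch_aligned_end(uint64_t offset, uint64_t bytes,
                                          uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + bytes + align - 1) & ~(align - 1);
}
```

A 512-byte write at offset 196608 into a 197120-byte image passes the
byte check, but widened to a 4096-byte alignment it ends at 200704,
which is beyond the end of the image.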

Kevin
Nir Soffer July 15, 2020, 1:22 p.m. UTC | #11
On Mon, Jul 13, 2020 at 7:56 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 13.07.2020 um 18:33 hat Nir Soffer geschrieben:
> > On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > Unaligned requests will automatically be aligned to bl.request_alignment
> > > and we don't want to extend requests to access space beyond the end of
> > > the image, so it's required that the image size is aligned.
> > >
> > > With write requests, this could cause assertion failures like this if
> > > RESIZE permissions weren't requested:
> > >
> > > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> > >
> > > This was e.g. triggered by qemu-img converting to a target image with 4k
> > > request alignment when the image was only aligned to 512 bytes, but not
> > > to 4k.
> >
> > Was it on NFS? Shouldn't this be fixed by the next patch then?
>
> Patch 2 makes the problem go away for NFS because NFS doesn't even
> require the 4k alignment. But on storage that legitimately needs 4k
> alignment (or possibly other filesystems that are misdetected), you
> would still hit the same problem.

I want to add the oVirt point of view on this. We enforce raw image
alignment of 4k on file-based storage, and 128m on block storage, so
our raw images cannot have this issue.

We have an issue with empty qcow2 images that have an unaligned size,
but we don't create such images in normal flows.

Nir
Kevin Wolf July 15, 2020, 1:42 p.m. UTC | #12
Am 15.07.2020 um 15:22 hat Nir Soffer geschrieben:
> On Mon, Jul 13, 2020 at 7:56 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Am 13.07.2020 um 18:33 hat Nir Soffer geschrieben:
> > > On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > >
> > > > Unaligned requests will automatically be aligned to bl.request_alignment
> > > > and we don't want to extend requests to access space beyond the end of
> > > > the image, so it's required that the image size is aligned.
> > > >
> > > > With write requests, this could cause assertion failures like this if
> > > > RESIZE permissions weren't requested:
> > > >
> > > > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> > > >
> > > > This was e.g. triggered by qemu-img converting to a target image with 4k
> > > > request alignment when the image was only aligned to 512 bytes, but not
> > > > to 4k.
> > >
> > > Was it on NFS? Shouldn't this be fixed by the next patch then?
> >
> > Patch 2 makes the problem go away for NFS because NFS doesn't even
> > require the 4k alignment. But on storage that legitimately needs 4k
> > alignment (or possibly other filesystems that are misdetected), you
> > would still hit the same problem.
> 
> I want to add the oVirt point of view on this. We enforce raw image
> alignment of 4k on file-based storage, and 128m on block storage, so
> our raw images cannot have this issue.

Yes, then you won't hit the problem.

> We have an issue with empty qcow2 images that have an unaligned size,
> but we don't create such images in normal flows.

Can you give a reproducer where qcow2 images would be affected?
Generally speaking, the qcow2 driver either takes both WRITE and RESIZE
permissions or neither. So it should just automatically resize the image
as needed instead of crashing.

Kevin
Daniel P. Berrangé July 15, 2020, 2:03 p.m. UTC | #13
On Wed, Jul 15, 2020 at 04:22:06PM +0300, Nir Soffer wrote:
> On Mon, Jul 13, 2020 at 7:56 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > On 13.07.2020 at 18:33, Nir Soffer wrote:
> > > On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > >
> > > > Unaligned requests will automatically be aligned to bl.request_alignment
> > > > and we don't want to extend requests to access space beyond the end of
> > > > the image, so it's required that the image size is aligned.
> > > >
> > > > With write requests, this could cause assertion failures like this if
> > > > RESIZE permissions weren't requested:
> > > >
> > > > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> > > >
> > > > This was e.g. triggered by qemu-img converting to a target image with 4k
> > > > request alignment when the image was only aligned to 512 bytes, but not
> > > > to 4k.
> > >
> > > Was it on NFS? Shouldn't this be fix by the next patch then?
> >
> > Patch 2 makes the problem go away for NFS because NFS doesn't even
> > require the 4k alignment. But on storage that legitimately needs 4k
> > alignment (or possibly other filesystems that are misdetected), you
> > would still hit the same problem.
> 
> I want to add oVirt point of view on this. We enforce raw image
> alignment of 4k on file based storage, and 128m on block storage, so
> our raw images cannot have this issue.

OpenStack should have a minimum alignment of 1 GB for image sizes, so
this change is also no trouble for it.

Regards,
Daniel
Nir Soffer July 15, 2020, 2:03 p.m. UTC | #14
On Wed, Jul 15, 2020 at 4:42 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> On 15.07.2020 at 15:22, Nir Soffer wrote:
> > On Mon, Jul 13, 2020 at 7:56 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > On 13.07.2020 at 18:33, Nir Soffer wrote:
> > > > On Fri, Jul 10, 2020 at 5:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > > >
> > > > > Unaligned requests will automatically be aligned to bl.request_alignment
> > > > > and we don't want to extend requests to access space beyond the end of
> > > > > the image, so it's required that the image size is aligned.
> > > > >
> > > > > With write requests, this could cause assertion failures like this if
> > > > > RESIZE permissions weren't requested:
> > > > >
> > > > > qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
> > > > >
> > > > > This was e.g. triggered by qemu-img converting to a target image with 4k
> > > > > request alignment when the image was only aligned to 512 bytes, but not
> > > > > to 4k.
> > > >
> > > > Was it on NFS? Shouldn't this be fix by the next patch then?
> > >
> > > Patch 2 makes the problem go away for NFS because NFS doesn't even
> > > require the 4k alignment. But on storage that legitimately needs 4k
> > > alignment (or possibly other filesystems that are misdetected), you
> > > would still hit the same problem.
> >
> > I want to add oVirt point of view on this. We enforce raw image
> > alignment of 4k on file based storage, and 128m on block storage, so
> > our raw images cannot have this issue.
>
> Yes, then you won't hit the problem.
>
> > We have an issue with empty qcow2 images which are unaligned size, but
> > we don't create such images in normal flows.
>
> Can you give a reproducer where qcow2 images would be affected?
> Generally speaking, the qcow2 driver either takes both WRITE and RESIZE
> permissions or neither. So it should just automatically resize the image
> as needed instead of crashing.

I think this is a theoretical issue in other programs trying to access
the unaligned images using direct I/O.

Patch

diff --git a/block.c b/block.c
index cc377d7ef3..c635777911 100644
--- a/block.c
+++ b/block.c
@@ -1489,6 +1489,16 @@  static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
         return -EINVAL;
     }
 
+    /*
+     * Unaligned requests will automatically be aligned to bl.request_alignment
+     * and we don't want to extend requests to access space beyond the end of
+     * the image, so it's required that the image size is aligned.
+     */
+    if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
+        error_setg(errp, "Image size is not a multiple of request alignment");
+        return -EINVAL;
+    }
+
     assert(bdrv_opt_mem_align(bs) != 0);
     assert(bdrv_min_mem_align(bs) != 0);
     assert(is_power_of_2(bs->bl.request_alignment));
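
For readers who want to try the arithmetic outside of QEMU, the check added by the hunk above can be sketched as a standalone C fragment. This is only an illustration under stated assumptions, not the code in block.c: SECTOR_SIZE stands in for BDRV_SECTOR_SIZE (512 bytes), and image_size_is_aligned() and round_up_size() are hypothetical helper names that do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU's BDRV_SECTOR_SIZE (512 bytes). */
#define SECTOR_SIZE 512ULL

/* True if x is a non-zero power of two. */
static bool is_power_of_2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper mirroring the condition in the patch: the image
 * size in bytes must be a multiple of the request alignment.
 */
static bool image_size_is_aligned(uint64_t total_sectors,
                                  uint32_t request_alignment)
{
    assert(is_power_of_2(request_alignment));
    return (total_sectors * SECTOR_SIZE) % request_alignment == 0;
}

/*
 * Hypothetical helper: the smallest multiple of request_alignment that is
 * greater than or equal to size_bytes. A request touching the last bytes
 * of an unaligned image would be extended up to this offset, i.e. past
 * the end of the image.
 */
static uint64_t round_up_size(uint64_t size_bytes, uint32_t request_alignment)
{
    assert(is_power_of_2(request_alignment));
    uint64_t mask = (uint64_t)request_alignment - 1;
    return (size_bytes + mask) & ~mask;
}
```

With these helpers, an image of 9 sectors (4608 bytes) passes the check for a 512-byte request alignment but fails it for 4k, and a write reaching its last sector would be rounded up to offset 8192, beyond the end of the image. That is the situation the assertion in bdrv_co_write_req_prepare() catches when RESIZE permission was not requested.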