Message ID | 1459513321-3776-1-git-send-email-olaf@aepfle.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01.04.2016 14:22, Olaf Hering wrote: > Large discard requests lead to sign expansion errors in qemu. > Since there is no API to tell a guest about the limitations qmeu > has to split a large request itself. > > Signed-off-by: Olaf Hering <olaf@aepfle.de> > Cc: Stefan Hajnoczi <stefanha@redhat.com> > Cc: Kevin Wolf <kwolf@redhat.com> > --- > block/io.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) Hi! Let me first nag about some style problems, and then I'll come to the technical/design aspect. :-) > diff --git a/block/io.c b/block/io.c > index c4869b9..5b6ed58 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -2442,7 +2442,7 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) > rwco->ret = bdrv_co_discard(rwco->bs, rwco->sector_num, rwco->nb_sectors); > } > > -int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > +static int __bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, Two things: First, I think we do not like identifiers to start with underscores, especially not with two underscores or with underscore and a capital letter, because in C those combinations are generally reserved for compiler/language extensions or for the operating system (think of __attribute__(), __asm__(), or _Bool). Second, the coroutine_fn should stay. This is just an empty macro (I think) that signifies that a certain function is run inside of a coroutine. That fact will not change if you put a wrapper around it. > int nb_sectors) This should be aligned to the opening paranthesis of the line above. > { > BdrvTrackedRequest req; > @@ -2524,6 +2524,26 @@ out: > return ret; > } > > +int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > + int nb_sectors) > +{ > + int num, ret; > + int limit = BDRV_REQUEST_MAX_SECTORS; > + int remaining = nb_sectors; > + int64_t sector_offset = sector_num; > + > + do { > + num = remaining > limit ? limit : remaining; num = MIN(limit, remaining); would work just as fine and maybe look a bit better. > + ret = __bdrv_co_discard(bs, sector_offset, num); > + if (ret < 0) > + break; In qemu, every block is enclosed by {} braces, even if it contains only a single statement. This is something that the checkpatch script (scripts/checkpatch.pl in the qemu tree) can detect. It is advisible to run that scripts on patches before they are sent to the mailing list. > + remaining -= num; > + sector_offset += num; > + } while (remaining > 0); > + > + return ret; > +} > + > int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) > { > Coroutine *co; > So, now about the technical aspect: Guest requests are not issued to BlockDriverStates directly, they pass through a BlockBackend first. The check whether the request is too large is done there (in blk_check_request() called by blk_co_discard() and blk_aio_discard()). Thus, if a discard request exceeds BDRV_REQUEST_MAX_SECTORS, it will be denied at the BlockBackend level and not reach bdrv_co_discard() at all. At least that is how it's supposed to be. If it isn't, that's a bug. ;-) Finally, I'm not sure whether this is something that should be done in the block layer. I personally feel like it is the device models' responsibility to only submit requests that adhere to the limits established by the block layer. In any case, do you have a test case where a guest was able to submit a request that led to the overflow error you described in the commit message? Max
On Fri, Apr 01, Max Reitz wrote: > In any case, do you have a test case where a guest was able to submit a > request that led to the overflow error you described in the commit message? mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device. When I added discard support to libxl I worked with raw images, so I did not notice this. Not sure why it happens to work in kvm guests. I assume the frontend driver just works around the qemu bug by limiting its request size. Olaf
On 01.04.2016 19:49, Olaf Hering wrote: > On Fri, Apr 01, Max Reitz wrote: > >> In any case, do you have a test case where a guest was able to submit a >> request that led to the overflow error you described in the commit message? > > mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device. > When I added discard support to libxl I worked with raw images, so I did > not notice this. Not sure why it happens to work in kvm guests. I assume > the frontend driver just works around the qemu bug by limiting its > request size. Sorry for not having replied in so long. I know next to nothing about Xen, but I'm very much inclined to think the Xen block driver (hw/block/xen_disk.c) is at fault here. The blkif_request_discard structure it uses for accepting discard requests apparently has a uint64_t nr_sectors field. So I think using the Xen block device, it is possible for a guest to submit discard requests with more then INT_MAX sectors involved, and it's the Xen's block device emulation code job to split those requests into smaller ones. That said, I'm not sure this is ideal. It doesn't really look like other block drivers care so much about splitting requests either. So we're a bit in a pickle there. Anyway, for your issue at hand I think there is a simpler solution. bdrv_co_discard() can and already does split requests. So if we remove the calls to blk_check_request() in blk_discard() and bdrv_check_request() in bdrv_co_discard(), everything should already work just fine. Therefore, I think what could work for now is for blk_discard() and bdrv_co_discard() to modify the checks they are doing on the incoming request. In theory, all we need to do is skip the length check for both, i.e. accept requests even if they are longer than INT_MAX / BDRV_SECTOR_SIZE sectors. I'm not sure how we can do that nicely in practice, though. Max
On Fri, May 06, Max Reitz wrote: > On 01.04.2016 19:49, Olaf Hering wrote: > > On Fri, Apr 01, Max Reitz wrote: > > > >> In any case, do you have a test case where a guest was able to submit a > >> request that led to the overflow error you described in the commit message? > > > > mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device. > > When I added discard support to libxl I worked with raw images, so I did > > not notice this. Not sure why it happens to work in kvm guests. I assume > > the frontend driver just works around the qemu bug by limiting its > > request size. > > Sorry for not having replied in so long. > > I know next to nothing about Xen, but I'm very much inclined to think > the Xen block driver (hw/block/xen_disk.c) is at fault here. The > blkif_request_discard structure it uses for accepting discard requests > apparently has a uint64_t nr_sectors field. Thanks for the pointer. Looking at current master, BLKIF_OP_DISCARD is indeed broken. The values passed from the guest are u64 and get stashed into signed values. I will add a loop to repeatedly call blk_aio_pdiscard with small chunks of BDRV_REQUEST_MAX_SECTORS. We quickly checked other users of blk_aio_pdiscard and it appears they are not affected because they notify the guest abuilt the limits. Olaf
diff --git a/block/io.c b/block/io.c index c4869b9..5b6ed58 100644 --- a/block/io.c +++ b/block/io.c @@ -2442,7 +2442,7 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) rwco->ret = bdrv_co_discard(rwco->bs, rwco->sector_num, rwco->nb_sectors); } -int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, +static int __bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { BdrvTrackedRequest req; @@ -2524,6 +2524,26 @@ out: return ret; } +int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, + int nb_sectors) +{ + int num, ret; + int limit = BDRV_REQUEST_MAX_SECTORS; + int remaining = nb_sectors; + int64_t sector_offset = sector_num; + + do { + num = remaining > limit ? limit : remaining; + ret = __bdrv_co_discard(bs, sector_offset, num); + if (ret < 0) + break; + remaining -= num; + sector_offset += num; + } while (remaining > 0); + + return ret; +} + int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { Coroutine *co;
Large discard requests lead to sign expansion errors in qemu. Since there is no API to tell a guest about the limitations qmeu has to split a large request itself. Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)