Message ID | 20240712140716.517911-1-f.ebner@proxmox.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] block/reqlist: allow adding overlapping requests | expand |
On 12.07.24 17:07, Fiona Ebner wrote: > Allow overlapping request by removing the assert that made it > impossible. There are only two callers: > > 1. block_copy_task_create() > > It already asserts the very same condition before calling > reqlist_init_req(). > > 2. cbw_snapshot_read_lock() > > There is no need to have read requests be non-overlapping in > copy-before-write when used for snapshot-access. In fact, there was no > protection against two callers of cbw_snapshot_read_lock() calling > reqlist_init_req() with overlapping ranges and this could lead to an > assertion failure [1]. > > In particular, with the reproducer script below [0], two > cbw_co_snapshot_block_status() callers could race, with the second > calling reqlist_init_req() before the first one finishes and removes > its conflicting request. > > [0]: > >> #!/bin/bash -e >> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 >> ./qemu-img create /tmp/fleecing.raw -f raw 1G >> ( >> ./qemu-system-x86_64 --qmp stdio \ >> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ >> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ >> <<EOF >> {"execute": "qmp_capabilities"} >> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } >> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } >> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } >> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} >> EOF >> ) & >> sleep 5 >> while true; do >> ./qemu-nbd -d /dev/nbd0 >> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r >> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' >> done > > [1]: > >> #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 >> #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 >> #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 >> #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 >> #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 >> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 >> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 >> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 >> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 >> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 >> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 >> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 >> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 >> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 >> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 > > Cc: qemu-stable@nongnu.org > Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> > --- > > Changes in v2: > * different approach, allowing overlapping requests for > copy-before-write rather than waiting for them. block-copy already > asserts there are no conflicts before adding a request. > > block/copy-before-write.c | 3 ++- > block/reqlist.c | 2 -- > 2 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/block/copy-before-write.c b/block/copy-before-write.c > index 853e01a1eb..28f6a096cd 100644 > --- a/block/copy-before-write.c > +++ b/block/copy-before-write.c > @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState { > > /* > * @frozen_read_reqs: current read requests for fleecing user in bs->file > - * node. These areas must not be rewritten by guest. > + * node. These areas must not be rewritten by guest. There can be multiple > + * overlapping read requests. > */ > BlockReqList frozen_read_reqs; > > diff --git a/block/reqlist.c b/block/reqlist.c > index 08cb57cfa4..098e807378 100644 > --- a/block/reqlist.c > +++ b/block/reqlist.c > @@ -20,8 +20,6 @@ > void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, > int64_t bytes) > { > - assert(!reqlist_find_conflict(reqs, offset, bytes)); > - > *req = (BlockReq) { > .offset = offset, > .bytes = bytes, Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Thanks, applied to my block branch
12.07.2024 17:07, Fiona Ebner wrote: > Allow overlapping request by removing the assert that made it > impossible. There are only two callers: > > 1. block_copy_task_create() > > It already asserts the very same condition before calling > reqlist_init_req(). > > 2. cbw_snapshot_read_lock() > > There is no need to have read requests be non-overlapping in > copy-before-write when used for snapshot-access. In fact, there was no > protection against two callers of cbw_snapshot_read_lock() calling > reqlist_init_req() with overlapping ranges and this could lead to an > assertion failure [1]. > > In particular, with the reproducer script below [0], two > cbw_co_snapshot_block_status() callers could race, with the second > calling reqlist_init_req() before the first one finishes and removes > its conflicting request. > > [0]: > >> #!/bin/bash -e >> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 >> ./qemu-img create /tmp/fleecing.raw -f raw 1G >> ( >> ./qemu-system-x86_64 --qmp stdio \ >> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ >> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ >> <<EOF >> {"execute": "qmp_capabilities"} >> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } >> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } >> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } >> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} >> EOF >> ) & >> sleep 5 >> while true; do >> ./qemu-nbd -d /dev/nbd0 >> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r >> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' >> done > > [1]: > >> #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 >> #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 >> #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 >> #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 >> #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 >> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 >> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 >> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 >> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 >> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 >> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 >> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 >> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 >> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 >> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 > > Cc: qemu-stable@nongnu.org > Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Hi! Has this been forgotten or is it not needed for 9.1? Thanks, /mjt > > Changes in v2: > * different approach, allowing overlapping requests for > copy-before-write rather than waiting for them. block-copy already > asserts there are no conflicts before adding a request. > > block/copy-before-write.c | 3 ++- > block/reqlist.c | 2 -- > 2 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/block/copy-before-write.c b/block/copy-before-write.c > index 853e01a1eb..28f6a096cd 100644 > --- a/block/copy-before-write.c > +++ b/block/copy-before-write.c > @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState { > > /* > * @frozen_read_reqs: current read requests for fleecing user in bs->file > - * node. These areas must not be rewritten by guest. > + * node. These areas must not be rewritten by guest. There can be multiple > + * overlapping read requests. > */ > BlockReqList frozen_read_reqs; > > diff --git a/block/reqlist.c b/block/reqlist.c > index 08cb57cfa4..098e807378 100644 > --- a/block/reqlist.c > +++ b/block/reqlist.c > @@ -20,8 +20,6 @@ > void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, > int64_t bytes) > { > - assert(!reqlist_find_conflict(reqs, offset, bytes)); > - > *req = (BlockReq) { > .offset = offset, > .bytes = bytes,
On 11.08.24 20:55, Michael Tokarev wrote: > 12.07.2024 17:07, Fiona Ebner wrote: >> Allow overlapping request by removing the assert that made it >> impossible. There are only two callers: >> >> 1. block_copy_task_create() >> >> It already asserts the very same condition before calling >> reqlist_init_req(). >> >> 2. cbw_snapshot_read_lock() >> >> There is no need to have read requests be non-overlapping in >> copy-before-write when used for snapshot-access. In fact, there was no >> protection against two callers of cbw_snapshot_read_lock() calling >> reqlist_init_req() with overlapping ranges and this could lead to an >> assertion failure [1]. >> >> In particular, with the reproducer script below [0], two >> cbw_co_snapshot_block_status() callers could race, with the second >> calling reqlist_init_req() before the first one finishes and removes >> its conflicting request. >> >> [0]: >> >>> #!/bin/bash -e >>> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 >>> ./qemu-img create /tmp/fleecing.raw -f raw 1G >>> ( >>> ./qemu-system-x86_64 --qmp stdio \ >>> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ >>> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ >>> <<EOF >>> {"execute": "qmp_capabilities"} >>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } >>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } >>> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } >>> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} >>> EOF >>> ) & >>> sleep 5 >>> while true; do >>> ./qemu-nbd -d /dev/nbd0 >>> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r >>> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' >>> done >> >> [1]: >> >>> #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 >>> #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 >>> #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 >>> #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 >>> #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 >>> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 >>> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 >>> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 >>> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 >>> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 >>> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 >>> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 >>> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 >>> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 >>> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 >> >> Cc: qemu-stable@nongnu.org >> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> > > Hi! > > Has this been forgotten or is it not needed for 9.1? > My apologies, this is forgotten. I think rc4 is too late, I'll send Pull request as soon as 9.2 window open.
diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 853e01a1eb..28f6a096cd 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -66,7 +66,8 @@ typedef struct BDRVCopyBeforeWriteState { /* * @frozen_read_reqs: current read requests for fleecing user in bs->file - * node. These areas must not be rewritten by guest. + * node. These areas must not be rewritten by guest. There can be multiple + * overlapping read requests. */ BlockReqList frozen_read_reqs; diff --git a/block/reqlist.c b/block/reqlist.c index 08cb57cfa4..098e807378 100644 --- a/block/reqlist.c +++ b/block/reqlist.c @@ -20,8 +20,6 @@ void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, int64_t bytes) { - assert(!reqlist_find_conflict(reqs, offset, bytes)); - *req = (BlockReq) { .offset = offset, .bytes = bytes,