Message ID | 20230203152202.49054-1-kwolf@redhat.com (mailing list archive)
---|---
Series | block: Lock the graph, part 2 (BlockDriver callbacks)
On 03/02/2023 at 16:21, Kevin Wolf wrote:
> After introducing the graph lock in a previous series, this series
> actually starts making widespread use of it.
>
> Most of the BlockDriver callbacks access the children list in some way,
> so you need to hold the graph lock to call them. The patches in this
> series add the corresponding GRAPH_RDLOCK annotations and take the lock
> in places where it doesn't happen yet - all of the bdrv_*() co_wrappers
> are already covered, but in particular BlockBackend coroutine_fns still
> need it.
>
> There is no particularly good reason why exactly these patches and not
> others are included in the series. I couldn't find a self-contained part
> that could reasonably be addressed in a single series. So these just
> happen to be patches that are somewhat related (centered around the
> BlockDriver callback theme), are ready and their number looks
> manageable. You will still see some FIXMEs at the end of the series
> that will only be addressed in future patches.
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

>
> Emanuele Giuseppe Esposito (5):
>   block/qed: add missing graph rdlock in qed_need_check_timer_entry
>   block: Mark bdrv_co_flush() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_pdiscard() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_copy_range() GRAPH_RDLOCK
>   block: Mark bdrv_co_is_inserted() and callers GRAPH_RDLOCK
>
> Kevin Wolf (18):
>   block: Make bdrv_can_set_read_only() static
>   mirror: Fix access of uninitialised fields during start
>   block: Mark bdrv_co_truncate() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_block_status() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_ioctl() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_pwrite_zeroes() and callers GRAPH_RDLOCK
>   block: Mark read/write in block/io.c GRAPH_RDLOCK
>   block: Mark public read/write functions GRAPH_RDLOCK
>   block: Mark bdrv_co_pwrite_sync() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_do_pwrite_zeroes() GRAPH_RDLOCK
>   block: Mark preadv_snapshot/snapshot_block_status GRAPH_RDLOCK
>   block: Mark bdrv_co_create() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_eject/lock_medium() and callers GRAPH_RDLOCK
>   block: Mark bdrv_(un)register_buf() GRAPH_RDLOCK
>   block: Mark bdrv_co_delete_file() and callers GRAPH_RDLOCK
>   block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK
>   block: Mark bdrv_co_refresh_total_sectors() and callers GRAPH_RDLOCK
>
>  block/coroutines.h                 |   2 +-
>  block/qcow2.h                      |  27 +++--
>  block/qed.h                        |  45 ++++----
>  include/block/block-copy.h         |   6 +-
>  include/block/block-global-state.h |  14 ++-
>  include/block/block-io.h           | 110 +++++++++---------
>  include/block/block_int-common.h   | 173 ++++++++++++++++-------------
>  include/block/block_int-io.h       |  53 ++++-----
>  include/block/dirty-bitmap.h       |  12 +-
>  include/sysemu/block-backend-io.h  |   7 +-
>  block.c                            |  12 +-
>  block/backup.c                     |   3 +
>  block/blkdebug.c                   |  19 ++--
>  block/blklogwrites.c               |  35 +++---
>  block/blkreplay.c                  |  24 ++--
>  block/blkverify.c                  |   5 +-
>  block/block-backend.c              |  39 +++++--
>  block/block-copy.c                 |  32 +++---
>  block/bochs.c                      |   2 +-
>  block/commit.c                     |   5 +-
>  block/copy-before-write.c          |  33 +++---
>  block/copy-on-read.c               |  44 ++++----
>  block/create.c                     |   9 +-
>  block/crypto.c                     |  16 +--
>  block/dirty-bitmap.c               |   2 +
>  block/file-posix.c                 |  27 ++---
>  block/file-win32.c                 |   7 +-
>  block/filter-compress.c            |  36 +++---
>  block/io.c                         | 108 +++++++++++-------
>  block/iscsi.c                      |  28 ++---
>  block/mirror.c                     |  59 ++++++----
>  block/parallels.c                  |  33 +++---
>  block/preallocate.c                |  38 ++++---
>  block/qcow.c                       |  46 ++++----
>  block/qcow2-cluster.c              |  17 ++-
>  block/qcow2.c                      | 136 ++++++++++++-----------
>  block/qed-check.c                  |   3 +-
>  block/qed-table.c                  |  10 +-
>  block/qed.c                        | 101 +++++++++--------
>  block/quorum.c                     |  62 ++++++-----
>  block/raw-format.c                 |  76 ++++++-------
>  block/replication.c                |  18 ++-
>  block/snapshot-access.c            |   8 +-
>  block/stream.c                     |  40 ++++---
>  block/throttle.c                   |  36 +++---
>  block/vdi.c                        |  11 +-
>  block/vhdx.c                       |  18 +--
>  block/vmdk.c                       | 132 ++++++++------------
>  block/vpc.c                        |  11 +-
>  qemu-img.c                         |   8 +-
>  tests/unit/test-bdrv-drain.c       |  20 ++--
>  tests/unit/test-block-iothread.c   |   3 +-
>  52 files changed, 983 insertions(+), 838 deletions(-)
On Fri, Feb 03, 2023 at 04:21:39PM +0100, Kevin Wolf wrote:
> After introducing the graph lock in a previous series, this series
> actually starts making widespread use of it.
>
> Most of the BlockDriver callbacks access the children list in some way,
> so you need to hold the graph lock to call them. The patches in this
> series add the corresponding GRAPH_RDLOCK annotations and take the lock
> in places where it doesn't happen yet - all of the bdrv_*() co_wrappers
> are already covered, but in particular BlockBackend coroutine_fns still
> need it.
>
> There is no particularly good reason why exactly these patches and not
> others are included in the series. I couldn't find a self-contained part
> that could reasonably be addressed in a single series. So these just
> happen to be patches that are somewhat related (centered around the
> BlockDriver callback theme), are ready and their number looks
> manageable. You will still see some FIXMEs at the end of the series
> that will only be addressed in future patches.

Two things occurred to me:

1. The graph lock is becoming the new AioContext lock in the sense that
code using the block layer APIs needs to carefully acquire and release
the lock around operations. Why is it necessary to explicitly take the
rdlock in mirror_iteration()?

+            WITH_GRAPH_RDLOCK_GUARD() {
                 ret = bdrv_block_status_above(source, NULL, offset,

I guess because bdrv_*() APIs are unlocked? The equivalent blk_*() API
would have taken the graph lock internally. Do we want to continue using
bdrv APIs even though it spreads graph locking concerns into block jobs?

2. This series touches block drivers like qcow2. Luckily block drivers
just need to annotate their BlockDriver functions to indicate they run
under the rdlock, a lock that the block driver itself doesn't mess with.
It makes me wonder whether there is any point in annotating the
BlockDriver function pointers? It would be simpler if the block drivers
were unaware of the graph lock.

Stefan
On 21.02.2023 at 23:13, Stefan Hajnoczi wrote:
> On Fri, Feb 03, 2023 at 04:21:39PM +0100, Kevin Wolf wrote:
> > [...]
>
> Two things occurred to me:
>
> 1. The graph lock is becoming the new AioContext lock in the sense that
> code using the block layer APIs needs to carefully acquire and release
> the lock around operations. Why is it necessary to explicitly take the
> rdlock in mirror_iteration()?
>
> +            WITH_GRAPH_RDLOCK_GUARD() {
>                  ret = bdrv_block_status_above(source, NULL, offset,
>
> I guess because bdrv_*() APIs are unlocked? The equivalent blk_*() API
> would have taken the graph lock internally. Do we want to continue using
> bdrv APIs even though it spreads graph locking concerns into block jobs?

The thing that makes it a bit ugly is that block jobs mix bdrv_*() and
blk_*() calls. If they only used blk_*() we wouldn't have to take care
of locking (but that means that the job code itself must not have a
problem with a changing graph!). If they only used bdrv_*(), the
function could just take a lock at the start and only temporarily
release it around pause points. Both ways would look nicer than what we
have now.

> 2. This series touches block drivers like qcow2. Luckily block drivers
> just need to annotate their BlockDriver functions to indicate they run
> under the rdlock, a lock that the block driver itself doesn't mess with.
> It makes me wonder whether there is any point in annotating the
> BlockDriver function pointers? It would be simpler if the block drivers
> were unaware of the graph lock.

If you're unaware of the graph lock, how do you tell if you can call
certain block layer functions that require the lock?

Especially since different BlockDriver callbacks have different rules
(some have a reader lock, some have a writer lock, and some may stay
unlocked even in the future), it would seem really hard to keep track of
this when you don't make it explicit.

Kevin
On Thu, Feb 23, 2023 at 12:48:18PM +0100, Kevin Wolf wrote:
> On 21.02.2023 at 23:13, Stefan Hajnoczi wrote:
> > [...]
> > I guess because bdrv_*() APIs are unlocked? The equivalent blk_*() API
> > would have taken the graph lock internally. Do we want to continue using
> > bdrv APIs even though it spreads graph locking concerns into block jobs?
>
> The thing that makes it a bit ugly is that block jobs mix bdrv_*() and
> blk_*() calls. If they only used blk_*() we wouldn't have to take care
> of locking (but that means that the job code itself must not have a
> problem with a changing graph!). If they only used bdrv_*(), the
> function could just take a lock at the start and only temporarily
> release it around pause points. Both ways would look nicer than what we
> have now.
>
> > 2. This series touches block drivers like qcow2. Luckily block drivers
> > just need to annotate their BlockDriver functions to indicate they run
> > under the rdlock, a lock that the block driver itself doesn't mess with.
> > It makes me wonder whether there is any point in annotating the
> > BlockDriver function pointers? It would be simpler if the block drivers
> > were unaware of the graph lock.
>
> If you're unaware of the graph lock, how do you tell if you can call
> certain block layer functions that require the lock?
>
> Especially since different BlockDriver callbacks have different rules
> (some have a reader lock, some have a writer lock, and some may stay
> unlocked even in the future), it would seem really hard to keep track of
> this when you don't make it explicit.

Hi Kevin,
Can you give an example of where it is necessary (not accidental
complexity) to expose an unlocked API and put the responsibility of
locking on the caller?

Thanks,
Stefan
On Thu, Feb 23, 2023 at 12:48:18PM +0100, Kevin Wolf wrote:
> On 21.02.2023 at 23:13, Stefan Hajnoczi wrote:
> > [...]
>
> If you're unaware of the graph lock, how do you tell if you can call
> certain block layer functions that require the lock?
>
> Especially since different BlockDriver callbacks have different rules
> (some have a reader lock, some have a writer lock, and some may stay
> unlocked even in the future), it would seem really hard to keep track of
> this when you don't make it explicit.

I discussed this offline with Kevin some more today. While there might
be opportunities to hide the lock (thereby making it easier), it's not
easy to do because we don't want to give up TSA static checking. Let's
put the graph lock in place first and worry about that later.

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>