mbox series

[00/20] Protect the block layer with a rwlock: part 1

Message ID 20221116134850.3051419-1-eesposit@redhat.com (mailing list archive)
Headers show
Series Protect the block layer with a rwlock: part 1 | expand

Message

Emanuele Giuseppe Esposito Nov. 16, 2022, 1:48 p.m. UTC
This serie is the first of four series that aim to introduce and use a new
graph rwlock in the QEMU block layer.
The aim is to replace the current AioContext lock with much fine-grained locks,
aimed to protect only specific data.
Currently the AioContext lock is used pretty much everywhere, and it's not
even clear what it is protecting exactly.

The aim of the rwlock is to cover graph modifications: more precisely,
when a BlockDriverState parent or child list is modified or read, since it can
be concurrently accessed by the main loop and iothreads.

The main assumption is that the main loop is the only one allowed to perform
graph modifications, and so far this has always been held by the current code.

The rwlock is inspired from cpus-common.c implementation, and aims to
reduce cacheline bouncing by having per-aiocontext counter of readers.
All details and implementation of the lock are in patch 1.

We distinguish between writer (main loop, under BQL) that modifies the
graph, and readers (all other coroutines running in various AioContext),
that go through the graph edges, reading ->parents and->children.
The writer (main loop)  has an "exclusive" access, so it first waits for
current read to finish, and then prevents incoming ones from
entering while it has the exclusive access.
The readers (coroutines in multiple AioContext) are free to
access the graph as long the writer is not modifying the graph.
In case it is, they go in a CoQueue and sleep until the writer
is done.

In this first serie, my aim is to introduce the lock (patches 1-3,6), cover the
main graph writer (patch 4), define assertions (patch 5) and start using the
read lock in the generated_co_wrapper functions (7-20).
Such functions recursively traverse the BlockDriverState graph, so they must
take the graph rdlock.

We distinguish two cases related to the generated_co_wrapper (often shortened
to g_c_w):
- qemu_in_coroutine(), which means the function is already running in a
  coroutine. This means we don't take the lock, because the coroutine must
  have taken it when it started
- !qemu_in_coroutine(), which means we need to create a new coroutine that
  performs the operation requested. In this case we take the rdlock as soon as
  the coroutine starts, and release only before finishing.

In this and following series, we try to follow the following locking pattern:
- bdrv_co_* functions that call BlockDriver callbacks always expect the lock
  to be taken, therefore they assert.
- blk_co_* functions usually call blk_wait_while_drained(), therefore they must
  take the lock internally. Therefore we introduce generated_co_wrapper_blk,
  which does not take the rdlock when starting the coroutine.

The long term goal of this series is to eventually replace the AioContext lock,
so that we can get rid of it once and for all.

This serie is based on v4 of "Still more coroutine and various fixes in block layer".

Based-on: <20221116122241.2856527-1-eesposit@redhat.com>

Thank you,
Emanuele

Emanuele Giuseppe Esposito (19):
  graph-lock: introduce BdrvGraphRWlock structure
  async: register/unregister aiocontext in graph lock list
  block.c: wrlock in bdrv_replace_child_noperm
  block: remove unnecessary assert_bdrv_graph_writable()
  block: assert that graph read and writes are performed correctly
  graph-lock: implement WITH_GRAPH_RDLOCK_GUARD and GRAPH_RDLOCK_GUARD
    macros
  block-coroutine-wrapper.py: take the graph rdlock in bdrv_* functions
  block-backend: introduce new generated_co_wrapper_blk annotation
  block-gen: assert that {bdrv/blk}_co_truncate is always called with
    graph rdlock taken
  block-gen: assert that bdrv_co_{check/invalidate_cache} are always
    called with graph rdlock taken
  block-gen: assert that bdrv_co_pwrite is always called with graph
    rdlock taken
  block-gen: assert that bdrv_co_pwrite_{zeros/sync} is always called
    with graph rdlock taken
  block-gen: assert that bdrv_co_pread is always called with graph
    rdlock taken
  block-gen: assert that {bdrv/blk}_co_flush is always called with graph
    rdlock taken
  block-gen: assert that bdrv_co_{read/write}v_vmstate are always called
    with graph rdlock taken
  block-gen: assert that bdrv_co_pdiscard is always called with graph
    rdlock taken
  block-gen: assert that bdrv_co_common_block_status_above is always
    called with graph rdlock taken
  block-gen: assert that bdrv_co_ioctl is always called with graph
    rdlock taken
  block-gen: assert that nbd_co_do_establish_connection is always called
    with graph rdlock taken

Paolo Bonzini (1):
  block: introduce a lock to protect graph operations

 block.c                                |  15 +-
 block/backup.c                         |   3 +
 block/block-backend.c                  |  10 +-
 block/block-copy.c                     |  10 +-
 block/graph-lock.c                     | 255 +++++++++++++++++++++++++
 block/io.c                             |  15 ++
 block/meson.build                      |   1 +
 block/mirror.c                         |  20 +-
 block/nbd.c                            |   1 +
 block/stream.c                         |  32 ++--
 include/block/aio.h                    |   9 +
 include/block/block-common.h           |   1 +
 include/block/block_int-common.h       |  36 +++-
 include/block/block_int-global-state.h |  17 --
 include/block/block_int-io.h           |   2 +
 include/block/block_int.h              |   1 +
 include/block/graph-lock.h             | 180 +++++++++++++++++
 include/sysemu/block-backend-io.h      |  69 +++----
 qemu-img.c                             |   4 +-
 scripts/block-coroutine-wrapper.py     |  13 +-
 tests/unit/test-bdrv-drain.c           |   2 +
 util/async.c                           |   4 +
 util/meson.build                       |   1 +
 23 files changed, 615 insertions(+), 86 deletions(-)
 create mode 100644 block/graph-lock.c
 create mode 100644 include/block/graph-lock.h

Comments

Emanuele Giuseppe Esposito Nov. 21, 2022, 3:02 p.m. UTC | #1
Ok, as I expected simple changes in a previous based-on serie provoke a
cascade of changes that inevitably affect these patches too.

While I strongly suggest to have an initial look at these patches
because it gives an idea on what am I trying to accomplish, I would not
go looking at nitpicks and trivial errors that came up from the based-on
series (ie "just as in the previous serie, fix this").

The order of the series is:
1. Still more coroutine and various fixes in block layer
2. Protect the block layer with a rwlock: part 1
3. Protect the block layer with a rwlock: part 2
4. Protect the block layer with a rwlock: part 3

Thank you,
Emanuele

Am 16/11/2022 um 14:48 schrieb Emanuele Giuseppe Esposito:
> This serie is the first of four series that aim to introduce and use a new
> graph rwlock in the QEMU block layer.
> The aim is to replace the current AioContext lock with much fine-grained locks,
> aimed to protect only specific data.
> Currently the AioContext lock is used pretty much everywhere, and it's not
> even clear what it is protecting exactly.
> 
> The aim of the rwlock is to cover graph modifications: more precisely,
> when a BlockDriverState parent or child list is modified or read, since it can
> be concurrently accessed by the main loop and iothreads.
> 
> The main assumption is that the main loop is the only one allowed to perform
> graph modifications, and so far this has always been held by the current code.
> 
> The rwlock is inspired from cpus-common.c implementation, and aims to
> reduce cacheline bouncing by having per-aiocontext counter of readers.
> All details and implementation of the lock are in patch 1.
> 
> We distinguish between writer (main loop, under BQL) that modifies the
> graph, and readers (all other coroutines running in various AioContext),
> that go through the graph edges, reading ->parents and->children.
> The writer (main loop)  has an "exclusive" access, so it first waits for
> current read to finish, and then prevents incoming ones from
> entering while it has the exclusive access.
> The readers (coroutines in multiple AioContext) are free to
> access the graph as long the writer is not modifying the graph.
> In case it is, they go in a CoQueue and sleep until the writer
> is done.
> 
> In this first serie, my aim is to introduce the lock (patches 1-3,6), cover the
> main graph writer (patch 4), define assertions (patch 5) and start using the
> read lock in the generated_co_wrapper functions (7-20).
> Such functions recursively traverse the BlockDriverState graph, so they must
> take the graph rdlock.
> 
> We distinguish two cases related to the generated_co_wrapper (often shortened
> to g_c_w):
> - qemu_in_coroutine(), which means the function is already running in a
>   coroutine. This means we don't take the lock, because the coroutine must
>   have taken it when it started
> - !qemu_in_coroutine(), which means we need to create a new coroutine that
>   performs the operation requested. In this case we take the rdlock as soon as
>   the coroutine starts, and release only before finishing.
> 
> In this and following series, we try to follow the following locking pattern:
> - bdrv_co_* functions that call BlockDriver callbacks always expect the lock
>   to be taken, therefore they assert.
> - blk_co_* functions usually call blk_wait_while_drained(), therefore they must
>   take the lock internally. Therefore we introduce generated_co_wrapper_blk,
>   which does not take the rdlock when starting the coroutine.
> 
> The long term goal of this series is to eventually replace the AioContext lock,
> so that we can get rid of it once and for all.
> 
> This serie is based on v4 of "Still more coroutine and various fixes in block layer".
> 
> Based-on: <20221116122241.2856527-1-eesposit@redhat.com>
> 
> Thank you,
> Emanuele
> 
> Emanuele Giuseppe Esposito (19):
>   graph-lock: introduce BdrvGraphRWlock structure
>   async: register/unregister aiocontext in graph lock list
>   block.c: wrlock in bdrv_replace_child_noperm
>   block: remove unnecessary assert_bdrv_graph_writable()
>   block: assert that graph read and writes are performed correctly
>   graph-lock: implement WITH_GRAPH_RDLOCK_GUARD and GRAPH_RDLOCK_GUARD
>     macros
>   block-coroutine-wrapper.py: take the graph rdlock in bdrv_* functions
>   block-backend: introduce new generated_co_wrapper_blk annotation
>   block-gen: assert that {bdrv/blk}_co_truncate is always called with
>     graph rdlock taken
>   block-gen: assert that bdrv_co_{check/invalidate_cache} are always
>     called with graph rdlock taken
>   block-gen: assert that bdrv_co_pwrite is always called with graph
>     rdlock taken
>   block-gen: assert that bdrv_co_pwrite_{zeros/sync} is always called
>     with graph rdlock taken
>   block-gen: assert that bdrv_co_pread is always called with graph
>     rdlock taken
>   block-gen: assert that {bdrv/blk}_co_flush is always called with graph
>     rdlock taken
>   block-gen: assert that bdrv_co_{read/write}v_vmstate are always called
>     with graph rdlock taken
>   block-gen: assert that bdrv_co_pdiscard is always called with graph
>     rdlock taken
>   block-gen: assert that bdrv_co_common_block_status_above is always
>     called with graph rdlock taken
>   block-gen: assert that bdrv_co_ioctl is always called with graph
>     rdlock taken
>   block-gen: assert that nbd_co_do_establish_connection is always called
>     with graph rdlock taken
> 
> Paolo Bonzini (1):
>   block: introduce a lock to protect graph operations
> 
>  block.c                                |  15 +-
>  block/backup.c                         |   3 +
>  block/block-backend.c                  |  10 +-
>  block/block-copy.c                     |  10 +-
>  block/graph-lock.c                     | 255 +++++++++++++++++++++++++
>  block/io.c                             |  15 ++
>  block/meson.build                      |   1 +
>  block/mirror.c                         |  20 +-
>  block/nbd.c                            |   1 +
>  block/stream.c                         |  32 ++--
>  include/block/aio.h                    |   9 +
>  include/block/block-common.h           |   1 +
>  include/block/block_int-common.h       |  36 +++-
>  include/block/block_int-global-state.h |  17 --
>  include/block/block_int-io.h           |   2 +
>  include/block/block_int.h              |   1 +
>  include/block/graph-lock.h             | 180 +++++++++++++++++
>  include/sysemu/block-backend-io.h      |  69 +++----
>  qemu-img.c                             |   4 +-
>  scripts/block-coroutine-wrapper.py     |  13 +-
>  tests/unit/test-bdrv-drain.c           |   2 +
>  util/async.c                           |   4 +
>  util/meson.build                       |   1 +
>  23 files changed, 615 insertions(+), 86 deletions(-)
>  create mode 100644 block/graph-lock.c
>  create mode 100644 include/block/graph-lock.h
>