[0/3] block: Make bdrv_refresh_limits() non-recursive

Message ID 20220215135727.28521-1-hreitz@redhat.com (mailing list archive)

Hanna Czenczek Feb. 15, 2022, 1:57 p.m. UTC
Hi,

Most bdrv_refresh_limits() callers do not drain the subtree of the node
whose limits are refreshed, so concurrent I/O requests to child nodes
can occur (if the node is in an I/O thread).  bdrv_refresh_limits() is
recursive, so such requests can happen to a node whose limits are being
refreshed.

bdrv_refresh_limits() is not atomic, and so the I/O requests can
encounter invalid limits, like a 0 request_alignment.  This will crash
qemu (e.g. because of a division by 0, or a failed assertion).
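
As a rough illustration of that window (a toy model, not QEMU code): the
sketch below assumes the refresh first clears the limits and only later
fills in the new values.  ToyLimits and the toy_* functions are
hypothetical names; the reader returns -1 where real code would divide
by zero or fail an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a node's limits. */
typedef struct {
    uint32_t request_alignment;
} ToyLimits;

/* Step 1 of a non-atomic refresh (assumed): wipe the old limits. */
void toy_refresh_begin(ToyLimits *bl)
{
    assert(bl != NULL);
    memset(bl, 0, sizeof(*bl));
}

/* Step 2: fill in the new value. */
void toy_refresh_finish(ToyLimits *bl, uint32_t align)
{
    assert(bl != NULL);
    bl->request_alignment = align;
}

/*
 * What an I/O path might do with the limits: round an offset down to
 * the request alignment.  Returns -1 if the alignment is 0, i.e. where
 * a real division would crash.
 */
int64_t toy_round_down(const ToyLimits *bl, int64_t offset)
{
    if (bl->request_alignment == 0) {
        return -1;
    }
    return offset / bl->request_alignment * bl->request_alignment;
}
```

A request that reads the limits between toy_refresh_begin() and
toy_refresh_finish() takes the -1 path; before or after, it sees a
usable alignment.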

On inspection, bdrv_refresh_limits() doesn’t look like it really needs
to be recursive; it simply always has been.  Dropping the recursion fixes
those crashes, because all callers of bdrv_refresh_limits() make sure
one way or another that concurrent requests to the node whose limits are
to be refreshed are at least paused (by draining, and/or by acquiring
the AioContext).
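
The difference between the two shapes can be sketched with a toy graph
(hypothetical ToyNode type and toy_* functions, with a single child per
node for brevity; this is not the code in block/io.c).  The recursive
variant writes to every node below the one passed in, the flat variant
only to the node the caller has paused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one optional child and a counter of limit rewrites. */
typedef struct ToyNode {
    struct ToyNode *child;
    unsigned refresh_count;
} ToyNode;

/* Recursive shape: children are rewritten too, paused or not. */
void toy_refresh_recursive(ToyNode *n)
{
    assert(n != NULL);
    if (n->child) {
        toy_refresh_recursive(n->child);
    }
    n->refresh_count++;
}

/* Non-recursive shape: only the node itself is rewritten. */
void toy_refresh_flat(ToyNode *n)
{
    assert(n != NULL);
    n->refresh_count++;
}
```

With a two-node chain, toy_refresh_recursive() on the top node bumps
both counters, while toy_refresh_flat() leaves the child untouched.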

I see two other ways to fix it:
(A) Have all bdrv_refresh_limits() callers drain the entire subtree,
(B) Protect BDS.bl with RCU, which would make concurrent I/O just fine.

(A) is kind of ugly, and after starting down that path two times, both
times I decided I didn’t want to follow through with it.  It was always
an AioContext-juggling mess.  (E.g. bdrv_set_backing_hd() would need to
drain the subtree; but that means having to acquire the `backing_hd`
context, too, because `bs` might be moved into that context, and so when
`backing_hd` is attached to `bs`, `backing_hd` would be drained in the
new context.  But we can’t acquire a context twice, so we can only
acquire `backing_hd`’s context if the caller hasn’t done so already.
But the worst is that we can’t actually acquire that context: If `bs` is
moved into `backing_hd`’s context, then `bdrv_set_aio_context_ignore()`
requires us not to hold that context.  It’s just kind of a mess.)

I tried (B), and it worked, and I liked it very much; but it requires
quite a bit of refactoring (every BDS.bl reader must then use
qatomic_rcu_read() and take the RCU read lock), so it feels really
difficult to justify when the fix this series proposes just removes four
lines of code.
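
For reference, the general idea behind (B) can be sketched with plain
C11 atomics standing in for QEMU's RCU primitives.  Everything here is
an assumption made for illustration (SnapLimits, SnapNode, and the
snap_* functions are hypothetical, and a real RCU version would free
the old snapshot only after a grace period): the writer fills in a
complete new limits object and publishes it with one pointer swap, so a
reader sees either the old or the new limits, never a half-written set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical immutable snapshot of a node's limits. */
typedef struct {
    uint32_t request_alignment;
} SnapLimits;

/* Hypothetical node holding an atomically swappable snapshot pointer. */
typedef struct {
    _Atomic(SnapLimits *) bl;
} SnapNode;

/* Reader: load the current snapshot pointer once, then use it. */
uint32_t snap_read_alignment(SnapNode *n)
{
    SnapLimits *bl = atomic_load_explicit(&n->bl, memory_order_acquire);
    return bl->request_alignment;
}

/*
 * Writer: publish a fully initialized snapshot.  Returns the previous
 * one; reclaiming it safely is out of scope for this sketch.
 */
SnapLimits *snap_publish(SnapNode *n, SnapLimits *fresh)
{
    assert(fresh != NULL);
    return atomic_exchange_explicit(&n->bl, fresh, memory_order_acq_rel);
}
```

Every reader has to go through the load in snap_read_alignment(), which
is the kind of per-reader change that makes this approach a larger
refactoring.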


Hanna Reitz (3):
  block: Make bdrv_refresh_limits() non-recursive
  iotests: Allow using QMP with the QSD
  iotests/graph-changes-while-io: New test

 block/io.c                                    |  4 -
 tests/qemu-iotests/iotests.py                 | 29 +++++-
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++++++++++++++++++
 .../tests/graph-changes-while-io.out          |  5 +
 4 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out