diff mbox series

[1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine

Message ID 20220208153655.1251658-2-eesposit@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: bug fixes in preparation of AioContext removal | expand

Commit Message

Emanuele Giuseppe Esposito Feb. 8, 2022, 3:36 p.m. UTC
Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
is not a good idea: the callback might be called when running
a drain in a coroutine, and bdrv_drained_begin_poll() does not
handle that case, resulting in assertion failure.

Instead, bdrv_do_drained_begin with no recursion and poll
will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
but will firstly check if we are already in a coroutine, and exit
from that via bdrv_co_yield_to_drain().

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                  | 2 +-
 block/io.c               | 7 ++++++-
 include/block/block-io.h | 8 +++++---
 3 files changed, 12 insertions(+), 5 deletions(-)

Comments

Stefan Hajnoczi Feb. 10, 2022, 2:14 p.m. UTC | #1
On Tue, Feb 08, 2022 at 10:36:50AM -0500, Emanuele Giuseppe Esposito wrote:
> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> is not a good idea: the callback might be called when running
> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> handle that case, resulting in assertion failure.
> 
> Instead, bdrv_do_drained_begin with no recursion and poll
> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
> but will firstly check if we are already in a coroutine, and exit
> from that via bdrv_co_yield_to_drain().
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c                  | 2 +-
>  block/io.c               | 7 ++++++-
>  include/block/block-io.h | 8 +++++---
>  3 files changed, 12 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kevin Wolf Feb. 11, 2022, 11:54 a.m. UTC | #2
Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> is not a good idea: the callback might be called when running
> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> handle that case, resulting in assertion failure.

I remembered that we talked about this only recently on IRC, but it
didn't make any sense to me again when I read this commit message. So I
think we need --verbose.

The .drained_begin callback was always meant to run outside of coroutine
context, so the unexpected part isn't that it calls a function that
can't run in coroutine context, but that it is already called itself in
coroutine context.

The problematic path is bdrv_replace_child_noperm() which then calls
bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
context is dangerous, it can cause deadlocks because the caller of the
coroutine can't make progress. So I believe this call is already wrong
in coroutine context.

Now I don't know the call path up to bdrv_replace_child_noperm(), but as
far as I remember, that was another function that was originally never
run in coroutine context. Maybe we have good reason to change this, I
can't point to anything that would be inherently wrong with it, but I
would still be curious in which context it does run in a coroutine now.

Anyway, whatever the specific place is, I believe we must drop out of
coroutine context _before_ calling bdrv_parent_drained_begin_single(),
not only in callbacks called by it.

> Instead, bdrv_do_drained_begin with no recursion and poll
> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
> but will firstly check if we are already in a coroutine, and exit
> from that via bdrv_co_yield_to_drain().
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Kevin
Emanuele Giuseppe Esposito Feb. 14, 2022, 10:27 a.m. UTC | #3
On 11/02/2022 12:54, Kevin Wolf wrote:
> Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
>> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
>> is not a good idea: the callback might be called when running
>> a drain in a coroutine, and bdrv_drained_begin_poll() does not
>> handle that case, resulting in assertion failure.
> 
> I remembered that we talked about this only recently on IRC, but it
> didn't make any sense to me again when I read this commit message. So I
> think we need --verbose.
> 
> The .drained_begin callback was always meant to run outside of coroutine
> context, so the unexpected part isn't that it calls a function that
> can't run in coroutine context, but that it is already called itself in
> coroutine context.
> 
> The problematic path is bdrv_replace_child_noperm() which then calls
> bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
> context is dangerous, it can cause deadlocks because the caller of the
> coroutine can't make progress. So I believe this call is already wrong
> in coroutine context.

Ok, you added this assertion in dcf94a23, but at that time there was no
bdrv_parent_drained_begin_single, and the polling was only done in
bdrv_do_drained_begin. So I think that to keep the same logic, the
assertion should be moved in bdrv_parent_drained_begin_single()? And
even more specifically, only if the poll flag is true.

I triggered this by adding additional drains in the callers of
bdrv_replace_child_noperm(), and I think some test (probably unit test)
was failing because of either the drained_begin callback itself called
by the drain, or as you suggested the callbacks called by
bdrv_parent_drained_begin_single from bdrv_replace_child_noperm.

Anyways, I think that in addition to the fix in this patch, we should
also fix bdrv_parent_drained_begin_single(poll=true) in
bdrv_replace_child_noperm, with something similar to what is done in
bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
runs the same logic but in the main loop, but then somehow wait that it
finishes before continuing?
Even though at that point we would have a coroutine waiting for the main
loop, which I don't think it's something we want.

Alternatively, we would forbid polling in coroutines at all. And the
only place I can see that is using the drain in coroutine is mirror (see
below).


Additional question: I also noticed that there is a bdrv_drained_begin()
call in mirror.c in the JobDriver run() callback. How can this even
work? If a parent uses bdrv_child_cb_drained_begin (which should not be
so rare) it will crash because of the assertion.

Further additional question: actually I don't understand also the
polling logic of mirror (mirror_drained_poll), as if we are draining in
the coroutine with in_drain = true I think we can have a deadlock if
in_flight>0?

Emanuele

> 
> Now I don't know the call path up to bdrv_replace_child_noperm(), but as
> far as I remember, that was another function that was originally never
> run in coroutine context. Maybe we have good reason to change this, I
> can't point to anything that would be inherently wrong with it, but I
> would still be curious in which context it does run in a coroutine now.
> 
> Anyway, whatever the specific place is, I believe we must drop out of
> coroutine context _before_ calling bdrv_parent_drained_begin_single(),
> not only in callbacks called by it.
> 
>> Instead, bdrv_do_drained_begin with no recursion and poll
>> will accomplish the same thing (invoking bdrv_do_drained_begin_quiesce)
>> but will firstly check if we are already in a coroutine, and exit
>> from that via bdrv_co_yield_to_drain().
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> Kevin
>
Paolo Bonzini Feb. 14, 2022, 11:57 a.m. UTC | #4
On 2/14/22 11:27, Emanuele Giuseppe Esposito wrote:
> Anyways, I think that in addition to the fix in this patch, we should
> also fix bdrv_parent_drained_begin_single(poll=true) in
> bdrv_replace_child_noperm, with something similar to what is done in
> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
> runs the same logic but in the main loop, but then somehow wait that it
> finishes before continuing?
> 
> Alternatively, we would forbid polling in coroutines at all. And the
> only place I can see that is using the drain in coroutine is mirror (see
> below).

I think you should first of all see what breaks if you forbid 
bdrv_replace_child_noperm() from coroutine context.

Drain in coroutines does not poll, it gets out of the coroutine through 
a bottom half before polling.  So if bdrv_replace_child_noperm() doesn't 
require it, polling in coroutines can still be forbidden.

This patch is correct nevertheless.

Paolo
Kevin Wolf Feb. 14, 2022, 12:03 p.m. UTC | #5
Am 14.02.2022 um 11:27 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> On 11/02/2022 12:54, Kevin Wolf wrote:
> > Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> >> Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
> >> is not a good idea: the callback might be called when running
> >> a drain in a coroutine, and bdrv_drained_begin_poll() does not
> >> handle that case, resulting in assertion failure.
> > 
> > I remembered that we talked about this only recently on IRC, but it
> > didn't make any sense to me again when I read this commit message. So I
> > think we need --verbose.
> > 
> > The .drained_begin callback was always meant to run outside of coroutine
> > context, so the unexpected part isn't that it calls a function that
> > can't run in coroutine context, but that it is already called itself in
> > coroutine context.
> > 
> > The problematic path is bdrv_replace_child_noperm() which then calls
> > bdrv_parent_drained_begin_single(poll=true). Polling in coroutine
> > context is dangerous, it can cause deadlocks because the caller of the
> > coroutine can't make progress. So I believe this call is already wrong
> > in coroutine context.
> 
> Ok, you added this assertion in dcf94a23, but at that time there was no
> bdrv_parent_drained_begin_single, and the polling was only done in
> bdrv_do_drained_begin. So I think that to keep the same logic, the
> assertion should be moved in bdrv_parent_drained_begin_single()? And
> even more specifically, only if the poll flag is true.

I wouldn't necessarily say move, but copying it there makes sense to me.
In order to keep the interface constraints simple, I would assert it
independent of the poll parameter.

> I triggered this by adding additional drains in the callers of
> bdrv_replace_child_noperm(), and I think some test (probably unit test)
> was failing because of either the drained_begin callback itself called
> by the drain, or as you suggested the callbacks called by
> bdrv_parent_drained_begin_single from bdrv_replace_child_noperm.
> 
> Anyways, I think that in addition to the fix in this patch, we should
> also fix bdrv_parent_drained_begin_single(poll=true) in
> bdrv_replace_child_noperm, with something similar to what is done in
> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
> runs the same logic but in the main loop, but then somehow wait that it
> finishes before continuing?
> Even though at that point we would have a coroutine waiting for the main
> loop, which I don't think it's something we want.

Coroutines are waiting for the main loop all the time, why would this be
a problem?

Yes, I think a mechanism similar to bdrv_co_yield_to_drain() is needed
if we want to allow callers to be in coroutine context.

And once we have this mechanism, it's actually not in addition to this
patch, but instead of it, because this patch isn't needed any more when
we know that we can't be in coroutine context.

> Alternatively, we would forbid polling in coroutines at all. And the
> only place I can see that is using the drain in coroutine is mirror
> (see below).

Well, my point is that it is already forbidden because it can deadlock.
Code that polls in coroutine context anyway is probably buggy, unless it
can guarantee very specific circumstances that make a deadlock
impossible.

Maybe we can actually assert this in AIO_WAIT_WHILE().

> Additional question: I also noticed that there is a bdrv_drained_begin()
> call in mirror.c in the JobDriver run() callback. How can this even
> work? If a parent uses bdrv_child_cb_drained_begin (which should not be
> so rare) it will crash because of the assertion.

bdrv_co_yield_to_drain() lets this code run in the main loop.

> Further additional question: actually I don't understand also the
> polling logic of mirror (mirror_drained_poll), as if we are draining in
> the coroutine with in_drain = true I think we can have a deadlock if
> in_flight>0?

You mean for a drain issued by the mirror job itself? The in-flight
requests are still processed by the polling loop, so eventually
in_flight should become 0.

Kevin
Emanuele Giuseppe Esposito Feb. 17, 2022, 3:49 p.m. UTC | #6
On 14/02/2022 12:57, Paolo Bonzini wrote:
> On 2/14/22 11:27, Emanuele Giuseppe Esposito wrote:
>> Anyways, I think that in addition to the fix in this patch, we should
>> also fix bdrv_parent_drained_begin_single(poll=true) in
>> bdrv_replace_child_noperm, with something similar to what is done in
>> bdrv_co_yield_to_drain? ie if we are in coroutine, schedule a BH that
>> runs the same logic but in the main loop, but then somehow wait that it
>> finishes before continuing?
>>
>> Alternatively, we would forbid polling in coroutines at all. And the
>> only place I can see that is using the drain in coroutine is mirror (see
>> below).
> 
> I think you should first of all see what breaks if you forbid
> bdrv_replace_child_noperm() from coroutine context.

Checked. Basically, if I add the assertion assert(!qemu_in_coroutine())
only in bdrv_parent_drained_begin_single(), iotests and unit tests run
as intended.

If I add the assertion in bdrv_replace_child_noperm(), so that the
function can't be invoked by coroutines, everything breaks. Starting
from qemu-img, as it uses bdrv_create_co_entry() and therefore
qcow2_co_create_opts() for qcow2 format.

On the other side, if I add subtree_drains throughout the code, we end
up having  bdrv_parent_drained_begin_single() called much more
frequently, and since bdrv_replace_child_noperm() *is* called by
coroutines, the drain count won't match, so
bdrv_parent_drained_begin_single() will be called much more frequently
and the assertion will fail.

I am testing the fix agreed with Kevin, i.e. implement something very
similar to bdrv_co_yield_to_drain() in
bdrv_parent_drained_begin_single(), where we just create a bh to do the
work in the main loop, and it seems to be working. Maybe that's the way
to go for bdrv_replace_child_noperm?

Should I get rid of this one or have both fixes in two patches? This
patch just fixes part of the problem, but as Paolo said it's correct
nevertheless.

Emanuele

> 
> Drain in coroutines does not poll, it gets out of the coroutine through
> a bottom half before polling.  So if bdrv_replace_child_noperm() doesn't
> require it, polling in coroutines can still be forbidden.
> 
> This patch is correct nevertheless.
> 
> Paolo
>
diff mbox series

Patch

diff --git a/block.c b/block.c
index e42a99d2af..4551eba2aa 100644
--- a/block.c
+++ b/block.c
@@ -1203,7 +1203,7 @@  static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL, false);
+    bdrv_drained_begin_no_poll(bs);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index 683a32c955..d3ee3e1a7c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -425,7 +425,7 @@  static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+static void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                    BdrvChild *parent, bool ignore_bds_parents)
 {
     assert(!qemu_in_coroutine());
@@ -487,6 +487,11 @@  void bdrv_subtree_drained_begin(BlockDriverState *bs)
     bdrv_do_drained_begin(bs, true, NULL, false, true);
 }
 
+void bdrv_drained_begin_no_poll(BlockDriverState *bs)
+{
+    bdrv_do_drained_begin(bs, false, NULL, false, false);
+}
+
 /**
  * This function does not poll, nor must any of its recursively called
  * functions.  The *drained_end_counter pointee will be incremented
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 7c04bc3049..48d3442970 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -308,13 +308,15 @@  bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
 void bdrv_drained_begin(BlockDriverState *bs);
 
 /**
- * bdrv_do_drained_begin_quiesce:
+ * bdrv_drained_begin_no_poll:
  *
  * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
  * running requests to complete.
+ * Same as bdrv_drained_begin(), but do not poll for the subgraph to
+ * actually become unquiesced. Therefore, no graph changes will occur
+ * with this function.
  */
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent, bool ignore_bds_parents);
+void bdrv_drained_begin_no_poll(BlockDriverState *bs);
 
 /**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for