[v2,2/2] mirror: Wait only for in-flight operations

Message ID 20200326153628.4869-3-kwolf@redhat.com
State New
Series
  • mirror: Fix hang (operation waiting for itself/circular dependency)

Commit Message

Kevin Wolf March 26, 2020, 3:36 p.m. UTC
mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, a MirrorOp is already in s->ops_in_flight when
mirror_co_read() waits for free slots, so if not enough slots are
immediately available, an operation can end up waiting for itself, or
two or more operations can wait for each other to complete, which
results in a hang.

Fix this by adding a flag to MirrorOp that tells us if the request is
already in flight (and therefore occupies slots that it will later
free), and picking only such operations for waiting.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
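
The selection rule described in the commit message can be modelled outside QEMU. The following is a minimal standalone sketch, not the code in block/mirror.c: ModelOp, model_pick_old(), model_pick_new() and model_demo() are hypothetical names, the queue is a plain array, and placing the caller's own operation first in the list is an assumption made only to show the self-wait case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MirrorOp: only the flags the selection reads. */
typedef struct ModelOp {
    bool is_pseudo_op;
    bool is_active_write;
    bool is_in_flight;
} ModelOp;

/* Pre-patch rule: any real operation of the requested kind qualifies.
 * Returns the index of the chosen operation, or n if none qualifies. */
static size_t model_pick_old(const ModelOp *ops, size_t n, bool active)
{
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].is_pseudo_op && ops[i].is_active_write == active) {
            return i;
        }
    }
    return n;
}

/* Post-patch rule: the operation must also already be in flight, i.e.
 * hold slots that it will free when it completes. */
static size_t model_pick_new(const ModelOp *ops, size_t n, bool active)
{
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].is_pseudo_op && ops[i].is_in_flight &&
            ops[i].is_active_write == active)
        {
            return i;
        }
    }
    return n;
}

/* Assumed scenario: ops[0] is the caller's own operation, queued but
 * still waiting for slots; ops[1] is another operation holding slots. */
static void model_demo(void)
{
    const ModelOp ops[] = {
        { .is_in_flight = false },
        { .is_in_flight = true },
    };

    assert(model_pick_old(ops, 2, false) == 0); /* would wait for itself */
    assert(model_pick_new(ops, 2, false) == 1); /* waits for a slot holder */
}
```

In this model the old rule selects the caller's own entry, while the new rule skips it and selects the entry that can actually release slots.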

Comments

Max Reitz March 26, 2020, 6:27 p.m. UTC | #1
On 26.03.20 16:36, Kevin Wolf wrote:
> mirror_wait_for_free_in_flight_slot() just picks a random operation to
> wait for. However, a MirrorOp is already in s->ops_in_flight when
> mirror_co_read() waits for free slots, so if not enough slots are
> immediately available, an operation can end up waiting for itself, or
> two or more operations can wait for each other to complete, which
> results in a hang.
> 
> Fix this by adding a flag to MirrorOp that tells us if the request is
> already in flight (and therefore occupies slots that it will later
> free), and picking only such operations for waiting.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/mirror.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 393131b135..88414d1653 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c

[...]

> @@ -1318,6 +1324,7 @@ static MirrorOp *coroutine_fn active_write_prepare(MirrorBlockJob *s,
>          .offset             = offset,
>          .bytes              = bytes,
>          .is_active_write    = true,
> +        .is_in_flight       = true,
>      };
>      qemu_co_queue_init(&op->waiting_requests);
>      QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);
> 

There is a mirror_wait_on_conflicts() call after this.  I was a bit
worried about dependencies there.  But I don’t think there’s any
problem, because mirror_wait_for_any_operation() is only called by:

(1) mirror_wait_for_free_in_flight_slot(), which makes it look for
non-active operations only, and

(2) mirror_run(), which specifically waits for all active operations to
settle, so it makes sense to wait for all of them, even when they are
still doing their own dependency-waiting.

But still, I’m not sure whether this is conceptually the best thing to
do.  I think what we actually want is for
mirror_wait_for_free_in_flight_slot() to only wait for in-flight
operations; but the call in mirror_run() that waits for active-mirror
operations wants to wait for all active-mirror operations, not just the
ones that are in flight.

So I think conceptually it would make more sense to set is_in_flight
only after mirror_wait_on_conflicts(), and ensure that the
mirror_wait_for_any_operation() call from mirror_run() ignores
is_in_flight.  E.g. by having another parameter “bool in_flight” for
mirror_wait_for_any_operation() that chooses whether to check
is_in_flight or whether to ignore it.
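
A minimal standalone sketch of that alternative, for illustration only: SketchOp, sketch_op_matches() and sketch_demo() are hypothetical names and not part of block/mirror.c, and the reading that in_flight == true means "check the flag" and false means "ignore it" is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MirrorOp: only the flags the predicate reads. */
typedef struct SketchOp {
    bool is_pseudo_op;
    bool is_active_write;
    bool is_in_flight;
} SketchOp;

/* Would a waiter with the given (active, in_flight) arguments pick op?
 * in_flight == true:  only operations that are already in flight match.
 * in_flight == false: the is_in_flight flag is ignored. */
static bool sketch_op_matches(const SketchOp *op, bool active, bool in_flight)
{
    if (op->is_pseudo_op || op->is_active_write != active) {
        return false;
    }
    return !in_flight || op->is_in_flight;
}

static void sketch_demo(void)
{
    /* Active write still waiting on conflicts, so not yet in flight. */
    const SketchOp active_waiting = { .is_active_write = true };
    /* Background copy that already occupies slots. */
    const SketchOp copy_running = { .is_in_flight = true };
    /* Background copy that is queued but has no slots yet. */
    const SketchOp copy_queued = { .is_in_flight = false };

    /* A mirror_run()-style caller: all active writes, flag ignored. */
    assert(sketch_op_matches(&active_waiting, true, false));

    /* A slot waiter: background operations that are in flight only. */
    assert(sketch_op_matches(&copy_running, false, true));
    assert(!sketch_op_matches(&copy_queued, false, true));
}
```

With this shape, is_in_flight could be set on an active write only after it has finished waiting on conflicts, without changing which operations the mirror_run()-style caller waits for.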

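In practice, @in_flight would always be the inverse of @active (the
slot waiter passes active=false and wants the check, mirror_run() passes
active=true and wants it ignored), but they are different things.  That
would mean we always ignore is_in_flight for active-mirror operations.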

In practice, that is no different from what this patch does, which is to
make is_in_flight always true for active-mirror operations and let
mirror_wait_for_any_operation() check is_in_flight unconditionally.

So I don’t know.  Maybe this is a start:

Functionally-reviewed-by: Max Reitz <mreitz@redhat.com>

Max

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 393131b135..88414d1653 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -102,6 +102,7 @@  struct MirrorOp {
 
     bool is_pseudo_op;
     bool is_active_write;
+    bool is_in_flight;
     CoQueue waiting_requests;
     Coroutine *co;
 
@@ -293,7 +294,9 @@  mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
          * to wait on. */
-        if (!op->is_pseudo_op && op->is_active_write == active) {
+        if (!op->is_pseudo_op && op->is_in_flight &&
+            op->is_active_write == active)
+        {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
@@ -367,6 +370,7 @@  static void coroutine_fn mirror_co_read(void *opaque)
     /* Copy the dirty cluster.  */
     s->in_flight++;
     s->bytes_in_flight += op->bytes;
+    op->is_in_flight = true;
     trace_mirror_one_iteration(s, op->offset, op->bytes);
 
     ret = bdrv_co_preadv(s->mirror_top_bs->backing, op->offset, op->bytes,
@@ -382,6 +386,7 @@  static void coroutine_fn mirror_co_zero(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
                                op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
@@ -396,6 +401,7 @@  static void coroutine_fn mirror_co_discard(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
     mirror_write_complete(op, ret);
@@ -1318,6 +1324,7 @@  static MirrorOp *coroutine_fn active_write_prepare(MirrorBlockJob *s,
         .offset             = offset,
         .bytes              = bytes,
         .is_active_write    = true,
+        .is_in_flight       = true,
     };
     qemu_co_queue_init(&op->waiting_requests);
     QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);