[07/12] nbd: Increase bs->in_flight during AioContext switch
diff mbox series

Message ID 20190218161822.3573-8-kwolf@redhat.com
State New
Headers show
Series
  • block: bdrv_set_aio_context() related fixes
Related show

Commit Message

Kevin Wolf Feb. 18, 2019, 4:18 p.m. UTC
bdrv_drain() must not leave connection_co scheduled, so bs->in_flight
needs to be increased while the coroutine is waiting to be scheduled
in the new AioContext after nbd_client_attach_aio_context().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/nbd-client.h  |  1 +
 include/block/nbd.h |  5 +++--
 block/nbd-client.c  | 16 ++++++++++++----
 nbd/client.c        | 23 +++++++++++++++++------
 4 files changed, 33 insertions(+), 12 deletions(-)

Comments

Paolo Bonzini Feb. 18, 2019, 5:22 p.m. UTC | #1
On 18/02/19 17:18, Kevin Wolf wrote:
> +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> +             * the qio_channel_yield() below. */
> +            assert(!*aio_ctx_switch);
>              bdrv_dec_in_flight(bs);
>              qio_channel_yield(ioc, G_IO_IN);
> -            bdrv_inc_in_flight(bs);
> +            if (*aio_ctx_switch) {
> +                /* nbd_client_attach_aio_context() already increased in_flight
> +                 * when scheduling this coroutine for reentry */
> +                *aio_ctx_switch = false;
> +            } else {
> +                bdrv_inc_in_flight(bs);
> +            }

Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
unconditionally here, and in nbd_connection_entry do the opposite, like

	if (s->aio_ctx_switch) {
	    s->aio_ctx_switch = false;
	    bdrv_dec_in_flight(bs);
	}

but I guess the problem is that then bdrv_drain could hang.

So my question is:

1) is there a testcase that shows the problem with this "obvious"
refactoring;

2) maybe instead of aio_co_schedul-ing client->connection_co and having
the s->aio_ctx_switch flag, you could go through a bottom half that does
the bdrv_inc_in_flight and then enters client->connection_co?

Paolo
Kevin Wolf Feb. 19, 2019, 11:11 a.m. UTC | #2
Am 18.02.2019 um 18:22 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> > +             * the qio_channel_yield() below. */
> > +            assert(!*aio_ctx_switch);
> >              bdrv_dec_in_flight(bs);
> >              qio_channel_yield(ioc, G_IO_IN);
> > -            bdrv_inc_in_flight(bs);
> > +            if (*aio_ctx_switch) {
> > +                /* nbd_client_attach_aio_context() already increased in_flight
> > +                 * when scheduling this coroutine for reentry */
> > +                *aio_ctx_switch = false;
> > +            } else {
> > +                bdrv_inc_in_flight(bs);
> > +            }
> 
> Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
> unconditionally here, and in nbd_connection_entry do the opposite, like
> 
> 	if (s->aio_ctx_switch) {
> 	    s->aio_ctx_switch = false;
> 	    bdrv_dec_in_flight(bs);
> 	}
> 
> but I guess the problem is that then bdrv_drain could hang.

Yes, the important part is that in_flight can drop to 0 while we're in
qio_channel_yield().

> So my question is:
> 
> 1) is there a testcase that shows the problem with this "obvious"
> refactoring;

I haven't actually tried it out because it's "obviously" wrong, but in
any test case where no requests are running, you'd never leave this
loop, so it should trivially trigger the problem.

In other words, I think qemu-iotests 094 covers this.

> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
> the s->aio_ctx_switch flag, you could go through a bottom half that does
> the bdrv_inc_in_flight and then enters client->connection_co?

That would be too easy. :-)

But I agree, that might indeed be the better solution.

I think I'd keep patch 6 anyway so that we know the exact yield that
we'll interrupt, even if it's not strictly necessary as long as we know
that nbd_receive_reply() can only yield in places that are safe to be
interrupted. While intuitively I think it's true, I don't feel like
actually auditing the code, and at some point we'd probably fail to
check that new code won't violate this invariant.

Kevin
Paolo Bonzini Feb. 19, 2019, 2:12 p.m. UTC | #3
On 19/02/19 12:11, Kevin Wolf wrote:
>> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
>> the s->aio_ctx_switch flag, you could go through a bottom half that does
>> the bdrv_inc_in_flight and then enters client->connection_co?
> That would be too easy. :-)
> 
> But I agree, that might indeed be the better solution.
> 
> I think I'd keep patch 6 anyway so that we know the exact yield that
> we'll interrupt, even if it's not strictly necessary as long as we know
> that nbd_receive_reply() can only yield in places that are safe to be
> interrupted. While intuitively I think it's true, I don't feel like
> actually auditing the code, and at some point we'd probably fail to
> check that new code won't violate this invariant.

Yes, I agree with keeping patch 6.

Paolo
Kevin Wolf Feb. 20, 2019, 4:33 p.m. UTC | #4
Am 18.02.2019 um 18:22 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> > +             * the qio_channel_yield() below. */
> > +            assert(!*aio_ctx_switch);
> >              bdrv_dec_in_flight(bs);
> >              qio_channel_yield(ioc, G_IO_IN);
> > -            bdrv_inc_in_flight(bs);
> > +            if (*aio_ctx_switch) {
> > +                /* nbd_client_attach_aio_context() already increased in_flight
> > +                 * when scheduling this coroutine for reentry */
> > +                *aio_ctx_switch = false;
> > +            } else {
> > +                bdrv_inc_in_flight(bs);
> > +            }
> 
> Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
> unconditionally here, and in nbd_connection_entry do the opposite, like
> 
> 	if (s->aio_ctx_switch) {
> 	    s->aio_ctx_switch = false;
> 	    bdrv_dec_in_flight(bs);
> 	}
> 
> but I guess the problem is that then bdrv_drain could hang.
> 
> So my question is:
> 
> 1) is there a testcase that shows the problem with this "obvious"
> refactoring;
> 
> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
> the s->aio_ctx_switch flag, you could go through a bottom half that does
> the bdrv_inc_in_flight and then enters client->connection_co?

Actually, this is going to become a bit ugly, too. I can't just schedule
the BH and return because then the node isn't drained any more when the
BH actually runs - and when it's not drained, we don't know where the
coroutine is, so we can't reenter it.

With an AIO_WAIT_WHILE() in the old thread, it should work, though...

Kevin

Patch
diff mbox series

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 09e03013d2..ee4e8544fe 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -37,6 +37,7 @@  typedef struct NBDClientSession {
     NBDReply reply;
     BlockDriverState *bs;
     bool quit;
+    bool aio_ctx_switch;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/include/block/nbd.h b/include/block/nbd.h
index c6ef1ef42e..73e6e95095 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,8 +300,9 @@  int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, bool *aio_ctx_switch,
+                                   QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 688993652d..433de9e275 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -75,6 +75,11 @@  static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;
 
+    /* Keep the initial bs->in_flight increase; decrease only temporarily in
+     * nbd_read_eof() and at the end of this function. */
+    assert(s->aio_ctx_switch);
+    s->aio_ctx_switch = false;
+
     while (!s->quit) {
         /*
          * The NBD client can only really be considered idle when it has
@@ -86,7 +91,8 @@  static coroutine_fn void nbd_connection_entry(void *opaque)
          * only drop it temporarily here.
          */
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
+        ret = nbd_receive_reply(s->bs, &s->aio_ctx_switch, s->ioc, &s->reply,
+                                &local_err);
 
         if (local_err) {
             trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
@@ -983,8 +989,11 @@  void nbd_client_attach_aio_context(BlockDriverState *bs,
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
 
-    /* FIXME Really need a bdrv_inc_in_flight() here, but the corresponding
-     * bdrv_dec_in_flight() would have to be in QIOChannel code :-/ */
+    /* The corresponding bdrv_dec_in_flight() is in nbd_read_eof() */
+    assert(!client->aio_ctx_switch);
+    client->aio_ctx_switch = true;
+    bdrv_inc_in_flight(bs);
+
     aio_co_schedule(new_context, client->connection_co);
 }
 
@@ -1091,7 +1100,6 @@  static int nbd_client_connect(BlockDriverState *bs,
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
     client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
-    bdrv_inc_in_flight(bs);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
diff --git a/nbd/client.c b/nbd/client.c
index de7da48246..12bd24d8fe 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1394,8 +1394,8 @@  static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
  *         negative errno on failure (errp is set)
  */
 static inline int coroutine_fn
-nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
-             Error **errp)
+nbd_read_eof(BlockDriverState *bs, bool *aio_ctx_switch, QIOChannel *ioc,
+             void *buffer, size_t size, Error **errp)
 {
     bool partial = false;
 
@@ -1406,9 +1406,18 @@  nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
 
         len = qio_channel_readv(ioc, &iov, 1, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
+            /* aio_ctx_switch is only supposed to be set if we're sitting in
+             * the qio_channel_yield() below. */
+            assert(!*aio_ctx_switch);
             bdrv_dec_in_flight(bs);
             qio_channel_yield(ioc, G_IO_IN);
-            bdrv_inc_in_flight(bs);
+            if (*aio_ctx_switch) {
+                /* nbd_client_attach_aio_context() already increased in_flight
+                 * when scheduling this coroutine for reentry */
+                *aio_ctx_switch = false;
+            } else {
+                bdrv_inc_in_flight(bs);
+            }
             continue;
         } else if (len < 0) {
             return -EIO;
@@ -1439,13 +1448,15 @@  nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
  *         0 on eof, when no data was read (errp is not set)
  *         negative errno on failure (errp is set)
  */
-int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, bool *aio_ctx_switch,
+                                   QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp)
 {
     int ret;
     const char *type;
 
-    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, aio_ctx_switch, ioc, &reply->magic,
+                       sizeof(reply->magic), errp);
     if (ret <= 0) {
         return ret;
     }