
[v7,13/13] block: Convert 'block_resize' to coroutine

Message ID 20200909151149.490589-14-kwolf@redhat.com (mailing list archive)
State New, archived
Series monitor: Optionally run handlers in coroutines

Commit Message

Kevin Wolf Sept. 9, 2020, 3:11 p.m. UTC
block_resize performs some I/O that could potentially take quite some
time, so use it as an example for the new 'coroutine': true annotation
in the QAPI schema.

bdrv_truncate() requires that we're already in the right AioContext for
the BlockDriverState if called in coroutine context. So instead of just
taking the AioContext lock, move the QMP handler coroutine to the
context.

Call blk_unref() only after switching back because blk_unref() may only
be called in the main thread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json |  3 ++-
 blockdev.c           | 13 ++++++-------
 hmp-commands.hx      |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)
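
The ordering described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: every `toy_*` name and the `ToyCtx` type are hypothetical stand-ins, not QEMU code, and a single global pointer simulates "the context the coroutine currently runs in". It shows the sequence move to the node's context, truncate, switch back, then drop the reference.

```c
#include <assert.h>

/* Toy stand-ins: none of these are QEMU types or functions. */
typedef struct ToyCtx { const char *name; } ToyCtx;

static ToyCtx toy_main_ctx = { "main" };
static ToyCtx toy_iothread_ctx = { "iothread0" };

/* Context the (single, simulated) coroutine is currently running in. */
static ToyCtx *toy_current = &toy_main_ctx;

/* Models bdrv_co_move_to_aio_context(): switch, return previous context. */
static ToyCtx *toy_move_to(ToyCtx *target)
{
    ToyCtx *old = toy_current;
    toy_current = target;
    return old;
}

/* Models aio_co_reschedule_self(): continue in the given context. */
static void toy_reschedule_self(ToyCtx *ctx)
{
    toy_current = ctx;
}

/* Models bdrv_truncate(): must run in the node's own context. */
static int toy_truncate(ToyCtx *node_ctx)
{
    return toy_current == node_ctx ? 0 : -1;
}

/* Models blk_unref(): only valid in the main context. */
static int toy_unref(void)
{
    return toy_current == &toy_main_ctx ? 0 : -1;
}

/* Handler flow in the order the commit message describes. */
static int toy_block_resize(ToyCtx *node_ctx)
{
    ToyCtx *old_ctx = toy_move_to(node_ctx);
    int ret = toy_truncate(node_ctx);

    toy_reschedule_self(old_ctx);   /* switch back first ... */
    if (toy_unref() < 0) {          /* ... then drop the reference */
        return -1;
    }
    return ret;
}
```

In this model, swapping the last two steps makes `toy_unref()` fail, which mirrors the stated rule that blk_unref() may only be called in the main thread.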

Comments

Stefan Hajnoczi Sept. 15, 2020, 2:57 p.m. UTC | #1
On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
>          return;
>      }
>  
> -    aio_context = bdrv_get_aio_context(bs);
> -    aio_context_acquire(aio_context);
> +    old_ctx = bdrv_co_move_to_aio_context(bs);
>  
>      if (size < 0) {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");

Is it safe to call blk_new() outside the BQL since it mutates global state?

In other words, could another thread race with us?

> @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
>      bdrv_drained_end(bs);
>  
>  out:
> +    aio_co_reschedule_self(old_ctx);
>      blk_unref(blk);
> -    aio_context_release(aio_context);

The following precondition is violated by the blk_unref -> bdrv_drain ->
AIO_WAIT_WHILE() call if blk->refcnt is 1 here:

 * The caller's thread must be the IOThread that owns @ctx or the main loop
 * thread (with @ctx acquired exactly once).

blk_unref() is called from the main loop thread without having acquired
blk's AioContext.

Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
I'm not sure if that can be guaranteed.

The following seems safer although it's uglier:

  aio_context = bdrv_get_aio_context(bs);
  aio_context_acquire(aio_context);
  blk_unref(blk);
  aio_context_release(aio_context);
Kevin Wolf Sept. 25, 2020, 4:07 p.m. UTC | #2
On 15.09.2020 at 16:57, Stefan Hajnoczi wrote:
> On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> > @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
> >          return;
> >      }
> >  
> > -    aio_context = bdrv_get_aio_context(bs);
> > -    aio_context_acquire(aio_context);
> > +    old_ctx = bdrv_co_move_to_aio_context(bs);
> >  
> >      if (size < 0) {
> >          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
> 
> Is it safe to call blk_new() outside the BQL since it mutates global state?
> 
> In other words, could another thread race with us?

Hm, probably not.

Would it be safer to have the bdrv_co_move_to_aio_context() call only
immediately before the drain?

> > @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
> >      bdrv_drained_end(bs);
> >  
> >  out:
> > +    aio_co_reschedule_self(old_ctx);
> >      blk_unref(blk);
> > -    aio_context_release(aio_context);
> 
> The following precondition is violated by the blk_unref -> bdrv_drain ->
> AIO_WAIT_WHILE() call if blk->refcnt is 1 here:
> 
>  * The caller's thread must be the IOThread that owns @ctx or the main loop
>  * thread (with @ctx acquired exactly once).
> 
> blk_unref() is called from the main loop thread without having acquired
> blk's AioContext.
> 
> Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
> I'm not sure if that can be guaranteed.
> 
> The following seems safer although it's uglier:
> 
>   aio_context = bdrv_get_aio_context(bs);
>   aio_context_acquire(aio_context);
>   blk_unref(blk);
>   aio_context_release(aio_context);

May we actually acquire aio_context if blk is in the main thread? I
think we must only do this if it's in a different iothread because we'd
end up with a recursive lock and drain would hang.

Kevin
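
The recursive-lock concern can be sketched with a toy model. It assumes the waiter releases the lock by exactly one level while it polls, which is one reading of the "acquired exactly once" precondition quoted above; `ToyRecLock` and the `toy_*` functions are hypothetical and only track nesting depth, they are not QEMU's lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy recursive lock: only the nesting depth matters for this sketch. */
typedef struct ToyRecLock { int depth; } ToyRecLock;

static void toy_lock(ToyRecLock *l)
{
    l->depth++;
}

static void toy_unlock(ToyRecLock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/*
 * Models the waiter: it drops the lock once so the thread that owns the
 * context can make progress, checks whether the lock is really free, and
 * then takes it back. Returns true if another thread could have run.
 */
static bool toy_wait_lets_others_run(ToyRecLock *l)
{
    bool others_can_run;

    toy_unlock(l);
    others_can_run = (l->depth == 0);
    toy_lock(l);
    return others_can_run;
}
```

With a depth of 1 the single release frees the lock; with a depth of 2 the lock is still held after the release, so in this model nothing else can run and the wait would never finish.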
Stefan Hajnoczi Sept. 28, 2020, 9:05 a.m. UTC | #3
On Fri, Sep 25, 2020 at 06:07:50PM +0200, Kevin Wolf wrote:
> On 15.09.2020 at 16:57, Stefan Hajnoczi wrote:
> > On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> > > @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
> > >          return;
> > >      }
> > >  
> > > -    aio_context = bdrv_get_aio_context(bs);
> > > -    aio_context_acquire(aio_context);
> > > +    old_ctx = bdrv_co_move_to_aio_context(bs);
> > >  
> > >      if (size < 0) {
> > >          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
> > 
> > Is it safe to call blk_new() outside the BQL since it mutates global state?
> > 
> > In other words, could another thread race with us?
> 
> Hm, probably not.
> 
> Would it be safer to have the bdrv_co_move_to_aio_context() call only
> immediately before the drain?

Yes, sounds good.

> > > @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
> > >      bdrv_drained_end(bs);
> > >  
> > >  out:
> > > +    aio_co_reschedule_self(old_ctx);
> > >      blk_unref(blk);
> > > -    aio_context_release(aio_context);
> > 
> > The following precondition is violated by the blk_unref -> bdrv_drain ->
> > AIO_WAIT_WHILE() call if blk->refcnt is 1 here:
> > 
> >  * The caller's thread must be the IOThread that owns @ctx or the main loop
> >  * thread (with @ctx acquired exactly once).
> > 
> > blk_unref() is called from the main loop thread without having acquired
> > blk's AioContext.
> > 
> > Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
> > I'm not sure if that can be guaranteed.
> > 
> > The following seems safer although it's uglier:
> > 
> >   aio_context = bdrv_get_aio_context(bs);
> >   aio_context_acquire(aio_context);
> >   blk_unref(blk);
> >   aio_context_release(aio_context);
> 
> May we actually acquire aio_context if blk is in the main thread? I
> think we must only do this if it's in a different iothread because we'd
> end up with a recursive lock and drain would hang.

Right :). Maybe an aio_context_acquire_once() API would help.

Stefan
Kevin Wolf Sept. 28, 2020, 10:33 a.m. UTC | #4
On 28.09.2020 at 11:05, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2020 at 06:07:50PM +0200, Kevin Wolf wrote:
> > On 15.09.2020 at 16:57, Stefan Hajnoczi wrote:
> > > On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> > > > @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
> > > >          return;
> > > >      }
> > > >  
> > > > -    aio_context = bdrv_get_aio_context(bs);
> > > > -    aio_context_acquire(aio_context);
> > > > +    old_ctx = bdrv_co_move_to_aio_context(bs);
> > > >  
> > > >      if (size < 0) {
> > > >          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
> > > 
> > > Is it safe to call blk_new() outside the BQL since it mutates global state?
> > > 
> > > In other words, could another thread race with us?
> > 
> > Hm, probably not.
> > 
> > Would it be safer to have the bdrv_co_move_to_aio_context() call only
> > immediately before the drain?
> 
> Yes, sounds good.
> 
> > > > @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
> > > >      bdrv_drained_end(bs);
> > > >  
> > > >  out:
> > > > +    aio_co_reschedule_self(old_ctx);
> > > >      blk_unref(blk);
> > > > -    aio_context_release(aio_context);
> > > 
> > > The following precondition is violated by the blk_unref -> bdrv_drain ->
> > > AIO_WAIT_WHILE() call if blk->refcnt is 1 here:
> > > 
> > >  * The caller's thread must be the IOThread that owns @ctx or the main loop
> > >  * thread (with @ctx acquired exactly once).
> > > 
> > > blk_unref() is called from the main loop thread without having acquired
> > > blk's AioContext.
> > > 
> > > Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
> > > I'm not sure if that can be guaranteed.
> > > 
> > > The following seems safer although it's uglier:
> > > 
> > >   aio_context = bdrv_get_aio_context(bs);
> > >   aio_context_acquire(aio_context);
> > >   blk_unref(blk);
> > >   aio_context_release(aio_context);
> > 
> > May we actually acquire aio_context if blk is in the main thread? I
> > think we must only do this if it's in a different iothread because we'd
> > end up with a recursive lock and drain would hang.
> 
> Right :). Maybe an aio_context_acquire_once() API would help.

If you want it to work in the general case, how would you implement
this? As far as I know there is no way to tell whether we already own
the lock or not.

Something like aio_context_acquire_unless_self() might be easier to
implement.

Kevin
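
One possible shape for the aio_context_acquire_unless_self() idea, as a toy sketch only. No such function exists in the patch; `ToyCtx`, the integer thread ids, and the `toy_*` helpers are all hypothetical. The assumption is that "self" means the caller is the thread that runs the context, which can be checked without knowing who currently holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyCtx {
    int home_thread;   /* id of the thread that runs this context */
    int lock_depth;    /* nesting depth of the toy recursive lock */
} ToyCtx;

/* Id of the simulated calling thread; 0 stands for the main loop. */
static int toy_current_thread = 0;

/*
 * Hypothetical aio_context_acquire_unless_self(): take the lock only when
 * the caller is not the context's home thread. Returns true if the lock
 * was taken, so the caller knows whether it has to release it.
 */
static bool toy_acquire_unless_self(ToyCtx *ctx)
{
    if (ctx->home_thread == toy_current_thread) {
        return false;
    }
    ctx->lock_depth++;
    return true;
}

/* Release only what toy_acquire_unless_self() actually took. */
static void toy_release_if_acquired(ToyCtx *ctx, bool acquired)
{
    if (acquired) {
        assert(ctx->lock_depth > 0);
        ctx->lock_depth--;
    }
}
```

This only avoids taking the lock for the caller's own context. It does not detect a caller that already holds the lock of a foreign context, which is the general case described above as having no known solution.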

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0345f6f2d2..d3e49c9419 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1302,7 +1302,8 @@ 
 { 'command': 'block_resize',
   'data': { '*device': 'str',
             '*node-name': 'str',
-            'size': 'int' } }
+            'size': 'int' },
+  'coroutine': true }
 
 ##
 # @NewImageMode:
diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..064989fc2d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2439,14 +2439,14 @@  BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
     return ret;
 }
 
-void qmp_block_resize(bool has_device, const char *device,
-                      bool has_node_name, const char *node_name,
-                      int64_t size, Error **errp)
+void coroutine_fn qmp_block_resize(bool has_device, const char *device,
+                                   bool has_node_name, const char *node_name,
+                                   int64_t size, Error **errp)
 {
     Error *local_err = NULL;
     BlockBackend *blk = NULL;
     BlockDriverState *bs;
-    AioContext *aio_context;
+    AioContext *old_ctx;
 
     bs = bdrv_lookup_bs(has_device ? device : NULL,
                         has_node_name ? node_name : NULL,
@@ -2456,8 +2456,7 @@  void qmp_block_resize(bool has_device, const char *device,
         return;
     }
 
-    aio_context = bdrv_get_aio_context(bs);
-    aio_context_acquire(aio_context);
+    old_ctx = bdrv_co_move_to_aio_context(bs);
 
     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
@@ -2479,8 +2478,8 @@  void qmp_block_resize(bool has_device, const char *device,
     bdrv_drained_end(bs);
 
 out:
+    aio_co_reschedule_self(old_ctx);
     blk_unref(blk);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 60f395c276..ac360b73f6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -76,6 +76,7 @@  ERST
         .params     = "device size",
         .help       = "resize a block image",
         .cmd        = hmp_block_resize,
+        .coroutine  = true,
     },
 
 SRST