diff mbox series

[3/4] nbd: make client_close synchronous

Message ID 20200625142540.24589-4-vsementsov@virtuozzo.com (mailing list archive)
State New, archived
Headers show
Series Fix crash due to NBD export leak | expand

Commit Message

Vladimir Sementsov-Ogievskiy June 25, 2020, 2:25 p.m. UTC
client_close doesn't guarantee that client is closed: nbd_trip() keeps
reference to it. Let's wait for nbd_trip to finish.

Without this fix, the following crash is possible:

- export bitmap through unternal Qemu NBD server
- connect a client
- shutdown Qemu

On shutdown nbd_export_close_all is called, but it actually don't wait
for nbd_trip() to finish and to release its references. So, export is
not release, and exported bitmap remains busy, and on try to remove the
bitmap (which is part of bdrv_close()) the assertion fairs:

bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Eric Blake June 25, 2020, 2:48 p.m. UTC | #1
On 6/25/20 9:25 AM, Vladimir Sementsov-Ogievskiy wrote:
> client_close doesn't guarantee that client is closed: nbd_trip() keeps
> reference to it. Let's wait for nbd_trip to finish.
> 
> Without this fix, the following crash is possible:
> 
> - export bitmap through unternal Qemu NBD server

internal

> - connect a client
> - shutdown Qemu
> 
> On shutdown nbd_export_close_all is called, but it actually don't wait
> for nbd_trip() to finish and to release its references. So, export is
> not release, and exported bitmap remains busy, and on try to remove the
> bitmap (which is part of bdrv_close()) the assertion fairs:

fails

> 
> bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   nbd/server.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..5e27a8d31a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1419,6 +1419,8 @@ static void client_close(NBDClient *client, bool negotiated)
>       qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>                            NULL);
>   
> +    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
> +
>       /* Also tell the client, so that they release their reference.  */
>       if (client->close_fn) {
>           client->close_fn(client, negotiated);
> @@ -2450,6 +2452,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>   
>       trace_nbd_trip();
>       if (client->closing) {
> +        client->recv_coroutine = NULL;
>           nbd_client_put(client);
>           return;
>       }
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
Vladimir Sementsov-Ogievskiy June 29, 2020, 7:55 a.m. UTC | #2
25.06.2020 17:25, Vladimir Sementsov-Ogievskiy wrote:
> client_close doesn't guarantee that client is closed: nbd_trip() keeps
> reference to it. Let's wait for nbd_trip to finish.
> 
> Without this fix, the following crash is possible:
> 
> - export bitmap through unternal Qemu NBD server
> - connect a client
> - shutdown Qemu
> 
> On shutdown nbd_export_close_all is called, but it actually don't wait
> for nbd_trip() to finish and to release its references. So, export is
> not release, and exported bitmap remains busy, and on try to remove the
> bitmap (which is part of bdrv_close()) the assertion fairs:
> 
> bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   nbd/server.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..5e27a8d31a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1419,6 +1419,8 @@ static void client_close(NBDClient *client, bool negotiated)
>       qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>                            NULL);
>   
> +    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
> +
>       /* Also tell the client, so that they release their reference.  */
>       if (client->close_fn) {
>           client->close_fn(client, negotiated);
> @@ -2450,6 +2452,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>   
>       trace_nbd_trip();
>       if (client->closing) {
> +        client->recv_coroutine = NULL;
>           nbd_client_put(client);
>           return;
>       }
> 

I have a doubt, isn't it possible to hang in AIO_WAIT_WHILE() after this patch?

Do we need aio_wait_kick() in the nbd_trip()? Or may be, something like
aio_wait_kick(), but to wake exactly client->exp->ctx aio context?

Also, why in block/io.c we kick the main context, but not bs->aio_context?
Stefan Hajnoczi June 29, 2020, 1:56 p.m. UTC | #3
On Mon, Jun 29, 2020 at 10:55:06AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Also, why in block/io.c we kick the main context, but not bs->aio_context?

From AIO_WAIT_WHILE():

 * The caller's thread must be the IOThread that owns @ctx or the main loop
 * thread (with @ctx acquired exactly once).  This function cannot be used to
 * wait on conditions between two IOThreads since that could lead to deadlock,
 * go via the main loop instead.

Case 1: IOThread

  while ((cond)) {                                           \
      aio_poll(ctx_, true);                                  \
      waited_ = true;                                        \
  }                                                          \

In this case aio_poll() returns after every event loop iteration and we
will re-evaluate cond. Therefore we don't need to be kicked.

Case 2: Main loop thread

In this case we need the kick since we're waiting on the main loop
AioContext, not the IOThread AioContext that is doing the actual work.
aio_wait_kick() schedules a dummy BH to wake up the main loop thread.

There is no harm in scheduling the dummy BH in the main loop thread when
AIO_WAIT_WHILE() is called from an IOThread.

Hope this helps,
Stefan
Vladimir Sementsov-Ogievskiy July 1, 2020, 10:19 a.m. UTC | #4
29.06.2020 16:56, Stefan Hajnoczi wrote:
> On Mon, Jun 29, 2020 at 10:55:06AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Also, why in block/io.c we kick the main context, but not bs->aio_context?
> 
>  From AIO_WAIT_WHILE():
> 
>   * The caller's thread must be the IOThread that owns @ctx or the main loop
>   * thread (with @ctx acquired exactly once).  This function cannot be used to
>   * wait on conditions between two IOThreads since that could lead to deadlock,
>   * go via the main loop instead.
> 
> Case 1: IOThread
> 
>    while ((cond)) {                                           \
>        aio_poll(ctx_, true);                                  \
>        waited_ = true;                                        \
>    }                                                          \
> 
> In this case aio_poll() returns after every event loop iteration and we
> will re-evaluate cond. Therefore we don't need to be kicked.
> 
> Case 2: Main loop thread
> 
> In this case we need the kick since we're waiting on the main loop
> AioContext, not the IOThread AioContext that is doing the actual work.
> aio_wait_kick() schedules a dummy BH to wake up the main loop thread.
> 
> There is no harm in scheduling the dummy BH in the main loop thread when
> AIO_WAIT_WHILE() is called from an IOThread.
> 
> Hope this helps,

Thanks!

Looking at this all again, I think that client->recv_coroutine == NULL is a bad marker of finish, as it's not directly bound to _put. I'll try another approach, to make nbd_export_close_all() be synchronous instead by waiting for all export to be actually freed.
diff mbox series

Patch

diff --git a/nbd/server.c b/nbd/server.c
index 20754e9ebc..5e27a8d31a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1419,6 +1419,8 @@  static void client_close(NBDClient *client, bool negotiated)
     qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
                          NULL);
 
+    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
+
     /* Also tell the client, so that they release their reference.  */
     if (client->close_fn) {
         client->close_fn(client, negotiated);
@@ -2450,6 +2452,7 @@  static coroutine_fn void nbd_trip(void *opaque)
 
     trace_nbd_trip();
     if (client->closing) {
+        client->recv_coroutine = NULL;
         nbd_client_put(client);
         return;
     }