Message ID: 20200625142540.24589-4-vsementsov@virtuozzo.com (mailing list archive)
State: New, archived
Series: Fix crash due to NBD export leak
On 6/25/20 9:25 AM, Vladimir Sementsov-Ogievskiy wrote:
> client_close doesn't guarantee that client is closed: nbd_trip() keeps
> reference to it. Let's wait for nbd_trip to finish.
>
> Without this fix, the following crash is possible:
>
> - export bitmap through unternal Qemu NBD server

internal

> - connect a client
> - shutdown Qemu
>
> On shutdown nbd_export_close_all is called, but it actually don't wait
> for nbd_trip() to finish and to release its references. So, export is
> not release, and exported bitmap remains busy, and on try to remove the
> bitmap (which is part of bdrv_close()) the assertion fairs:

fails

> bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> nbd/server.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..5e27a8d31a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1419,6 +1419,8 @@ static void client_close(NBDClient *client, bool negotiated)
>          qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>                               NULL);
>
> +    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
> +
>      /* Also tell the client, so that they release their reference. */
>      if (client->close_fn) {
>          client->close_fn(client, negotiated);
> @@ -2450,6 +2452,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>
>      trace_nbd_trip();
>      if (client->closing) {
> +        client->recv_coroutine = NULL;
>          nbd_client_put(client);
>          return;
>      }

Reviewed-by: Eric Blake <eblake@redhat.com>
25.06.2020 17:25, Vladimir Sementsov-Ogievskiy wrote:
> client_close doesn't guarantee that client is closed: nbd_trip() keeps
> reference to it. Let's wait for nbd_trip to finish.
>
> Without this fix, the following crash is possible:
>
> - export bitmap through unternal Qemu NBD server
> - connect a client
> - shutdown Qemu
>
> On shutdown nbd_export_close_all is called, but it actually don't wait
> for nbd_trip() to finish and to release its references. So, export is
> not release, and exported bitmap remains busy, and on try to remove the
> bitmap (which is part of bdrv_close()) the assertion fairs:
>
> bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> nbd/server.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index 20754e9ebc..5e27a8d31a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1419,6 +1419,8 @@ static void client_close(NBDClient *client, bool negotiated)
>          qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>                               NULL);
>
> +    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
> +
>      /* Also tell the client, so that they release their reference. */
>      if (client->close_fn) {
>          client->close_fn(client, negotiated);
> @@ -2450,6 +2452,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>
>      trace_nbd_trip();
>      if (client->closing) {
> +        client->recv_coroutine = NULL;
>          nbd_client_put(client);
>          return;
>      }

I have a doubt: isn't it possible to hang in AIO_WAIT_WHILE() after this patch? Do we need aio_wait_kick() in nbd_trip()? Or maybe something like aio_wait_kick(), but to wake exactly the client->exp->ctx aio context?

Also, why in block/io.c we kick the main context, but not bs->aio_context?
On Mon, Jun 29, 2020 at 10:55:06AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Also, why in block/io.c we kick the main context, but not bs->aio_context?
From AIO_WAIT_WHILE():
* The caller's thread must be the IOThread that owns @ctx or the main loop
* thread (with @ctx acquired exactly once). This function cannot be used to
* wait on conditions between two IOThreads since that could lead to deadlock,
* go via the main loop instead.
Case 1: IOThread
    while ((cond)) {                       \
        aio_poll(ctx_, true);              \
        waited_ = true;                    \
    }                                      \
In this case aio_poll() returns after every event loop iteration and we
will re-evaluate cond. Therefore we don't need to be kicked.
Case 2: Main loop thread
In this case we need the kick since we're waiting on the main loop
AioContext, not the IOThread AioContext that is doing the actual work.
aio_wait_kick() schedules a dummy BH to wake up the main loop thread.
There is no harm in scheduling the dummy BH in the main loop thread when
AIO_WAIT_WHILE() is called from an IOThread.
Hope this helps,
Stefan
29.06.2020 16:56, Stefan Hajnoczi wrote:
> On Mon, Jun 29, 2020 at 10:55:06AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Also, why in block/io.c we kick the main context, but not bs->aio_context?
>
> From AIO_WAIT_WHILE():
>
> * The caller's thread must be the IOThread that owns @ctx or the main loop
> * thread (with @ctx acquired exactly once). This function cannot be used to
> * wait on conditions between two IOThreads since that could lead to deadlock,
> * go via the main loop instead.
>
> Case 1: IOThread
>
>     while ((cond)) {                       \
>         aio_poll(ctx_, true);              \
>         waited_ = true;                    \
>     }                                      \
>
> In this case aio_poll() returns after every event loop iteration and we
> will re-evaluate cond. Therefore we don't need to be kicked.
>
> Case 2: Main loop thread
>
> In this case we need the kick since we're waiting on the main loop
> AioContext, not the IOThread AioContext that is doing the actual work.
> aio_wait_kick() schedules a dummy BH to wake up the main loop thread.
>
> There is no harm in scheduling the dummy BH in the main loop thread when
> AIO_WAIT_WHILE() is called from an IOThread.
>
> Hope this helps,

Thanks!

Looking at this all again, I think that client->recv_coroutine == NULL is a bad marker of being finished, as it's not directly bound to _put. I'll try another approach: make nbd_export_close_all() synchronous instead, by waiting for all exports to be actually freed.
diff --git a/nbd/server.c b/nbd/server.c
index 20754e9ebc..5e27a8d31a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1419,6 +1419,8 @@ static void client_close(NBDClient *client, bool negotiated)
         qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
                              NULL);
 
+    AIO_WAIT_WHILE(client->exp->ctx, client->recv_coroutine);
+
     /* Also tell the client, so that they release their reference. */
     if (client->close_fn) {
         client->close_fn(client, negotiated);
@@ -2450,6 +2452,7 @@ static coroutine_fn void nbd_trip(void *opaque)
 
     trace_nbd_trip();
     if (client->closing) {
+        client->recv_coroutine = NULL;
         nbd_client_put(client);
         return;
     }
client_close doesn't guarantee that the client is closed: nbd_trip() keeps a reference to it. Let's wait for nbd_trip() to finish.

Without this fix, the following crash is possible:

- export a bitmap through the internal Qemu NBD server
- connect a client
- shut down Qemu

On shutdown nbd_export_close_all() is called, but it doesn't actually wait for nbd_trip() to finish and release its references. So the export is not released, the exported bitmap remains busy, and on trying to remove the bitmap (which is part of bdrv_close()) the assertion fails:

bdrv_release_dirty_bitmap_locked: Assertion `!bdrv_dirty_bitmap_busy(bitmap)' failed

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
nbd/server.c | 3 +++
1 file changed, 3 insertions(+)