Message ID | 20210128201418.607640-3-rvkagan@yandex-team.ru (mailing list archive)
---|---
State | New, archived
Series | block/nbd: fix crashers in reconnect while migrating
28.01.2021 23:14, Roman Kagan wrote:
> When an NBD block driver state is moved from one aio_context to another
> (e.g. when doing a drain in a migration thread),
> nbd_client_attach_aio_context_bh is executed that enters the connection
> coroutine.
>
> However, the assumption that ->connection_co is always present here
> appears incorrect: the connection may have encountered an error other
> than -EIO in the underlying transport, and thus may have decided to quit
> rather than keep trying to reconnect, and therefore it may have
> terminated the connection coroutine. As a result an attempt to reassign
> the client in this state (NBD_CLIENT_QUIT) to a different aio_context
> leads to a null pointer dereference:
>
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
>   opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
>   blocking=blocking@entry=true)
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
>   cb=<optimized out>, opaque=<optimized out>)
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
>   bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
>   new_context=new_context@entry=0x5618056c7580,
>   ignore=ignore@entry=0x7f60e1e63780)
>   at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
>   bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
>   ignore_child=<optimized out>, errp=<optimized out>)
>   at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
>   new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
>   errp=errp@entry=0x0)
>   at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
>   new_context=<optimized out>, errp=errp@entry=0x0)
>   at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
>   at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
>   at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
>   running=0, state=<optimized out>)
>   at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
>   state=state@entry=RUN_STATE_FINISH_MIGRATE)
>   at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
>   send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
>   at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
>   at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
>   at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
>   at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
>   from /lib/x86_64-linux-gnu/libpthread.so.0
>
> Fix it by checking that the connection coroutine is non-null before
> trying to enter it. If it is null, no entering is needed, as the
> connection is probably going down anyway.
>
> Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>  block/nbd.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index bcd6641e90..b3cbbeb4b0 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
>      BlockDriverState *bs = opaque;
>      BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
>
> -    /*
> -     * The node is still drained, so we know the coroutine has yielded in
> -     * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is
> -     * entered for the first time. Both places are safe for entering the
> -     * coroutine.
> -     */
> -    qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
> +    if (s->connection_co) {
> +        /*
> +         * The node is still drained, so we know the coroutine has yielded in
> +         * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
> +         * it is entered for the first time. Both places are safe for entering
> +         * the coroutine.
> +         */
> +        qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
> +    }
>      bdrv_dec_in_flight(bs);
>  }
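The fix reduces to a null-check guard around the coroutine entry. A minimal standalone sketch of the same pattern, using hypothetical stand-in types rather than QEMU's real BDRVNBDState and qemu_aio_coroutine_enter() (which are considerably more involved):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the coroutine entry point. */
typedef void (*CoroutineFn)(void *opaque);

/* Hypothetical stand-in for BDRVNBDState: connection_co becomes NULL once
 * the connection coroutine has terminated (e.g. after a fatal transport
 * error put the client into NBD_CLIENT_QUIT). */
typedef struct {
    CoroutineFn connection_co;
    void *opaque;
} State;

/* Mirrors the shape of the fixed nbd_client_attach_aio_context_bh():
 * enter the coroutine only if it still exists.
 * Returns 1 if the coroutine was entered, 0 if it was already gone. */
static int attach_aio_context_bh(State *s)
{
    if (s->connection_co) {
        s->connection_co(s->opaque);
        return 1;
    }
    /* Coroutine already terminated: nothing to enter, connection is
     * going down anyway. The unguarded version would dereference NULL
     * here, which is the crash in the backtrace above. */
    return 0;
}

static void fake_connection_co(void *opaque)
{
    printf("connection coroutine entered: %s\n", (const char *)opaque);
}
```

Calling attach_aio_context_bh() on a State whose connection_co is NULL now falls through harmlessly instead of crashing, which is the whole point of the two-line guard in the patch.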