Message ID | 20220609143727.1151816-2-eesposit@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | virtio-blk: removal of AioContext lock |
On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>
>      s->dataplane_starting = false;
>      s->dataplane_started = true;
> -    aio_context_release(s->ctx);
>      return 0;

This looks risky because s->dataplane_started is accessed by IO code and
there is a race condition here. Maybe you can refactor the code along
the lines of virtio-blk to avoid the race.

On 05/07/2022 at 16:11, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
>> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>>
>>      s->dataplane_starting = false;
>>      s->dataplane_started = true;
>> -    aio_context_release(s->ctx);
>>      return 0;
>
> This looks risky because s->dataplane_started is accessed by IO code and
> there is a race condition here. Maybe you can refactor the code along
> the lines of virtio-blk to avoid the race.
>

Uhmm could you explain why is virtio-blk also safe here?
And what is currently protecting dataplane_started (in both blk and
scsi, as I don't see any other AioContext lock taken)?

Because I see that for example virtio_blk_req_complete is IO_CODE, so it
could theoretically read dataplane_started while it is being changed in
dataplane_stop? Even though I guess it doesn't because we disable and
clean the host notifier before modifying it?

But if so, I don't get what is the difference with scsi code, and why we
need to protect only that instance with the aiocontext lock?

Thank you,
Emanuele

On Fri, Jul 08, 2022 at 11:01:37AM +0200, Emanuele Giuseppe Esposito wrote:
> On 05/07/2022 at 16:11, Stefan Hajnoczi wrote:
> > On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
> >> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
> >>
> >>      s->dataplane_starting = false;
> >>      s->dataplane_started = true;
> >> -    aio_context_release(s->ctx);
> >>      return 0;
> >
> > This looks risky because s->dataplane_started is accessed by IO code and
> > there is a race condition here. Maybe you can refactor the code along
> > the lines of virtio-blk to avoid the race.
> >
>
> Uhmm could you explain why is virtio-blk also safe here?
> And what is currently protecting dataplane_started (in both blk and
> scsi, as I don't see any other AioContext lock taken)?

dataplane_started is assigned before the host notifier is set up, which
I'm assuming is an implicit write barrier.

> Because I see that for example virtio_blk_req_complete is IO_CODE, so it
> could theoretically read dataplane_started while it is being changed in
> dataplane_stop? Even though I guess it doesn't because we disable and
> clean the host notifier before modifying it?

virtio_blk_data_plane_stop() has:

  aio_context_acquire(s->ctx);
  aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);

  /* Drain and try to switch bs back to the QEMU main loop. If other users
   * keep the BlockBackend in the iothread, that's ok */
  blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context(), NULL);

  aio_context_release(s->ctx);

and disables host notifiers. At that point the IOThread no longer
receives virtqueue kicks and all in-flight requests have completed.
dataplane_started is only written afterwards so there is no race with
virtio_blk_req_complete().

> But if so, I don't get what is the difference with scsi code, and why we
> need to protect only that instance with the aiocontext lock?

The race condition I pointed out is not with virtio_blk_req_complete()
and data_plane_stop(). It's data_plane_start() racing with
virtio_blk_req_complete().

The virtio-scsi dataplane code is different for historical reasons and
happens to have the race. I don't think the virtio-blk code is affected.

Stefan

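The ordering argument in the reply above can be sketched with a small toy model. This is only an illustration under stated assumptions, not QEMU code: ToyDev, toy_init(), toy_set_started(), toy_attach_notifier() and toy_handle_kick() are hypothetical names, and the "implicit write barrier" that Stefan assumes when the host notifier is set up is modelled here as a C11 release store paired with an acquire load. The steps are meant to be interleaved by hand on a single thread to show the window; nothing here spawns threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the device state; NOT QEMU's real structures. */
typedef struct {
    bool dataplane_started;        /* plain flag read by the "IO" side */
    atomic_bool notifier_attached; /* models the host notifier being live */
} ToyDev;

static void toy_init(ToyDev *d)
{
    d->dataplane_started = false;
    atomic_init(&d->notifier_attached, false);
}

/* Main-loop step: set the flag (a plain store). */
static void toy_set_started(ToyDev *d)
{
    d->dataplane_started = true;
}

/*
 * Main-loop step: make the notifier live. The release store orders every
 * earlier write (including the flag, if it was already set) before it.
 * Assumption: this stands in for the barrier implied by notifier setup.
 */
static void toy_attach_notifier(ToyDev *d)
{
    atomic_store_explicit(&d->notifier_attached, true, memory_order_release);
}

/*
 * IO-side handler. Returns -1 if no kick can arrive yet, otherwise the
 * value of dataplane_started that the handler observes (0 or 1).
 */
static int toy_handle_kick(ToyDev *d)
{
    if (!atomic_load_explicit(&d->notifier_attached, memory_order_acquire)) {
        return -1;
    }
    return d->dataplane_started ? 1 : 0;
}
```

With the order described for virtio-blk (toy_set_started() and then toy_attach_notifier()), any kick that gets through observes the flag as set. With the order visible in the virtio-scsi hunk (notifiers attached first, flag assigned afterwards), a kick handled between the two steps observes the flag as still clear, which is the window the review comment is about.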
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 49276e46f2..f9224f23d2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -167,6 +167,8 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     Error *local_err = NULL;
     int r;
 
+    GLOBAL_STATE_CODE();
+
     if (vblk->dataplane_started || s->starting) {
         return 0;
     }
@@ -243,13 +245,11 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     }
 
     /* Get this show started by hooking up our callbacks */
-    aio_context_acquire(s->ctx);
     for (i = 0; i < nvqs; i++) {
         VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
         virtio_queue_aio_attach_host_notifier(vq, s->ctx);
     }
-    aio_context_release(s->ctx);
     return 0;
 
   fail_aio_context:
@@ -304,6 +304,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     unsigned i;
     unsigned nvqs = s->conf->num_queues;
 
+    GLOBAL_STATE_CODE();
+
     if (!vblk->dataplane_started || s->stopping) {
         return;
     }
@@ -318,6 +320,14 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
+    /*
+     * TODO: virtio_blk_data_plane_stop_bh() does not need the AioContext lock,
+     * because even though virtio_queue_aio_detach_host_notifier() runs in
+     * Iothread context, such calls are serialized by the BQL held (this
+     * function runs in the main loop).
+     * On the other side, virtio_queue_aio_attach_host_notifier* always runs
+     * in the main loop, therefore it doesn't need the AioContext lock.
+     */
     aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
     /* Drain and try to switch bs back to the QEMU main loop. If other users
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..8d0590cc76 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -121,6 +121,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
     VirtIOBlock *s = next->dev;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+    IO_CODE();
+
     aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
     while (next) {
         VirtIOBlockReq *req = next;
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 8bb6e6acfc..7080e9caa9 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -91,6 +91,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
+    GLOBAL_STATE_CODE();
+
     if (s->dataplane_started ||
         s->dataplane_starting ||
         s->dataplane_fenced) {
@@ -136,7 +138,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
     memory_region_transaction_commit();
 
-    aio_context_acquire(s->ctx);
     virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
     virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
 
@@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
     s->dataplane_starting = false;
     s->dataplane_started = true;
-    aio_context_release(s->ctx);
     return 0;
 
 fail_host_notifiers:
@@ -193,6 +193,14 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
     s->dataplane_stopping = true;
 
     aio_context_acquire(s->ctx);
+    /*
+     * TODO: virtio_scsi_dataplane_stop_bh() does not need the AioContext lock,
+     * because even though virtio_queue_aio_detach_host_notifier() runs in
+     * Iothread context, such calls are serialized by the BQL held (this
+     * function runs in the main loop).
+     * On the other side, virtio_queue_aio_attach_host_notifier* always runs
+     * in the main loop, therefore it doesn't need the AioContext lock.
+     */
     aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
     aio_context_release(s->ctx);

virtio_queue_aio_attach_host_notifier() and
virtio_queue_aio_attach_host_notifier_no_poll() always run in the
main loop, so there is no need to protect them with the AioContext lock.

On the other hand, virtio_queue_aio_detach_host_notifier() runs in a BH
in the iothread context, but it is always scheduled (and thus
serialized) by the main loop. Therefore removing the AioContext lock is
safe, but unfortunately we can't do it right now, since
bdrv_set_aio_context() and aio_wait_bh_oneshot() still need it.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 14 ++++++++++++--
 hw/block/virtio-blk.c           |  2 ++
 hw/scsi/virtio-scsi-dataplane.c | 12 ++++++++++--
 3 files changed, 24 insertions(+), 4 deletions(-)