From patchwork Thu Jun 1 15:25:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264152
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id A9DF0C77B7E
    for ; Thu, 1 Jun 2023 15:31:28 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S235109AbjFAPb1 (ORCPT ); Thu, 1 Jun 2023 11:31:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40494
    "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S233812AbjFAPbK (ORCPT );
    Thu, 1 Jun 2023 11:31:10 -0400
Received: from us-smtp-delivery-124.mimecast.com
    (us-smtp-delivery-124.mimecast.com [170.10.133.124])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89B2EE51
    for ; Thu, 1 Jun 2023 08:30:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
    s=mimecast20190719; t=1685633278;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:cc:mime-version:mime-version:
    content-transfer-encoding:content-transfer-encoding:
    in-reply-to:in-reply-to:references:references;
    bh=LUW/UViDGW02kQ/3j6Afz3hRTQNvXIZflOpv+pi7CHk=;
    b=LcMlSEXQWUOXJKQ8XgHEdgMH8VYIjjMYeodFxLyp44cwg/ZU3N80cjL3dGcTWJCgSMjp+d
    5/oyrDoJl/tgi73E2Zg4lvQZMKRAgpPH5DBMCPhhjaCP2E6TkbjobYvebsWbhm/xNBcER4
    e3G5xJ4xBEkUNTY6plGags5HIw/T1ok=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
    [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
    (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
    id us-mta-608-97tanEttN06cpnPrWUSb1A-1; Thu, 01 Jun 2023 11:26:19 -0400
X-MC-Unique: 97tanEttN06cpnPrWUSb1A-1
Received: from smtp.corp.redhat.com
    (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3741C803CA2;
    Thu, 1 Jun 2023 15:25:57 +0000 (UTC)
Received: from localhost (unknown [10.39.194.5])
    by smtp.corp.redhat.com (Postfix) with ESMTP id 7572F20296C6;
    Thu, 1 Jun 2023 15:25:56 +0000 (UTC)
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
    Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
    Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
    "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
    Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
    Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake, Stefan Hajnoczi,
    Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 1/8] block: add blk_io_plug_call() API
Date: Thu, 1 Jun 2023 11:25:45 -0400
Message-Id: <20230601152552.1603119-2-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Introduce a new API for thread-local blk_io_plug() that does not traverse
the block graph. The goal is to make blk_io_plug() multi-queue friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

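As a rough illustration of the semantics described above (this is not the
code added by the patch), the following self-contained C sketch models a
plugged section in a single thread. Every demo_* name and the fixed-size
array are assumptions made for the example; the patch itself keeps its state
thread-local and stores the deferred calls in a GArray.

```c
/*
 * Illustrative sketch only: a single-threaded model of plug/unplug batching.
 * All demo_* names are hypothetical and are not part of QEMU.
 */
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEFERRED 16

typedef void (*demo_fn)(void *);

/* One deferred call: a function pointer plus its argument. */
typedef struct {
    demo_fn fn;
    void *opaque;
} DemoDeferred;

static unsigned demo_depth;                          /* plug nesting depth */
static DemoDeferred demo_deferred[DEMO_MAX_DEFERRED];
static size_t demo_len;                              /* deferred calls queued */

static void demo_plug(void)
{
    demo_depth++;
}

static void demo_plug_call(demo_fn fn, void *opaque)
{
    if (demo_depth == 0) {
        fn(opaque);                 /* not plugged: call immediately */
        return;
    }
    for (size_t i = 0; i < demo_len; i++) {
        if (demo_deferred[i].fn == fn && demo_deferred[i].opaque == opaque) {
            return;                 /* same fn/opaque pair already deferred */
        }
    }
    assert(demo_len < DEMO_MAX_DEFERRED);
    demo_deferred[demo_len].fn = fn;
    demo_deferred[demo_len].opaque = opaque;
    demo_len++;
}

static void demo_unplug(void)
{
    assert(demo_depth > 0);
    if (--demo_depth > 0) {
        return;                     /* still inside an outer plugged section */
    }
    for (size_t i = 0; i < demo_len; i++) {
        demo_deferred[i].fn(demo_deferred[i].opaque);
    }
    demo_len = 0;                   /* reset for the next plugged section */
}

/* Example callback: counts how many times it runs. */
static void demo_count_call(void *opaque)
{
    int *calls = opaque;
    (*calls)++;
}

/* Three requests inside one plugged section; returns how often fn ran. */
static int demo_batched_calls(void)
{
    int calls = 0;

    demo_plug();
    demo_plug_call(demo_count_call, &calls);
    demo_plug_call(demo_count_call, &calls);
    demo_plug_call(demo_count_call, &calls);
    demo_unplug();                  /* the deferred callback runs here */
    return calls;
}
```

In this sketch, three demo_plug_call() requests with the same pair inside one
plugged section run the callback a single time at the outermost
demo_unplug(), while a request made outside any plugged section runs at once.
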
Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Stefano Garzarella
Acked-by: Kevin Wolf
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 MAINTAINERS                       |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c             |  22 -----
 block/plug.c                      | 159 ++++++++++++++++++++++++++++++
 hw/block/dataplane/xen-block.c    |   8 +-
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 block/meson.build                 |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b025a7b63..89f274f85e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2650,6 +2650,7 @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index d62a7ee773..be4dcef59d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,16 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 241f643507..4009ed5fed 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2582,28 +2582,6 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
     notifier_list_add(&blk->insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_plug(bs);
-    }
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_unplug(bs);
-    }
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
     IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index 0000000000..98a155d2f4
--- /dev/null
+++ b/block/plug.c
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+    void (*fn)(void *);
+    void *opaque;
+} UnplugFn;
+
+/* Per-thread state */
+typedef struct {
+    unsigned count;       /* how many times has plug() been called? */
+    GArray *unplug_fns;   /* functions to call at unplug time */
+} Plug;
+
+/* Use get_ptr_plug() to fetch this thread-local value */
+QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
+
+/* Called at thread cleanup time */
+static void blk_io_plug_atexit(Notifier *n, void *value)
+{
+    Plug *plug = get_ptr_plug();
+    g_array_free(plug->unplug_fns, TRUE);
+}
+
+/* This won't involve coroutines, so use __thread */
+static __thread Notifier blk_io_plug_atexit_notifier;
+
+/**
+ * blk_io_plug_call:
+ * @fn: a function pointer to be invoked
+ * @opaque: a user-defined argument to @fn()
+ *
+ * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
+ * section.
+ *
+ * Otherwise defer the call until the end of the outermost
+ * blk_io_plug()/blk_io_unplug() section in this thread. If the same
+ * @fn/@opaque pair has already been deferred, it will only be called once upon
+ * blk_io_unplug() so that accumulated calls are batched into a single call.
+ *
+ * The caller must ensure that @opaque is not freed before @fn() is invoked.
+ */
+void blk_io_plug_call(void (*fn)(void *), void *opaque)
+{
+    Plug *plug = get_ptr_plug();
+
+    /* Call immediately if we're not plugged */
+    if (plug->count == 0) {
+        fn(opaque);
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
+        plug->unplug_fns = array;
+        blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
+        qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+    UnplugFn new_fn = {
+        .fn = fn,
+        .opaque = opaque,
+    };
+
+    /*
+     * There won't be many, so do a linear search. If this becomes a bottleneck
+     * then a binary search (glib 2.62+) or different data structure could be
+     * used.
+     */
+    for (guint i = 0; i < array->len; i++) {
+        if (memcmp(&fns[i], &new_fn, sizeof(new_fn)) == 0) {
+            return; /* already exists */
+        }
+    }
+
+    g_array_append_val(array, new_fn);
+}
+
+/**
+ * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
+ *
+ * blk_io_plug/unplug are thread-local operations. This means that multiple
+ * threads can simultaneously call plug/unplug, but the caller must ensure that
+ * each unplug() is called in the same thread of the matching plug().
+ *
+ * Nesting is supported. blk_io_plug_call() functions are only called at the
+ * outermost blk_io_unplug().
+ */
+void blk_io_plug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count < UINT32_MAX);
+
+    plug->count++;
+}
+
+/**
+ * blk_io_unplug: Run any pending blk_io_plug_call() functions
+ *
+ * There must have been a matching blk_io_plug() call in the same thread prior
+ * to this blk_io_unplug() call.
+ */
+void blk_io_unplug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count > 0);
+
+    if (--plug->count > 0) {
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        return;
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+
+    for (guint i = 0; i < array->len; i++) {
+        fns[i].fn(fns[i].opaque);
+    }
+
+    /*
+     * This resets the array without freeing memory so that appending is cheap
+     * in the future.
+     */
+    g_array_set_size(array, 0);
+}
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 2597f38805..3b6f2b0aa2 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -537,7 +537,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
      * is below us.
      */
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_plug(dataplane->blk);
+        blk_io_plug();
     }
     while (rc != rp) {
         /* pull request from ring */
@@ -577,12 +577,12 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
 
         if (inflight_atstart > IO_PLUG_THRESHOLD &&
             batched >= inflight_atstart) {
-            blk_io_unplug(dataplane->blk);
+            blk_io_unplug();
         }
         xen_block_do_aio(request);
         if (inflight_atstart > IO_PLUG_THRESHOLD) {
             if (batched >= inflight_atstart) {
-                blk_io_plug(dataplane->blk);
+                blk_io_plug();
                 batched = 0;
             } else {
                 batched++;
@@ -590,7 +590,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
         }
     }
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_unplug(dataplane->blk);
+        blk_io_unplug();
     }
 
     return done_something;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4ca66b5860..39e7f23fab 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1134,7 +1134,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     bool suppress_notifications = virtio_queue_get_notification(vq);
 
     aio_context_acquire(blk_get_aio_context(s->blk));
-    blk_io_plug(s->blk);
+    blk_io_plug();
 
     do {
         if (suppress_notifications) {
@@ -1158,7 +1158,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_blk_submit_multireq(s, &mrb);
     }
 
-    blk_io_unplug(s->blk);
+    blk_io_unplug();
     aio_context_release(blk_get_aio_context(s->blk));
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4a8849cc7e..9c8ef0aaa6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -799,7 +799,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
-    blk_io_plug(d->conf.blk);
+    blk_io_plug();
     object_unref(OBJECT(d));
     return 0;
 }
@@ -810,7 +810,7 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     if (scsi_req_enqueue(sreq)) {
         scsi_req_continue(sreq);
     }
-    blk_io_unplug(sreq->dev->conf.blk);
+    blk_io_unplug();
     scsi_req_unref(sreq);
 }
 
@@ -836,7 +836,7 @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
                 while (!QTAILQ_EMPTY(&reqs)) {
                     req = QTAILQ_FIRST(&reqs);
                     QTAILQ_REMOVE(&reqs, req, next);
-                    blk_io_unplug(req->sreq->dev->conf.blk);
+                    blk_io_unplug();
                     scsi_req_unref(req->sreq);
                     virtqueue_detach_element(req->vq, &req->elem, 0);
                     virtio_scsi_free_req(req);
diff --git a/block/meson.build b/block/meson.build
index 486dda8b85..fb4332bd66 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -23,6 +23,7 @@ block_ss.add(files(
   'mirror.c',
   'nbd.c',
   'null.c',
+  'plug.c',
   'qapi.c',
   'qcow2-bitmap.c',
   'qcow2-cache.c',

From patchwork Thu Jun 1 15:25:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264127
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id C5F27C7EE2A
    for ; Thu, 1 Jun 2023 15:26:49 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S234179AbjFAP0s (ORCPT ); Thu, 1
    Jun 2023 11:26:48 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36418
    "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S232667AbjFAP0r (ORCPT );
    Thu, 1 Jun 2023 11:26:47 -0400
Received: from us-smtp-delivery-124.mimecast.com
    (us-smtp-delivery-124.mimecast.com [170.10.129.124])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1950D12C
    for ; Thu, 1 Jun 2023 08:26:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
    s=mimecast20190719; t=1685633163;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:cc:mime-version:mime-version:
    content-transfer-encoding:content-transfer-encoding:
    in-reply-to:in-reply-to:references:references;
    bh=56sRbxTeLa9JZmQN0B8QX1cUKslR1+CU7IhQ7z+Xw5c=;
    b=GCkuwf8h0D1LtK4V3BWWO1wPO2sc1DVJkTCtIhFcGyGlhYpjwWaTlpWQRzVf0JOOVSyviY
    na/7uaiR8Hk83dw628aGzhjSHM6gy3U4P+szTpluClOijLs8ZrJd3Eg6tF5EAJgAIBo/X2
    CE1M6fJUjTPa0UAgZ2DSnmCGKsmhUsw=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
    [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
    (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
    id us-mta-227-lWN_D032PEW-zCcOIkDfhw-1; Thu, 01 Jun 2023 11:25:59 -0400
X-MC-Unique: lWN_D032PEW-zCcOIkDfhw-1
Received: from smtp.corp.redhat.com
    (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 27AEA800159;
    Thu, 1 Jun 2023 15:25:59 +0000 (UTC)
Received: from localhost (unknown [10.39.194.5])
    by smtp.corp.redhat.com (Postfix) with ESMTP id 8A0932166B27;
    Thu, 1 Jun 2023 15:25:58 +0000 (UTC)
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
    Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
    Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
    "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
    Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
    Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake, Stefan Hajnoczi,
    Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 2/8] block/nvme: convert to blk_io_plug_call() API
Date: Thu, 1 Jun 2023 11:25:46 -0400
Message-Id: <20230601152552.1603119-3-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Stefano Garzarella
Acked-by: Kevin Wolf
Message-id: 20230530180959.1108766-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 block/nvme.c       | 44 ++++++++++++--------------------------------
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 17937d398d..7ca85bc44a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -25,6 +25,7 @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -119,7 +120,6 @@ struct BDRVNVMeState {
     int blkshift;
 
     uint64_t max_transfer;
-    bool plugged;
 
     bool supports_write_zeroes;
     bool supports_discard;
@@ -282,7 +282,7 @@ static void nvme_kick(NVMeQueuePair *q)
 {
     BDRVNVMeState *s = q->s;
 
-    if (s->plugged || !q->need_kick) {
+    if (!q->need_kick) {
         return;
     }
     trace_nvme_kick(s, q->index);
@@ -387,10 +387,6 @@ static bool nvme_process_completion(NVMeQueuePair *q)
     NvmeCqe *c;
 
     trace_nvme_process_completion(s, q->index, q->inflight);
-    if (s->plugged) {
-        trace_nvme_process_completion_queue_plugged(s, q->index);
-        return false;
-    }
 
     /*
      * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -480,6 +476,15 @@ static void nvme_trace_command(const NvmeCmd *cmd)
     }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+    NVMeQueuePair *q = opaque;
+
+    QEMU_LOCK_GUARD(&q->lock);
+    nvme_kick(q);
+    nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
                                 NvmeCmd *cmd, BlockCompletionFunc cb,
                                 void *opaque)
@@ -496,8 +501,7 @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
            q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
     q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
     q->need_kick++;
-    nvme_kick(q);
-    nvme_process_completion(q);
+    blk_io_plug_call(nvme_unplug_fn, q);
     qemu_mutex_unlock(&q->lock);
 }
 
@@ -1567,27 +1571,6 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
     }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(!s->plugged);
-    s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(s->plugged);
-    s->plugged = false;
-    for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-        NVMeQueuePair *q = s->queues[i];
-        qemu_mutex_lock(&q->lock);
-        nvme_kick(q);
-        nvme_process_completion(q);
-        qemu_mutex_unlock(&q->lock);
-    }
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
                               Error **errp)
 {
@@ -1664,9 +1647,6 @@ static BlockDriver bdrv_nvme = {
     .bdrv_detach_aio_context = nvme_detach_aio_context,
     .bdrv_attach_aio_context = nvme_attach_aio_context,
 
-    .bdrv_co_io_plug = nvme_co_io_plug,
-    .bdrv_co_io_unplug = nvme_co_io_unplug,
-
     .bdrv_register_buf = nvme_register_buf,
     .bdrv_unregister_buf = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index 32665158d6..048ad27519 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -141,7 +141,6 @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"

From patchwork Thu Jun 1 15:25:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264128
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 9C4FEC77B7A
    for ; Thu, 1 Jun 2023 15:26:53 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S234350AbjFAP0w (ORCPT ); Thu, 1 Jun 2023 11:26:52 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36414
    "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S233818AbjFAP0u (ORCPT );
    Thu, 1 Jun 2023 11:26:50 -0400
Received: from us-smtp-delivery-124.mimecast.com
    (us-smtp-delivery-124.mimecast.com [170.10.129.124])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24C48134
    for ; Thu, 1 Jun 2023 08:26:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
    s=mimecast20190719; t=1685633164;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:cc:mime-version:mime-version:
    content-transfer-encoding:content-transfer-encoding:
    in-reply-to:in-reply-to:references:references;
    bh=/KanOY4J9dCjz5B75DsOt8HVMCuzZufc1fAM7/vf+jo=;
    b=bdHA1MtumnETKswAD3ooM2yJrecYCRwtXgI70Jso4+ZwSUk8NnVOwOp6xC1OQvgJ06/rq7
    E5jsgmAJWgIbo4o6hwg2ZOmTav87LJu9On2qSKP0gRep865ZZX8i1hSwzQJA3MWppkwzfd
    L/m0EshstPY8g1ILq8o2BB9Wa9TDG8Q=
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com
    [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS
    (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
    id us-mta-623-0oUwZqO3Oui-d_EN81nNuQ-1; Thu, 01 Jun 2023 11:26:02 -0400
X-MC-Unique: 0oUwZqO3Oui-d_EN81nNuQ-1
Received: from smtp.corp.redhat.com
    (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 41EFA3823A0B;
    Thu, 1 Jun 2023 15:26:01 +0000 (UTC)
Received: from localhost (unknown [10.39.194.5])
    by smtp.corp.redhat.com (Postfix) with ESMTP id AC307492B0B;
    Thu, 1 Jun 2023 15:26:00 +0000 (UTC)
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
    Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
    Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
    "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
    Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
    Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake, Stefan Hajnoczi,
    Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 3/8] block/blkio: convert to blk_io_plug_call() API
Date: Thu, 1 Jun 2023 11:25:47 -0400
Message-Id: <20230601152552.1603119-4-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Stefano Garzarella
Acked-by: Kevin Wolf
Message-id: 20230530180959.1108766-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 block/blkio.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 72117fa005..11be8787a3 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 #include "block/block-io.h"
 
@@ -320,16 +321,30 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
                        NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-    if (qatomic_read(&bs->io_plugged) == 0) {
-        BDRVBlkioState *s = bs->opaque;
+    BDRVBlkioState *s = opaque;
 
+    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
     }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -340,9 +355,9 @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -373,9 +388,9 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -418,9 +433,9 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -439,9 +454,9 @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_flush(s->blkioq, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -467,22 +482,13 @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVBlkioState *s = bs->opaque;
-
-    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-        blkio_submit_io(bs);
-    }
-}
-
 typedef enum {
     BMRR_OK,
     BMRR_SKIP,
@@ -1004,7 +1010,6 @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
     .bdrv_co_pwritev = blkio_co_pwritev, \
     .bdrv_co_flush_to_disk = blkio_co_flush, \
     .bdrv_co_pwrite_zeroes = blkio_co_pwrite_zeroes, \
-    .bdrv_co_io_unplug = blkio_co_io_unplug, \
     .bdrv_refresh_limits = blkio_refresh_limits, \
     .bdrv_register_buf = blkio_register_buf, \
     .bdrv_unregister_buf = blkio_unregister_buf, \

From patchwork Thu Jun 1 15:25:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264130
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 0CFEDC77B7E
    for ; Thu, 1 Jun 2023 15:27:01 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S234879AbjFAP07 (ORCPT ); Thu, 1 Jun 2023 11:26:59 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36456
    "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S234771AbjFAP05 (ORCPT );
    Thu, 1 Jun 2023 11:26:57 -0400
Received: from us-smtp-delivery-124.mimecast.com
    (us-smtp-delivery-124.mimecast.com [170.10.129.124])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15B62107
    for ; Thu, 1 Jun 2023 08:26:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
    s=mimecast20190719; t=1685633167;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
    to:to:cc:cc:mime-version:mime-version:
    content-transfer-encoding:content-transfer-encoding:
    in-reply-to:in-reply-to:references:references;
    bh=isHwy///XBfEj9leDzUcC4RkN1ck1qXgOorGTTirv6Q=;
    b=a3nxDyurx+vnomk1yWYTNfjxmaGpgI4tl6gQS0eJ1m2plHachKBHJJVXS3U0BmI9D2rTpv
    6fW7MSdj0ErOmKfQ3FOKc0iMC58Oat/YHFaxL7EOj7XXQh/VpBCKQr30pLUHieP9+wK1iF
    jneXjyQyWvDkrsKXAlsnSufF6a1aD20=
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com
    [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS
    (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
    id us-mta-313-2vgpgg3mPcSjbZVrWqxdXw-1; Thu, 01 Jun 2023 11:26:04 -0400
X-MC-Unique: 2vgpgg3mPcSjbZVrWqxdXw-1
Received: from smtp.corp.redhat.com
    (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 382503C397F2;
    Thu, 1 Jun 2023 15:26:03 +0000 (UTC)
Received: from localhost (unknown [10.39.194.5])
    by smtp.corp.redhat.com (Postfix) with ESMTP id 94ACA8162;
    Thu, 1 Jun 2023 15:26:02 +0000 (UTC)
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
    Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
    Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
    "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
    Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
    Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake, Stefan Hajnoczi,
    Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 4/8] block/io_uring: convert to blk_io_plug_call() API
Date: Thu, 1 Jun 2023 11:25:48 -0400
Message-Id: <20230601152552.1603119-5-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly.
Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Stefano Garzarella
Acked-by: Kevin Wolf
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 10 ----------
 block/io_uring.c        | 44 ++++++++++++++++-------------------------
 block/trace-events      |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0fe85ade77..da60ca13ef 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -81,13 +81,6 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
                                   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index 0ab158efba..7baa8491dd 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2558,11 +2558,6 @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
         laio_io_plug();
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_plug();
-    }
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -2573,11 +2568,6 @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
         laio_io_unplug(s->aio_max_batch);
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_unplug();
-    }
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index 3a77480e16..69d9820928 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -16,6 +16,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -41,7 +42,6 @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -267,7 +267,7 @@ static void luring_process_completions_and_submit(LuringState *s)
 {
     luring_process_completions(s);
 
-    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+    if (s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -301,29 +301,17 @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->submit_queue);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    trace_luring_io_plug(s);
-    s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    assert(s->io_q.plugged);
-    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (--s->io_q.plugged == 0 &&
-        !s->io_q.blocked && s->io_q.in_queue > 0) {
+    LuringState *s = opaque;
+    trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked && s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -370,14 +358,16 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
     s->io_q.in_queue++;
-    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-        ret = ioq_submit(s);
-        trace_luring_do_submit_done(s, ret);
-        return ret;
+    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+            ret = ioq_submit(s);
+            trace_luring_do_submit_done(s, ret);
+            return ret;
+        }
+
+        blk_io_plug_call(luring_unplug_fn, s);
     }
     return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index 048ad27519..6f121b7636 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -64,9 +64,8 @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 # io_uring.c
 luring_init_state(void *s, size_t size) "s %p size %zu"
 luring_cleanup_state(void *s) "%p freed"
-luring_io_plug(void *s) "LuringState %p plug"
-luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
-luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
 luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
 luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
 luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"

From patchwork Thu Jun 1 15:25:49 2023
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264129
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
 Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
 Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
 "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
 Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
 Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake,
 Stefan Hajnoczi, Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 5/8] block/linux-aio: convert to blk_io_plug_call() API
Date: Thu, 1 Jun 2023 11:25:49 -0400
Message-Id: <20230601152552.1603119-6-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>

Stop using the .bdrv_co_io_plug() API because it is not multi-queue block
layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_io_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.
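
The two semantic points above can be illustrated with a small standalone
model. This is only a sketch, not the QEMU implementation: every name in
it (model_plug, model_plug_call, model_unplug, count_calls, MAX_DEFERRED)
is hypothetical, and it assumes that a call made while no plugged section
is active runs immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone model; none of these names are QEMU APIs. */
typedef void (*UnplugFn)(void *opaque);

enum { MAX_DEFERRED = 16 };

typedef struct {
    UnplugFn fn;
    void *opaque;
} Deferred;

/* Per-thread state: nesting depth plus the list of deferred calls. */
static _Thread_local unsigned plug_depth;
static _Thread_local Deferred deferred[MAX_DEFERRED];
static _Thread_local size_t n_deferred;

/* Enter a plugged section; sections may nest. */
static void model_plug(void)
{
    plug_depth++;
}

/*
 * Defer fn(opaque) until the outermost unplug.
 * Assumption: when no section is active, the call runs right away.
 */
static void model_plug_call(UnplugFn fn, void *opaque)
{
    if (plug_depth == 0) {
        fn(opaque);
        return;
    }
    for (size_t i = 0; i < n_deferred; i++) {
        if (deferred[i].fn == fn && deferred[i].opaque == opaque) {
            return; /* same pair already queued: one call covers the batch */
        }
    }
    assert(n_deferred < MAX_DEFERRED);
    deferred[n_deferred++] = (Deferred){ fn, opaque };
}

/* Leave a plugged section; only the outermost unplug runs the callbacks. */
static void model_unplug(void)
{
    Deferred pending[MAX_DEFERRED];
    size_t n;

    assert(plug_depth > 0);
    if (--plug_depth > 0) {
        return; /* inner unplug: nothing runs yet (point 1) */
    }

    /* Copy first so callbacks may safely plug and defer again. */
    n = n_deferred;
    memcpy(pending, deferred, n * sizeof(pending[0]));
    n_deferred = 0;
    for (size_t i = 0; i < n; i++) {
        pending[i].fn(pending[i].opaque);
    }
}

/* Example callback: counts how many times it was invoked. */
static void count_calls(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the state is thread-local and keyed only on the fn/opaque pair,
the callback receives nothing that identifies a particular
BlockDriverState, which is why a per-device value such as dev_max_batch
is not available there (point 2).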

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Acked-by: Kevin Wolf
Reviewed-by: Stefano Garzarella
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 28 ----------------------------
 block/linux-aio.c       | 41 +++++++++++------------------------------
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index da60ca13ef..0f63c2800c 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index 7baa8491dd..ac1ed54811 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_plug();
-    }
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_unplug(s->aio_max_batch);
-    }
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate = raw_co_truncate,
@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate = raw_co_truncate,
@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev = raw_co_pwritev,
     .bdrv_co_flush_to_disk = raw_co_flush_to_disk,
     .bdrv_refresh_limits = cdrom_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate = raw_co_truncate,
@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev = raw_co_pwritev,
     .bdrv_co_flush_to_disk = raw_co_flush_to_disk,
     .bdrv_refresh_limits = cdrom_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 916f001e32..561c71a9ae 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -46,7 +47,6 @@ struct qemu_laiocb {
 };
 
 typedef struct {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -236,7 +236,7 @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
     qemu_laio_process_completions(s);
 
-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -277,7 +277,6 @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
 static void ioq_init(LaioQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->pending);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
@@ -354,31 +353,11 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
     return max_batch;
 }
 
-void laio_io_plug(void)
+static void laio_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
+    LinuxAioState *s = opaque;
 
-    s->io_q.plugged++;
-}
-
-void laio_io_unplug(uint64_t dev_max_batch)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
-
-    assert(s->io_q.plugged);
-    s->io_q.plugged--;
-
-    /*
-     * Why max batch checking is performed here:
-     * Another BDS may have queued requests with a higher dev_max_batch and
-     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
-     * batch so we can honor our device's dev_max_batch.
-     */
-    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-        (!s->io_q.plugged &&
-         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
+    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -410,10 +389,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
     s->io_q.in_queue++;
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
-        ioq_submit(s);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
+            ioq_submit(s);
+        } else {
+            blk_io_plug_call(laio_unplug_fn, s);
+        }
     }
 
     return 0;

From patchwork Thu Jun 1 15:25:50 2023
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264131
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
 Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
 Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
 "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
 Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
 Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake,
 Stefan Hajnoczi, Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 6/8] block: remove bdrv_co_io_plug() API
Date: Thu, 1 Jun 2023 11:25:50 -0400
Message-Id: <20230601152552.1603119-7-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>

No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Stefano Garzarella
Acked-by: Kevin Wolf
Message-id: 20230530180959.1108766-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 include/block/block-io.h         |  3 ---
 include/block/block_int-common.h | 11 ----------
 block/io.c                       | 37 --------------------------------
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87..43af816d75 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -259,9 +259,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                    uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b1cbc1e00c..74195c3004 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -768,11 +768,6 @@ struct BlockDriver {
     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
         BlockDriverState *bs, BlkdebugEvent event);
 
-    /* io queue for linux-aio */
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-        BlockDriverState *bs);
-
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -1227,12 +1222,6 @@ struct BlockDriverState {
     unsigned int in_flight;
     unsigned int serialising_in_flight;
 
-    /*
-     * counter for nested bdrv_io_plug.
-     * Accessed with atomic ops.
-     */
-    unsigned io_plugged;
-
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index 540bf8d26d..f2dfc7c405 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3223,43 +3223,6 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
     return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_plug(child->bs);
-    }
-
-    if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_plug) {
-            drv->bdrv_co_io_plug(bs);
-        }
-    }
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    assert(bs->io_plugged);
-    if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_unplug) {
-            drv->bdrv_co_io_unplug(bs);
-        }
-    }
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_unplug(child->bs);
-    }
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,

From patchwork Thu Jun 1 15:25:51 2023
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264133
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
 Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
 Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
 "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
 Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
 Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake,
 Stefan Hajnoczi, Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 7/8] block/blkio: use qemu_open() to support fd passing for virtio-blk
Date: Thu, 1 Jun 2023 11:25:51 -0400
Message-Id: <20230601152552.1603119-8-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>

From: Stefano Garzarella

Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd passing.
Let's expose this to the user, so the management layer can pass the file
descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always use
qemu_open() to open the `path`, so we can handle fd passing from the
management layer through the "/dev/fdset/N" special path.
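
For readers unfamiliar with the special path: qemu_open() resolves
"/dev/fdset/N" to a descriptor from fd set N, which the management layer
populated beforehand. The sketch below only shows how such a path can be
told apart from an ordinary one. It is an illustration, not QEMU's
implementation; parse_fdset_path() is a hypothetical helper and the
validation rules it applies are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not a QEMU function: extract N from "/dev/fdset/N".
 * Returns true and stores N in *fdset_id on success. Returns false and
 * leaves *fdset_id untouched for any other path.
 */
static bool parse_fdset_path(const char *path, int *fdset_id)
{
    static const char prefix[] = "/dev/fdset/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *digits;
    char *end;
    long id;

    assert(path != NULL && fdset_id != NULL);

    if (strncmp(path, prefix, prefix_len) != 0) {
        return false; /* ordinary path, open it as usual */
    }

    digits = path + prefix_len;
    if (*digits < '0' || *digits > '9') {
        return false; /* reject empty IDs, signs and leading whitespace */
    }

    errno = 0;
    id = strtol(digits, &end, 10);
    if (errno != 0 || *end != '\0' || id > INT_MAX) {
        return false; /* overflow or trailing garbage */
    }

    *fdset_id = (int)id;
    return true;
}
```

With this helper, "/dev/fdset/3" yields ID 3, while a plain device path
such as "/dev/vhost-vdpa-0" is reported as not being an fd set path.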

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Stefano Garzarella
Message-id: 20230530071941.8954-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 block/blkio.c | 53 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 11be8787a3..527323d625 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -673,25 +673,60 @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs,
 {
     const char *path = qdict_get_try_str(options, "path");
     BDRVBlkioState *s = bs->opaque;
-    int ret;
+    bool fd_supported = false;
+    int fd, ret;
 
     if (!path) {
         error_setg(errp, "missing 'path' option");
         return -EINVAL;
     }
 
-    ret = blkio_set_str(s->blkio, "path", path);
-    qdict_del(options, "path");
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "failed to set path: %s",
-                         blkio_get_error_msg());
-        return ret;
-    }
-
     if (!(flags & BDRV_O_NOCACHE)) {
         error_setg(errp, "cache.direct=off is not supported");
         return -EINVAL;
     }
+
+    if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
+        fd_supported = true;
+    }
+
+    /*
+     * If the libblkio driver supports fd passing, let's always use qemu_open()
+     * to open the `path`, so we can handle fd passing from the management
+     * layer through the "/dev/fdset/N" special path.
+     */
+    if (fd_supported) {
+        int open_flags;
+
+        if (flags & BDRV_O_RDWR) {
+            open_flags = O_RDWR;
+        } else {
+            open_flags = O_RDONLY;
+        }
+
+        fd = qemu_open(path, open_flags, errp);
+        if (fd < 0) {
+            return -EINVAL;
+        }
+
+        ret = blkio_set_int(s->blkio, "fd", fd);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set fd: %s",
+                             blkio_get_error_msg());
+            qemu_close(fd);
+            return ret;
+        }
+    } else {
+        ret = blkio_set_str(s->blkio, "path", path);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set path: %s",
+                             blkio_get_error_msg());
+            return ret;
+        }
+    }
+
+    qdict_del(options, "path");
+
     return 0;
 }
 

From patchwork Thu Jun 1 15:25:52 2023
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 13264132
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, Stefano Stabellini, Aarushi Mehta,
 Anthony Perard, Thomas Huth, Julia Suvorova, Paolo Bonzini, Fam Zheng,
 Hanna Reitz, Philippe Mathieu-Daudé, Stefano Garzarella,
 "Michael S. Tsirkin", Daniel P. Berrangé, Markus Armbruster,
 Cornelia Huck, Marc-André Lureau, xen-devel@lists.xenproject.org,
 Paul Durrant, Kevin Wolf, Richard Henderson, Eric Blake,
 Stefan Hajnoczi, Raphael Norwitz, kvm@vger.kernel.org
Subject: [PULL 8/8] qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa
Date: Thu, 1 Jun 2023 11:25:52 -0400
Message-Id: <20230601152552.1603119-9-stefanha@redhat.com>
In-Reply-To: <20230601152552.1603119-1-stefanha@redhat.com>
References: <20230601152552.1603119-1-stefanha@redhat.com>

From: Stefano Garzarella

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports fd passing
through the new 'fd' property.

Since we now use qemu_open() on '@path' if the virtio-blk driver
supports fd passing, let's announce it.
In this way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device. This is useful especially
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Stefano Garzarella
Message-id: 20230530071941.8954-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi
---
 qapi/block-core.json | 6 ++++++
 meson.build          | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98d9116dae..4bf89171c6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3955,10 +3955,16 @@
 #
 # @path: path to the vhost-vdpa character device.
 #
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+#     (since 8.1)
+#
 # Since: 7.2
 ##
 { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
   'data': { 'path': 'str' },
+  'features': [ { 'name' :'fdset',
+                  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
   'if': 'CONFIG_BLKIO' }
 
 ##
diff --git a/meson.build b/meson.build
index bc76ea96bf..a61d3e9b06 100644
--- a/meson.build
+++ b/meson.build
@@ -2106,6 +2106,10 @@ config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
 config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+                       blkio.version().version_compare('>=1.3.0'))
+endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())