From patchwork Thu Aug 11 15:37:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinhao Fan
X-Patchwork-Id: 12941565
From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 1/4] hw/nvme: avoid unnecessary call to irq (de)assertion functions
Date: Thu, 11 Aug 2022 23:37:36 +0800
Message-Id: <20220811153739.3079672-2-fanjinhao21s@ict.ac.cn>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>
References: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>

nvme_irq_assert() only does useful work when cq->irq_enabled is true.
nvme_irq_deassert() only works for pin-based interrupts. Avoid calls
into these functions if we are sure they will not do useful work.

This will be most useful when we use eventfd to send interrupts. We
can avoid the unnecessary overhead of signalling eventfd.

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..bd3350d7e0 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1377,11 +1377,13 @@ static void nvme_post_cqes(void *opaque)
         QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
     }
     if (cq->tail != cq->head) {
-        if (cq->irq_enabled && !pending) {
-            n->cq_pending++;
-        }
+        if (cq->irq_enabled) {
+            if (!pending) {
+                n->cq_pending++;
+            }
 
-        nvme_irq_assert(n, cq);
+            nvme_irq_assert(n, cq);
+        }
     }
 }
 
@@ -4244,12 +4246,11 @@ static void nvme_cq_notifier(EventNotifier *e)
 
     nvme_update_cq_head(cq);
 
-    if (cq->tail == cq->head) {
-        if (cq->irq_enabled) {
-            n->cq_pending--;
+    if (cq->irq_enabled && cq->tail == cq->head) {
+        n->cq_pending--;
+        if (!msix_enabled(&n->parent_obj)) {
+            nvme_irq_deassert(n, cq);
         }
-
-        nvme_irq_deassert(n, cq);
     }
 
     nvme_post_cqes(cq);
@@ -4730,11 +4731,15 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
         return NVME_INVALID_QUEUE_DEL;
     }
 
-    if (cq->irq_enabled && cq->tail != cq->head) {
-        n->cq_pending--;
-    }
+    if (cq->irq_enabled) {
+        if (cq->tail != cq->head) {
+            n->cq_pending--;
+        }
 
-    nvme_irq_deassert(n, cq);
+        if (!msix_enabled(&n->parent_obj)) {
+            nvme_irq_deassert(n, cq);
+        }
+    }
     trace_pci_nvme_del_cq(qid);
     nvme_free_cq(cq, n);
     return NVME_SUCCESS;
@@ -6918,12 +6923,11 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
             timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
         }
 
-        if (cq->tail == cq->head) {
-            if (cq->irq_enabled) {
-                n->cq_pending--;
+        if (cq->irq_enabled && cq->tail == cq->head) {
+            n->cq_pending--;
+            if (!msix_enabled(&n->parent_obj)) {
+                nvme_irq_deassert(n, cq);
             }
-
-            nvme_irq_deassert(n, cq);
         }
     } else {
         /* Submission queue doorbell write */

From patchwork Thu Aug 11 15:37:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinhao Fan
X-Patchwork-Id: 12941566
From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 2/4] hw/nvme: add option to (de)assert irq with eventfd
Date: Thu, 11 Aug 2022 23:37:37 +0800
Message-Id: <20220811153739.3079672-3-fanjinhao21s@ict.ac.cn>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>
References: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>

When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion.
This paves the way for iothread support since QEMU's interrupt
emulation is not thread safe.

Asserting and deasserting irq with eventfd has some performance
implications. For small queue depth it increases request latency but
for large queue depth it effectively coalesces irqs.

Comparison (KIOPS):

QD            1    4   16   64
QEMU         38  123  210  329
irq-eventfd  32  106  240  364

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/nvme/nvme.h |  4 +++
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index bd3350d7e0..8a1c5ce3e1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1338,6 +1338,54 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
     trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_assert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_deassert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    int ret;
+
+    ret = event_notifier_init(&cq->assert_notifier, 0);
+    if (ret < 0) {
+        goto fail_assert_handler;
+    }
+
+    event_notifier_set_handler(&cq->assert_notifier,
+                               nvme_assert_notifier_read);
+
+    if (!msix_enabled(&n->parent_obj)) {
+        ret = event_notifier_init(&cq->deassert_notifier, 0);
+        if (ret < 0) {
+            goto fail_deassert_handler;
+        }
+
+        event_notifier_set_handler(&cq->deassert_notifier,
+                                   nvme_deassert_notifier_read);
+    }
+
+    return;
+
+fail_deassert_handler:
+    event_notifier_set_handler(&cq->deassert_notifier, NULL);
+    event_notifier_cleanup(&cq->deassert_notifier);
+fail_assert_handler:
+    event_notifier_set_handler(&cq->assert_notifier, NULL);
+    event_notifier_cleanup(&cq->assert_notifier);
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -1382,7 +1430,23 @@ static void nvme_post_cqes(void *opaque)
                 n->cq_pending++;
             }
 
-            nvme_irq_assert(n, cq);
+            if (unlikely(cq->first_io_cqe)) {
+                /*
+                 * Initialize event notifier when first cqe is posted. For irqfd
+                 * support we need to register the MSI message in KVM. We
+                 * can not do this registration at CQ creation time because
+                 * Linux's NVMe driver changes the MSI message after CQ creation.
+                 */
+                cq->first_io_cqe = false;
+
+                nvme_init_irq_notifier(n, cq);
+            }
+
+            if (cq->assert_notifier.initialized) {
+                event_notifier_set(&cq->assert_notifier);
+            } else {
+                nvme_irq_assert(n, cq);
+            }
         }
     }
 }
@@ -4249,7 +4313,11 @@ static void nvme_cq_notifier(EventNotifier *e)
     if (cq->irq_enabled && cq->tail == cq->head) {
         n->cq_pending--;
         if (!msix_enabled(&n->parent_obj)) {
-            nvme_irq_deassert(n, cq);
+            if (cq->deassert_notifier.initialized) {
+                event_notifier_set(&cq->deassert_notifier);
+            } else {
+                nvme_irq_deassert(n, cq);
+            }
         }
     }
 
@@ -4706,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_set_handler(&cq->notifier, NULL);
         event_notifier_cleanup(&cq->notifier);
     }
+    if (cq->assert_notifier.initialized) {
+        event_notifier_set_handler(&cq->assert_notifier, NULL);
+        event_notifier_cleanup(&cq->assert_notifier);
+    }
+    if (cq->deassert_notifier.initialized) {
+        event_notifier_set_handler(&cq->deassert_notifier, NULL);
+        event_notifier_cleanup(&cq->deassert_notifier);
+    }
     if (msix_enabled(&n->parent_obj)) {
         msix_vector_unuse(&n->parent_obj, cq->vector);
     }
@@ -4737,6 +4813,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
         }
 
         if (!msix_enabled(&n->parent_obj)) {
+            /* Do not use eventfd since this is always called in main loop */
             nvme_irq_deassert(n, cq);
         }
     }
@@ -4777,6 +4854,7 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     }
     n->cq[cqid] = cq;
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+    cq->first_io_cqe = cqid != 0;
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -6926,7 +7004,11 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
         if (cq->irq_enabled && cq->tail == cq->head) {
             n->cq_pending--;
             if (!msix_enabled(&n->parent_obj)) {
-                nvme_irq_deassert(n, cq);
+                if (cq->deassert_notifier.initialized) {
+                    event_notifier_set(&cq->deassert_notifier);
+                } else {
+                    nvme_irq_deassert(n, cq);
+                }
             }
         }
     } else {
@@ -7675,6 +7757,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
     DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+    DEFINE_PROP_BOOL("irq-eventfd", NvmeCtrl, params.irq_eventfd, false),
     DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
     DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
                      params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..759d0ecd7c 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -398,6 +398,9 @@ typedef struct NvmeCQueue {
     uint64_t    ei_addr;
     QEMUTimer   *timer;
     EventNotifier notifier;
+    EventNotifier assert_notifier;
+    EventNotifier deassert_notifier;
+    bool        first_io_cqe;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -422,6 +425,7 @@ typedef struct NvmeParams {
     bool     auto_transition_zones;
     bool     legacy_cmb;
     bool     ioeventfd;
+    bool     irq_eventfd;
     uint8_t  sriov_max_vfs;
     uint16_t sriov_vq_flexible;
     uint16_t sriov_vi_flexible;

From patchwork Thu Aug 11 15:37:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinhao Fan
X-Patchwork-Id: 12941568
From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 3/4] hw/nvme: use irqfd to send interrupts
Date: Thu, 11 Aug 2022 23:37:38 +0800
Message-Id: <20220811153739.3079672-4-fanjinhao21s@ict.ac.cn>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>
References: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>

Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers since handler callbacks are called
in the same system call as the irqfd write.

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/nvme/nvme.h |  1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8a1c5ce3e1..63f988f2f9 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1354,8 +1355,26 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
     }
 }
 
+static int nvme_kvm_msix_vector_use(NvmeCtrl *n,
+                                    NvmeCQueue *cq,
+                                    uint32_t vector)
+{
+    int ret;
+
+    KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+    ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+    if (ret < 0) {
+        return ret;
+    }
+    kvm_irqchip_commit_route_changes(&c);
+    cq->virq = ret;
+    return 0;
+}
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     int ret;
 
     ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1363,8 +1382,21 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
         goto fail_assert_handler;
     }
 
-    event_notifier_set_handler(&cq->assert_notifier,
-                               nvme_assert_notifier_read);
+    if (with_irqfd) {
+        ret = nvme_kvm_msix_vector_use(n, cq, cq->vector);
+        if (ret < 0) {
+            goto fail_assert_handler;
+        }
+        ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                 &cq->assert_notifier, NULL,
+                                                 cq->virq);
+        if (ret < 0) {
+            goto fail_kvm;
+        }
+    } else {
+        event_notifier_set_handler(&cq->assert_notifier,
+                                   nvme_assert_notifier_read);
+    }
 
     if (!msix_enabled(&n->parent_obj)) {
         ret = event_notifier_init(&cq->deassert_notifier, 0);
@@ -1381,6 +1413,12 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 fail_deassert_handler:
     event_notifier_set_handler(&cq->deassert_notifier, NULL);
     event_notifier_cleanup(&cq->deassert_notifier);
+    if (with_irqfd) {
+        kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &cq->assert_notifier,
+                                              cq->virq);
+fail_kvm:
+        kvm_irqchip_release_virq(kvm_state, cq->virq);
+    }
 fail_assert_handler:
     event_notifier_set_handler(&cq->assert_notifier, NULL);
     event_notifier_cleanup(&cq->assert_notifier);
@@ -4764,6 +4802,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
     n->cq[cq->cqid] = NULL;
@@ -4775,6 +4815,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
+        if (with_irqfd) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+            kvm_irqchip_release_virq(kvm_state, cq->virq);
+        }
         event_notifier_set_handler(&cq->assert_notifier, NULL);
         event_notifier_cleanup(&cq->assert_notifier);
     }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 759d0ecd7c..85fd9cd0e2 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -396,6 +396,7 @@ typedef struct NvmeCQueue {
     uint64_t    dma_addr;
     uint64_t    db_addr;
     uint64_t    ei_addr;
+    int         virq;
     QEMUTimer   *timer;
     EventNotifier notifier;
     EventNotifier assert_notifier;

From patchwork Thu Aug 11 15:37:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jinhao Fan
X-Patchwork-Id: 12941569
From: Jinhao Fan
To: qemu-devel@nongnu.org
Cc: its@irrelevant.dk, kbusch@kernel.org, stefanha@gmail.com, Jinhao Fan, qemu-block@nongnu.org (open list:nvme)
Subject: [PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd
Date: Thu, 11 Aug 2022 23:37:39 +0800
Message-Id: <20220811153739.3079672-5-fanjinhao21s@ict.ac.cn>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>
References: <20220811153739.3079672-1-fanjinhao21s@ict.ac.cn>

When irqfd is enabled, we bypass QEMU's irq emulation and let KVM
directly assert the irq. However, KVM is not aware of the device's
MSI-x masking status. Add MSI-x mask bookkeeping in NVMe emulation and
detach the corresponding irqfd when a given vector is masked.

Signed-off-by: Jinhao Fan
---
 hw/nvme/ctrl.c       | 82 ++++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/nvme.h       |  2 ++
 hw/nvme/trace-events |  3 ++
 3 files changed, 87 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 63f988f2f9..ac5460c7c8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7478,10 +7478,84 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 
     return 0;
 }
+static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+                               MSIMessage msg)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int ret;
+
+    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        /*
+         * If this function is called, then irqfd must be available. Therefore,
+         * irqfd must be in use if cq->assert_notifier.initialized is true.
+         */
+        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+                                                   pci_dev);
+                if (ret < 0) {
+                    return ret;
+                }
+                kvm_irqchip_commit_routes(kvm_state);
+                cq->msg = msg;
+            }
+
+            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                     &cq->assert_notifier,
+                                                     NULL, cq->virq);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_mask(vector);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+        }
+    }
+}
+
+static void nvme_vector_poll(PCIDevice *pci_dev,
+                             unsigned int vector_start,
+                             unsigned int vector_end)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        if (cq && cq->vector >= vector_start && cq->vector <= vector_end
+            && msix_is_masked(pci_dev, cq->vector)
+            && cq->assert_notifier.initialized) {
+            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+                msix_set_pending(pci_dev, i);
+            }
+        }
+    }
+}
 
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
     uint8_t *pci_conf = pci_dev->config;
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint64_t bar_size;
     unsigned msix_table_offset, msix_pba_offset;
     int ret;
@@ -7534,6 +7608,13 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         }
     }
 
+    if (with_irqfd) {
+        msix_set_vector_notifiers(pci_dev,
+                                  nvme_vector_unmask,
+                                  nvme_vector_mask,
+                                  nvme_vector_poll);
+    }
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     if (n->params.cmb_size_mb) {
@@ -7781,6 +7862,7 @@ static void nvme_exit(PCIDevice *pci_dev)
         pcie_sriov_pf_exit(pci_dev);
     }
 
+    msix_unset_vector_notifiers(pci_dev);
     msix_uninit(pci_dev, &n->bar0, &n->bar0);
     memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 85fd9cd0e2..707a55ebfc 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -401,6 +402,7 @@ typedef struct NvmeCQueue {
     EventNotifier notifier;
     EventNotifier assert_notifier;
     EventNotifier deassert_notifier;
+    MSIMessage  msg;
     bool        first_io_cqe;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
 pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIu32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIu32" end=0x%"PRIu32""
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""