From patchwork Tue Nov 16 15:34:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?=C5=81ukasz_Gieryk?= X-Patchwork-Id: 12622803 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4D39C43217 for ; Tue, 16 Nov 2021 16:04:56 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8A6FB619E1 for ; Tue, 16 Nov 2021 16:04:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 8A6FB619E1 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=nongnu.org Received: from localhost ([::1]:42114 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mn0x5-0003QJ-GK for qemu-devel@archiver.kernel.org; Tue, 16 Nov 2021 11:04:55 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42892) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mn0lq-000612-8z; Tue, 16 Nov 2021 10:53:19 -0500 Received: from mga04.intel.com ([192.55.52.120]:64391) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mn0lo-0007fn-GU; Tue, 16 Nov 2021 10:53:18 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="232438891" X-IronPort-AV: E=Sophos;i="5.87,239,1631602800"; d="scan'208";a="232438891" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2021 07:53:14 -0800 X-IronPort-AV: E=Sophos;i="5.87,239,1631602800"; d="scan'208";a="454494453" Received: from lgieryk-lnx.igk.intel.com ([172.22.230.153]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2021 07:53:12 -0800 From: =?utf-8?q?=C5=81ukasz_Gieryk?= To: qemu-devel@nongnu.org Subject: [PATCH v2 08/15] hw/nvme: Implement the Function Level Reset Date: Tue, 16 Nov 2021 16:34:39 +0100 Message-Id: <20211116153446.317143-9-lukasz.gieryk@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211116153446.317143-1-lukasz.gieryk@linux.intel.com> References: <20211116153446.317143-1-lukasz.gieryk@linux.intel.com> MIME-Version: 1.0 Organization: Intel Technology Poland sp. z o.o. - ul. Slowackiego 173, 80-298 Gdansk - KRS 101882 - NIP 957-07-52-316 Received-SPF: none client-ip=192.55.52.120; envelope-from=lukasz.gieryk@linux.intel.com; helo=mga04.intel.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-block@nongnu.org, =?utf-8?q?=C5=81ukasz_Gieryk?= , Lukasz Maniak , Klaus Jensen , Klaus Jensen , Keith Busch Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This patch implements the Function Level Reset, a feature currently not implemented for the Nvme device, while listed as a mandatory ("shall") in the 1.4 spec. The implementation reuses FLR-related building blocks defined for the pci-bridge module, and follows the same logic: - FLR capability is advertised in the PCIE config, - custom pci_write_config callback detects a write to the trigger register and performs the PCI reset, - which, eventually, calls the custom dc->reset handler. Depending on reset type, parts of the state should (or should not) be cleared. To distinguish the type of reset, an additional parameter is passed to the reset function. This patch also enables advertisement of the Power Management PCI capability. The main reason behind it is to announce the no_soft_reset=1 bit, to signal SR-IOV support where each VF can be reset individually. The implementation purposedly ignores writes to the PMCS.PS register, as even such naïve behavior is enough to correctly handle the D3->D0 transition. It’s worth to note, that the power state transition back to to D3, with all the corresponding side effects, wasn't and stil isn't handled properly. Signed-off-by: Łukasz Gieryk Reviewed-by: Klaus Jensen --- hw/nvme/ctrl.c | 45 ++++++++++++++++++++++++++++++++++++++++---- hw/nvme/nvme.h | 5 +++++ hw/nvme/trace-events | 1 + 3 files changed, 47 insertions(+), 4 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 961161ba8e..da5b630ae3 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -5583,7 +5583,7 @@ static void nvme_process_sq(void *opaque) } } -static void nvme_ctrl_reset(NvmeCtrl *n) +static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst) { NvmeNamespace *ns; int i; @@ -5615,7 +5615,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n) } if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) { - pcie_sriov_pf_disable_vfs(&n->parent_obj); + if (rst != NVME_RESET_CONTROLLER) { + pcie_sriov_pf_disable_vfs(&n->parent_obj); + } } n->aer_queued = 0; @@ -5849,7 +5851,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data, } } else if (!NVME_CC_EN(data) && NVME_CC_EN(cc)) { trace_pci_nvme_mmio_stopped(); - nvme_ctrl_reset(n); + nvme_ctrl_reset(n, NVME_RESET_CONTROLLER); cc = 0; csts &= ~NVME_CSTS_READY; } @@ -6406,6 +6408,28 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); } +static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) +{ + Error *err = NULL; + int ret; + + ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset, + PCI_PM_SIZEOF, &err); + if (err) { + error_report_err(err); + return ret; + } + + pci_set_word(pci_dev->config + offset + PCI_PM_PMC, + PCI_PM_CAP_VER_1_2); + pci_set_word(pci_dev->config + offset + PCI_PM_CTRL, + PCI_PM_CTRL_NO_SOFT_RESET); + pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL, + PCI_PM_CTRL_STATE_MASK); + + return 0; +} + static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) { uint8_t *pci_conf = pci_dev->config; @@ -6427,7 +6451,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) } pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS); + nvme_add_pm_capability(pci_dev, 0x60); pcie_endpoint_cap_init(pci_dev, 0x80); + pcie_cap_flr_init(pci_dev); if (n->params.sriov_max_vfs) { pcie_ari_init(pci_dev, 0x100, 1); } @@ -6676,7 +6702,7 @@ static void nvme_exit(PCIDevice *pci_dev) NvmeNamespace *ns; int i; - nvme_ctrl_reset(n); + nvme_ctrl_reset(n, NVME_RESET_FUNCTION); if (n->subsys) { for (i = 1; i <= NVME_MAX_NAMESPACES; i++) { @@ -6775,6 +6801,15 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name, } } +static void nvme_pci_reset(DeviceState *qdev) +{ + PCIDevice *pci_dev = PCI_DEVICE(qdev); + NvmeCtrl *n = NVME(pci_dev); + + trace_pci_nvme_pci_reset(); + nvme_ctrl_reset(n, NVME_RESET_FUNCTION); +} + static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address, uint32_t val, int len) { @@ -6808,6 +6843,7 @@ static void nvme_pci_write_config(PCIDevice *dev, uint32_t address, { nvme_sriov_pre_write_ctrl(dev, address, val, len); pci_default_write_config(dev, address, val, len); + pcie_cap_flr_write_config(dev, address, val, len); } static const VMStateDescription nvme_vmstate = { @@ -6830,6 +6866,7 @@ static void nvme_class_init(ObjectClass *oc, void *data) dc->desc = "Non-Volatile Memory Express"; device_class_set_props(dc, nvme_props); dc->vmsd = &nvme_vmstate; + dc->reset = nvme_pci_reset; } static void nvme_instance_init(Object *obj) diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 2157a7b95f..6713493380 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -471,6 +471,11 @@ typedef struct NvmeCtrl { NvmeSecCtrlList sec_ctrl_list; } NvmeCtrl; +typedef enum NvmeResetType { + NVME_RESET_FUNCTION = 0, + NVME_RESET_CONTROLLER = 1, +} NvmeResetType; + static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid) { if (!nsid || nsid > NVME_MAX_NAMESPACES) { diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index dd2aac3418..88678fc21e 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -105,6 +105,7 @@ pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone de pci_nvme_zd_extension_set(uint32_t zone_idx) "set descriptor extension for zone_idx=%"PRIu32"" pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" +pci_nvme_pci_reset(void) "PCI Function Level Reset" # error conditions pci_nvme_err_mdts(size_t len) "len %zu"