From patchwork Mon Oct 19 02:17:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 778D414B7 for ; Mon, 19 Oct 2020 02:19:34 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15CF521D7B for ; Mon, 19 Oct 2020 02:19:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="AF9DHkob" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15CF521D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:36536 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKlo-0001MH-Ie for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:19:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56130) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk3-0007yp-9G; Sun, 18 Oct 2020 22:17:43 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44109) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKjy-0004HF-PY; Sun, 18 Oct 2020 22:17:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073858; x=1634609858; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=R9OVIQpzNmLf9QYkay6aZV9rxh3JuJYtnbg3r6qw4tQ=; b=AF9DHkobJDnNwTadTNsxVkhyplMhBYRR7MacMppo+xtgqEuhMhKmuk10 d9jPTNz6gHDVkyyHziN2eUJz0enZ+E91UVpLHeKTCfD35xpKmsdPvvNDN ZeILgUuqK0D8bLZPsh7KqPCLKr1stTPLDUJtvIsCW887h0Fl87wbxNupq yKnRbRXj7PhMYhIhaODel/SiO8wz//J6B6kiUr+RTRp+q29I1w3UN5fcn oy9QF13nETBnrIb3wEl5lBEmhUys7paQdr/zPxTSPWRml8bDJCa5hxsMX sBArh8pInAHoAG5zMy4heYCaDihTbnNUjFPSzMvEUxvIE2UCW77JaQQcm Q==; IronPort-SDR: d69uHTodGwmXdQenM/glrn4SHTLe7sfxfRKSe9Y1+6guXEEwV5MkdOziQkmMzq6V0Xvpv7tHWS yVs4Wqqd6QGnrBaZGtmnXJ0PYGdhEIr8x95XIRr2yQdHot/0Q5xWnFZhoo4OGl/FlUYAO8bSEK FW7fFaM8ic7bJE/8F8bHPyVM04cWpkTQEAHb6X3oPA55Fz9tT5JJykvxHmqKCOErGedPZBquWC gIRX+f6yrUUa+acJU8kijxShbfsYzJ8KWnqKWqejhLC3GMSBgm2bFh7Ksf0xDbwAzmQlyd4Tq2 RsI= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207951" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:34 +0800 IronPort-SDR: YM+EYyKQ5mApluZLaHLlswpe/N+CfoNxJ16rii2+A34Ik7TaXCsCLE7vmA4TdOghw6q6zGsq6L PkANedAFvs+szvy3Z0NuiWegf/dbRjr7MVFe/XPfF9EddzxHXEnLezBfhGLwd9uX4yBdVjW/L1 ysYWrnzZ/a6hqfgQ74UgMtwgNmhELS8R1DzAWQw3DwSXbNFgY9eNoI7D64OfQI59/4WJ6TJKfb cDakEeJdI1UzTOSID4oGUZLyiuV/6huER2GiPytce6yu4SiMOaaq88md93DJN4vw6qnlhcS294 euUCuXKBLcC8qXuXVaUYOGCS Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:11 -0700 IronPort-SDR: AXArlErxFLnU6w2q3sRDH5uTDc1w+ENHUuc6pfQKisgLPIOc1l6Pa1AuAczURzzz46a4A3XM01 AM+c+OEa5ZI4wCNTE+qOxYnGV3TrouHsDefWXTnXNvzLexk1oyDnx+Mczr3ehrKoHXnKgBFycG lGhYY9OnfuwnZeerD2tFt1u7Xfbx+j+Wiqcp24Fpl7uIW+EvB6Kg2XNQu1fBWepyTFOvZXRCA/ xQDtsfYvt599mW36B0PUq+78Zmk5A/vkayBs8IFmHHaknfjWPMNidbbMNQI8A+wrm2ZRus/s9N NJ0= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:32 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 01/11] hw/block/nvme: Add Commands Supported and Effects log Date: Mon, 19 Oct 2020 11:17:16 +0900 Message-Id: <20201019021726.12048-2-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" This log page becomes necessary to implement to allow checking for Zone Append command support in Zoned Namespace Command Set. This commit adds the code to report this log page for NVM Command Set only. The parts that are specific to zoned operation will be added later in the series. All incoming admin and i/o commands are now only processed if their corresponding support bits are set in this log. This provides an easy way to control what commands to support and what not to depending on set CC.CSS. Signed-off-by: Dmitry Fomichev Reviewed-by: Keith Busch --- hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 98 +++++++++++++++++++++++++++++++++++++++---- hw/block/trace-events | 2 + include/block/nvme.h | 19 +++++++++ 4 files changed, 111 insertions(+), 9 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 83734f4606..ea8c2f785d 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -29,6 +29,7 @@ typedef struct NvmeNamespace { int32_t bootindex; int64_t size; NvmeIdNs id_ns; + const uint32_t *iocs; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 9d30ca69dc..5a9493d89f 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -111,6 +111,28 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = { [NVME_TIMESTAMP] = NVME_FEAT_CAP_CHANGE, }; +static const uint32_t nvme_cse_acs[256] = { + [NVME_ADM_CMD_DELETE_SQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_CREATE_SQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_DELETE_CQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_CREATE_CQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_IDENTIFY] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, +}; + +static const uint32_t nvme_cse_iocs_none[256] = { +}; + +static const uint32_t nvme_cse_iocs_nvm[256] = { + [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -1032,10 +1054,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req), req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode)); - if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) { - return NVME_INVALID_OPCODE | NVME_DNR; - } - if (!nvme_nsid_valid(n, nsid)) { return NVME_INVALID_NSID | NVME_DNR; } @@ -1045,6 +1063,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } + if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { + trace_pci_nvme_err_invalid_opc(req->cmd.opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + switch (req->cmd.opcode) { case NVME_CMD_FLUSH: return nvme_flush(n, req); @@ -1054,8 +1077,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_READ: return nvme_rw(n, req); default: - trace_pci_nvme_err_invalid_opc(req->cmd.opcode); - return NVME_INVALID_OPCODE | NVME_DNR; + assert(false); } } @@ -1291,6 +1313,39 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, + uint64_t off, NvmeRequest *req) +{ + NvmeEffectsLog log = {}; + const uint32_t *src_iocs = NULL; + uint32_t trans_len; + + trace_pci_nvme_cmd_supp_and_effects_log_read(); + + if (off >= sizeof(log)) { + trace_pci_nvme_err_invalid_effects_log_offset(off); + return NVME_INVALID_FIELD | NVME_DNR; + } + + switch (NVME_CC_CSS(n->bar.cc)) { + case NVME_CC_CSS_NVM: + src_iocs = nvme_cse_iocs_nvm; + case NVME_CC_CSS_ADMIN_ONLY: + break; + } + + memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs)); + + if (src_iocs) { + memcpy(log.iocs, src_iocs, sizeof(log.iocs)); + } + + trans_len = MIN(sizeof(log) - off, buf_len); + + return nvme_dma(n, ((uint8_t *)&log) + off, trans_len, + DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = &req->cmd; @@ -1334,6 +1389,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) return nvme_smart_info(n, rae, len, off, req); case NVME_LOG_FW_SLOT_INFO: return nvme_fw_log_info(n, len, off, req); + case NVME_LOG_CMD_EFFECTS: + return nvme_cmd_effects(n, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -1920,6 +1977,11 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode, nvme_adm_opc_str(req->cmd.opcode)); + if (!(nvme_cse_acs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { + trace_pci_nvme_err_invalid_admin_opc(req->cmd.opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + switch (req->cmd.opcode) { case NVME_ADM_CMD_DELETE_SQ: return nvme_del_sq(n, req); @@ -1942,8 +2004,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); default: - trace_pci_nvme_err_invalid_admin_opc(req->cmd.opcode); - return NVME_INVALID_OPCODE | NVME_DNR; + assert(false); } } @@ -2031,6 +2092,23 @@ static void nvme_clear_ctrl(NvmeCtrl *n) n->bar.cc = 0; } +static void nvme_select_ns_iocs(NvmeCtrl *n) +{ + NvmeNamespace *ns; + int i; + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + ns->iocs = nvme_cse_iocs_none; + if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { + ns->iocs = nvme_cse_iocs_nvm; + } + } +} + static int nvme_start_ctrl(NvmeCtrl *n) { uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12; @@ -2129,6 +2207,8 @@ static int nvme_start_ctrl(NvmeCtrl *n) QTAILQ_INIT(&n->aer_queue); + nvme_select_ns_iocs(n); + return 0; } @@ -2737,7 +2817,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->acl = 3; id->aerl = n->params.aerl; id->frmw = (NVME_NUM_FW_SLOTS << 1) | NVME_FRMW_SLOT1_RO; - id->lpa = NVME_LPA_NS_SMART | NVME_LPA_EXTENDED; + id->lpa = NVME_LPA_NS_SMART | NVME_LPA_CSE | NVME_LPA_EXTENDED; /* recommended default value (~70 C) */ id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING); diff --git a/hw/block/trace-events b/hw/block/trace-events index fac5995d94..0ae9cb0d35 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -85,6 +85,7 @@ pci_nvme_mmio_start_success(void) "setting controller enable bit succeeded" pci_nvme_mmio_stopped(void) "cleared controller enable bit" pci_nvme_mmio_shutdown_set(void) "shutdown bit set" pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared" +pci_nvme_cmd_supp_and_effects_log_read(void) "commands supported and effects log read" # nvme traces for error conditions pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu" @@ -104,6 +105,7 @@ pci_nvme_err_invalid_prp(void) "invalid PRP" pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" +pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" pci_nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 6de2d5aa75..4779495b7d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -744,10 +744,27 @@ enum NvmeSmartWarn { NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4, }; +typedef struct NvmeEffectsLog { + uint32_t acs[256]; + uint32_t iocs[256]; + uint8_t resv[2048]; +} NvmeEffectsLog; + +enum { + NVME_CMD_EFF_CSUPP = 1 << 0, + NVME_CMD_EFF_LBCC = 1 << 1, + NVME_CMD_EFF_NCC = 1 << 2, + NVME_CMD_EFF_NIC = 1 << 3, + NVME_CMD_EFF_CCC = 1 << 4, + NVME_CMD_EFF_CSE_MASK = 3 << 16, + NVME_CMD_EFF_UUID_SEL = 1 << 19, +}; + enum NvmeLogIdentifier { NVME_LOG_ERROR_INFO = 0x01, NVME_LOG_SMART_INFO = 0x02, NVME_LOG_FW_SLOT_INFO = 0x03, + NVME_LOG_CMD_EFFECTS = 0x05, }; typedef struct QEMU_PACKED NvmePSD { @@ -860,6 +877,7 @@ enum NvmeIdCtrlFrmw { enum NvmeIdCtrlLpa { NVME_LPA_NS_SMART = 1 << 0, + NVME_LPA_CSE = 1 << 1, NVME_LPA_EXTENDED = 1 << 2, }; @@ -1059,6 +1077,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); From patchwork Mon Oct 19 02:17:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 839FF1744 for ; Mon, 19 Oct 2020 02:19:34 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2AAD6221FC for ; Mon, 19 Oct 2020 02:19:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="T23FQopO" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2AAD6221FC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:36592 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKlp-0001Nb-0f for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:19:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56128) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk3-0007yk-2H; Sun, 18 Oct 2020 22:17:43 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44111) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk0-0004HN-Qs; Sun, 18 Oct 2020 22:17:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073860; x=1634609860; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SMG2Ngfk2iZA+zlHITm2D7Ljs/qojB/EptGd2Tv57Vk=; b=T23FQopODkMV2FrdB0cKCl/dUpWzkBnJA6l7z7kQVqVcrjh39ZPKFhyb C0Bv35uZCv5y44UD6YpH6HM9HN18ssVOSCW3L+0OahYF9TqCSGKznL2rY g6aWK23s28jc/T875j8MBVaKM0nUu7JtRI4LyXh6ogzsq9YBzw83rd3p0 FJwcQ9lIBqy3PiC0Xo0swHGNaMRRiRHlsPgKhB5ZYxasusGFrgOtoQHil H9rW2PDg1uyW/eEKe7GI0t/BHaaaU1OWfGHQUH8WfZN1q5oonkFZFrJyS 5voV+GJv0jDe45uzyxMh0vQljAJlUAt6UG+V/MgeZHKbTafAS3XNc6Bo1 A==; IronPort-SDR: Q4jg9hl2b32M4XjpU6F3D23oorXeAfw421It+Md3aYXRje1LYqE+zZgGGevD7MPCMvxFrN8L8Q kEFucQ1F+7lc797flUdLSqGBdBkvubUNvk309sPdRsDqHFNnf8qayLtDnDVg39kguRcyxA4R3h bWGzx7syk+eAcCyiRgpar7zN2X8Jmv6zapGF7AwOyvoXUkfjtDUQkckSbs8U9z43X/h3BWvdwP EUdq6ECZF3cP1JH3lwb9rJ2NiG1vi8WYLJn7/O0i7472lNn4XMNQzek0oYhQQdpOoNuynZNbfO TLI= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207953" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:36 +0800 IronPort-SDR: vJvuw3oZ+dksJGtfGVP72pkxbm+xCWlReSwjfQimDuGz2kPDGF09cAP2kTJb8U9W9A6OVdgbd4 ITrbXZbENE9qx4f+4zQPKZ+1WYzW7NeY0YM24FpK4lykHMmfqnCcu8X6IMXXhGCIpNTM4/TcZs DHcgKs2I767uVFl8QhDScILkpzkcOSLyCp85xrLtUXo7ZKG2WX9kJ+sWSPO4Hw2/zDuatFAqSP gyMDJLnmo5wmpFWoVCFasoOyxbajdJHq6xULyeFuxRC4VX1CcbdpA7TKSf3jPbmpkEM9fLtdeK jpnezXoJldI5eO1W2WQuAw6V Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:13 -0700 IronPort-SDR: 2nyalcMwhW2hQEqyL+YPE1vJgULz0IcsCV5RvqYQTSYxX2QofLg6WJ3V+dFndtXWDqmWa1NqnK +pO+SxwB8QjEdYegEI7UhmVrVrw2de8KrpareQqZ3ZdPN43PkKhW6FuhbubaKvzKXfxUVAEAZT Z6j5wrnc51VwmHzm3V3KhNOge2+s4ao6+snXDaiPxTf/SL09DhdZj0Ac85iewJSU9EZ5s7+HGZ +616iSO/OByHd1x1iquu/aSh3fCZFK42f54oUbEVj8qbYQGB0ID6Yk5Cjv8K8KvpCyzfjM4+sW hE4= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:35 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 02/11] hw/block/nvme: Generate namespace UUIDs Date: Mon, 19 Oct 2020 11:17:17 +0900 Message-Id: <20201019021726.12048-3-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" In NVMe 1.4, a namespace must report an ID descriptor of UUID type if it doesn't support EUI64 or NGUID. Add a new namespace property, "uuid", that provides the user the option to either specify the UUID explicitly or have a UUID generated automatically every time a namespace is initialized. Suggested-by: Klaus Jansen Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jansen Reviewed-by: Keith Busch --- hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 9 +++++---- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index b69cdaf27e..de735eb9f3 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), + DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index ea8c2f785d..a38071884a 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + QemuUUID uuid; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5a9493d89f..29139d8a17 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1574,6 +1574,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; uint32_t nsid = le32_to_cpu(c->nsid); uint8_t list[NVME_IDENTIFY_DATA_SIZE]; @@ -1593,7 +1594,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - if (unlikely(!nvme_ns(n, nsid))) { + ns = nvme_ns(n, nsid); + if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1602,12 +1604,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) /* * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data * structure, a Namespace UUID (nidt = 0x3) must be reported in the - * Namespace Identification Descriptor. Add a very basic Namespace UUID - * here. + * Namespace Identification Descriptor. Add the namespace UUID here. */ ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID; ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; - stl_be_p(&ns_descrs->uuid.v, nsid); + memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN); return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, DMA_DIRECTION_FROM_DEVICE, req); From patchwork Mon Oct 19 02:17:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 268CA1580 for ; Mon, 19 Oct 2020 02:21:33 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BF77421D7B for ; Mon, 19 Oct 2020 02:21:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="H+ku6nnr" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BF77421D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:45460 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKnj-00051J-PR for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:21:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56158) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk6-00083j-8j; Sun, 18 Oct 2020 22:17:46 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44111) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk3-0004HN-GE; Sun, 18 Oct 2020 22:17:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073863; x=1634609863; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fPfcEW06vVgXkUctCfUgb4oMvaL9tcNraAXsKzrHh3s=; b=H+ku6nnrXrQOJxljnDo4zMAiakvb+f7b2o4Buwds1rc44nM02KSwlfh1 lf5zZjM//ozhQy3Ih8kW+LACQMbyQgc1eVAuEyqwGiqkhb0L332Yr1Wvs dB6/LG8gw7YOlEP4GiRnkzX2p9wJvPRNl2uAp2HpD3AJq3gXvdcupY9J3 UkKR6sF84fj3QQIML194IG7wYXE991YY4HC3qzfoSQeTmYVuxy0WyazFB GgCEURYfnRIl63TqfPq5f0f7cCjBDnT27VUXX/r/2piXTjAsG0+v+yEUy YTtPjZ88+rPWUBd7RcCnlP+pFoMJ4HINZjC4mewi/6kQaQTMNoaB9Y+Tp A==; IronPort-SDR: 4UA9gMzx3mNFovAoarvd0iSgZSz1lHacBglr+H3X3j2gAHgnoCoY0pg5jXIs+W+9k8THJavySs cERiwNo/3jOGRxzY0it438kpPyrtYvxtXPZuQBeUxOO8zawPMTbiDaDFbX3HHeVyIB7mVPZAIZ ycpe6Cl62lSs7IR9lSJuzpmKtwvGhCRvGUQ2p9cG65lklz3tOQK7RXhEFLV+dBTtYlNE4K6o0k UoAR/JXoDft0QLsYiqLYlD7IruEMOm+SjoilXlvmZqPH0Rtv1ge1Soao+4a4baXpzxz8LGYqnN oDQ= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207956" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:39 +0800 IronPort-SDR: 2kaq9ten82H8IYcyfKD2lsVxp9C+B5lfFujLkBI6MIaUGKG4kpoAjAXD45xM0kDaG4+U8p7HD7 ONZSV3pEGU0ni+vkryhOklGjVlT9YXrVeANbJKqqcTDrsIllPutGidsNadX9AloShDa1AzHJ3W EewbegBzcA4QSPAGCUlawAfbAmj5Cf5E/hJyOfxg9WkOQX+HFH1iZjQnYHS90DsPHFQKMLTygG UmeMjkwru2NteQ9igWBl6YzhC3s5o0m5vRCLMyjFqJVIL4h6/6v4IdLWGLewuoxOCdIFohQzbv tsbVNdZNDZiLuuHP9umgEYvR Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:15 -0700 IronPort-SDR: /QWKx+s3oZzqb4keG7+gU8SBWjFqe6eJ2sRPNGOVpaNnrNMQ8jEoUGB21M1GQc5bV+gZbvVlua kRYe3A2cBOYJYrnHXjl2AiMBVia8UEtWcipnEQc9ZFKh6GRSxqHNvR9Sm3NRmk2a1466sw4N8g m7rrju0jQllGYinaPidprxgPVme/JqomRw+Ylw9lzX26+fkGn12utJbIavRlxcpI1QMsX+r0Ba zXm4fjzzcpYXAVeYvsee5OPXxLDz+4OXzSxZBdOMSEmHnsEEQXoMpYgiJiK7HqOuJJrMGb7pM3 c1k= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:37 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 03/11] hw/block/nvme: Add support for Namespace Types Date: Mon, 19 Oct 2020 11:17:18 +0900 Message-Id: <20201019021726.12048-4-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" From: Niklas Cassel Define the structures and constants required to implement Namespace Types support. Namespace Types introduce a new command set, "I/O Command Sets", that allows the host to retrieve the command sets associated with a namespace. Introduce support for the command set and enable detection for the NVM Command Set. The new workflows for identify commands rely heavily on zero-filled identify structs. E.g., certain CNS commands are defined to return a zero-filled identify struct when an inactive namespace NSID is supplied. Add a helper function in order to avoid code duplication when reporting zero-filled identify structures. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 2 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 169 +++++++++++++++++++++++++++++++++++------- hw/block/trace-events | 7 ++ include/block/nvme.h | 65 ++++++++++++---- 5 files changed, 202 insertions(+), 42 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index de735eb9f3..c0362426cc 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -41,6 +41,8 @@ static void nvme_ns_init(NvmeNamespace *ns) id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); + ns->csi = NVME_CSI_NVM; + /* no thin provisioning */ id_ns->ncap = id_ns->nsze; id_ns->nuse = id_ns->ncap; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index a38071884a..d795e44bab 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,7 @@ typedef struct NvmeNamespace { int64_t size; NvmeIdNs id_ns; const uint32_t *iocs; + uint8_t csi; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 29139d8a17..ca0d0abf5c 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1503,6 +1503,13 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req) return NVME_SUCCESS; } +static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req) +{ + uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {}; + + return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_identify_ctrl(); @@ -1511,11 +1518,23 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + + trace_pci_nvme_identify_ctrl_csi(c->csi); + + if (c->csi == NVME_CSI_NVM) { + return nvme_rpt_empty_id_struct(n, req); + } + + return NVME_INVALID_FIELD | NVME_DNR; +} + static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - NvmeIdNs *id_ns, inactive = { 0 }; uint32_t nsid = le32_to_cpu(c->nsid); trace_pci_nvme_identify_ns(nsid); @@ -1526,23 +1545,46 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) ns = nvme_ns(n, nsid); if (unlikely(!ns)) { - id_ns = &inactive; - } else { - id_ns = &ns->id_ns; + return nvme_rpt_empty_id_struct(n, req); } - return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs), + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeNamespace *ns; + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + uint32_t nsid = le32_to_cpu(c->nsid); + + trace_pci_nvme_identify_ns_csi(nsid, c->csi); + + if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) { + return NVME_INVALID_NSID | NVME_DNR; + } + + ns = nvme_ns(n, nsid); + if (unlikely(!ns)) { + return nvme_rpt_empty_id_struct(n, req); + } + + if (c->csi == NVME_CSI_NVM) { + return nvme_rpt_empty_id_struct(n, req); + } + + return NVME_INVALID_FIELD | NVME_DNR; +} + static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - static const int data_len = NVME_IDENTIFY_DATA_SIZE; uint32_t min_nsid = le32_to_cpu(c->nsid); - uint32_t *list; - uint16_t ret; - int j = 0; + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + uint32_t *list_ptr = (uint32_t *)list; + int i, j = 0; trace_pci_nvme_identify_nslist(min_nsid); @@ -1556,20 +1598,54 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - list = g_malloc0(data_len); - for (int i = 1; i <= n->num_namespaces; i++) { - if (i <= min_nsid || !nvme_ns(n, i)) { + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { continue; } - list[j++] = cpu_to_le32(i); + if (ns->params.nsid < min_nsid) { + continue; + } + list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; } } - ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE, - req); - g_free(list); - return ret; + + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); +} + +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeNamespace *ns; + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + uint32_t min_nsid = le32_to_cpu(c->nsid); + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + uint32_t *list_ptr = (uint32_t *)list; + int i, j = 0; + + trace_pci_nvme_identify_nslist_csi(min_nsid, c->csi); + + if (c->csi != NVME_CSI_NVM) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + if (ns->params.nsid < min_nsid) { + continue; + } + list_ptr[j++] = cpu_to_le32(ns->params.nsid); + if (j == data_len / sizeof(uint32_t)) { + break; + } + } + + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) @@ -1577,13 +1653,17 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; uint32_t nsid = le32_to_cpu(c->nsid); - uint8_t list[NVME_IDENTIFY_DATA_SIZE]; + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; struct data { struct { NvmeIdNsDescr hdr; - uint8_t v[16]; + uint8_t v[NVME_NIDL_UUID]; } uuid; + struct { + NvmeIdNsDescr hdr; + uint8_t v; + } csi; }; struct data *ns_descrs = (struct data *)list; @@ -1599,19 +1679,31 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - memset(list, 0x0, sizeof(list)); - /* * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data * structure, a Namespace UUID (nidt = 0x3) must be reported in the * Namespace Identification Descriptor. Add the namespace UUID here. */ ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID; - ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; - memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN); + ns_descrs->uuid.hdr.nidl = NVME_NIDL_UUID; + memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDL_UUID); - return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, - DMA_DIRECTION_FROM_DEVICE, req); + ns_descrs->csi.hdr.nidt = NVME_NIDT_CSI; + ns_descrs->csi.hdr.nidl = NVME_NIDL_CSI; + ns_descrs->csi.v = ns->csi; + + return nvme_dma(n, list, sizeof(list), DMA_DIRECTION_FROM_DEVICE, req); +} + +static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req) +{ + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + + trace_pci_nvme_identify_cmd_set(); + + NVME_SET_CSI(*list, NVME_CSI_NVM); + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) @@ -1621,12 +1713,20 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) switch (le32_to_cpu(c->cns)) { case NVME_ID_CNS_NS: return nvme_identify_ns(n, req); + case NVME_ID_CNS_CS_NS: + return nvme_identify_ns_csi(n, req); case NVME_ID_CNS_CTRL: return nvme_identify_ctrl(n, req); + case NVME_ID_CNS_CS_CTRL: + return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: return nvme_identify_nslist(n, req); + case NVME_ID_CNS_CS_NS_ACTIVE_LIST: + return nvme_identify_nslist_csi(n, req); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); + case NVME_ID_CNS_IO_COMMAND_SET: + return nvme_identify_cmd_set(n, req); default: trace_pci_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns)); return NVME_INVALID_FIELD | NVME_DNR; @@ -1807,7 +1907,9 @@ defaults: if (iv == n->admin_cq.vector) { result |= NVME_INTVC_NOCOALESCING; } - + break; + case NVME_COMMAND_SET_PROFILE: + result = 0; break; default: result = nvme_feature_default[fid]; @@ -1948,6 +2050,12 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) break; case NVME_TIMESTAMP: return nvme_set_feature_timestamp(n, req); + case NVME_COMMAND_SET_PROFILE: + if (dw11 & 0x1ff) { + trace_pci_nvme_err_invalid_iocsci(dw11 & 0x1ff); + return NVME_CMD_SET_CMB_REJECTED | NVME_DNR; + } + break; default: return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR; } @@ -2104,8 +2212,12 @@ static void nvme_select_ns_iocs(NvmeCtrl *n) continue; } ns->iocs = nvme_cse_iocs_none; - if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { - ns->iocs = nvme_cse_iocs_nvm; + switch (ns->csi) { + case NVME_CSI_NVM: + if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { + ns->iocs = nvme_cse_iocs_nvm; + } + break; } } } @@ -2847,6 +2959,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM); + NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_CSI_SUPP); NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_ADMIN_ONLY); NVME_CAP_SET_MPSMAX(n->bar.cap, 4); diff --git a/hw/block/trace-events b/hw/block/trace-events index 0ae9cb0d35..65b964c894 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -48,8 +48,12 @@ pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16"" pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16"" pci_nvme_identify_ctrl(void) "identify controller" +pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8"" pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", csi=0x%"PRIx8"" pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", csi=0x%"PRIx8"" +pci_nvme_identify_cmd_set(void) "identify i/o command set" pci_nvme_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32"" pci_nvme_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64"" pci_nvme_getfeat(uint16_t cid, uint32_t nsid, uint8_t fid, uint8_t sel, uint32_t cdw11) "cid %"PRIu16" nsid 0x%"PRIx32" fid 0x%"PRIx8" sel 0x%"PRIx8" cdw11 0x%"PRIx32"" @@ -106,6 +110,8 @@ pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" +pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" +pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" pci_nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16"" @@ -162,6 +168,7 @@ pci_nvme_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for pci_nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring" pci_nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring" pci_nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring" +pci_nvme_ub_unknown_css_value(void) "unknown value in cc.css field" # xen-block.c xen_block_realize(const char *type, uint32_t disk, uint32_t partition) "%s d%up%u" diff --git a/include/block/nvme.h b/include/block/nvme.h index 4779495b7d..f5ac9143c4 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -84,6 +84,7 @@ enum NvmeCapMask { enum NvmeCapCss { NVME_CAP_CSS_NVM = 1 << 0, + NVME_CAP_CSS_CSI_SUPP = 1 << 6, NVME_CAP_CSS_ADMIN_ONLY = 1 << 7, }; @@ -117,9 +118,25 @@ enum NvmeCcMask { enum NvmeCcCss { NVME_CC_CSS_NVM = 0x0, + NVME_CC_CSS_CSI = 0x6, NVME_CC_CSS_ADMIN_ONLY = 0x7, }; +#define NVME_SET_CC_EN(cc, val) \ + (cc |= (uint32_t)((val) & CC_EN_MASK) << CC_EN_SHIFT) +#define NVME_SET_CC_CSS(cc, val) \ + (cc |= (uint32_t)((val) & CC_CSS_MASK) << CC_CSS_SHIFT) +#define NVME_SET_CC_MPS(cc, val) \ + (cc |= (uint32_t)((val) & CC_MPS_MASK) << CC_MPS_SHIFT) +#define NVME_SET_CC_AMS(cc, val) \ + (cc |= (uint32_t)((val) & CC_AMS_MASK) << CC_AMS_SHIFT) +#define NVME_SET_CC_SHN(cc, val) \ + (cc |= (uint32_t)((val) & CC_SHN_MASK) << CC_SHN_SHIFT) +#define NVME_SET_CC_IOSQES(cc, val) \ + (cc |= (uint32_t)((val) & CC_IOSQES_MASK) << CC_IOSQES_SHIFT) +#define NVME_SET_CC_IOCQES(cc, val) \ + (cc |= (uint32_t)((val) & CC_IOCQES_MASK) << CC_IOCQES_SHIFT) + enum NvmeCstsShift { CSTS_RDY_SHIFT = 0, CSTS_CFS_SHIFT = 1, @@ -534,8 +551,13 @@ typedef struct QEMU_PACKED NvmeIdentify { uint64_t rsvd2[2]; uint64_t prp1; uint64_t prp2; - uint32_t cns; - uint32_t rsvd11[5]; + uint8_t cns; + uint8_t rsvd10; + uint16_t ctrlid; + uint16_t nvmsetid; + uint8_t rsvd11; + uint8_t csi; + uint32_t rsvd12[4]; } NvmeIdentify; typedef struct QEMU_PACKED NvmeRwCmd { @@ -655,6 +677,7 @@ enum NvmeStatusCodes { NVME_MD_SGL_LEN_INVALID = 0x0010, NVME_SGL_DESCR_TYPE_INVALID = 0x0011, NVME_INVALID_USE_OF_CMB = 0x0012, + NVME_CMD_SET_CMB_REJECTED = 0x002b, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -781,11 +804,15 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 -enum { - NVME_ID_CNS_NS = 0x0, - NVME_ID_CNS_CTRL = 0x1, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x2, - NVME_ID_CNS_NS_DESCR_LIST = 0x3, +enum NvmeIdCns { + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_CS_NS = 0x05, + NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, + NVME_ID_CNS_IO_COMMAND_SET = 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { @@ -933,6 +960,7 @@ enum NvmeFeatureIds { NVME_WRITE_ATOMICITY = 0xa, NVME_ASYNCHRONOUS_EVENT_CONF = 0xb, NVME_TIMESTAMP = 0xe, + NVME_COMMAND_SET_PROFILE = 0x19, NVME_SOFTWARE_PROGRESS_MARKER = 0x80, NVME_FID_MAX = 0x100, }; @@ -1017,18 +1045,26 @@ typedef struct QEMU_PACKED NvmeIdNsDescr { uint8_t rsvd2[2]; } NvmeIdNsDescr; -enum { - NVME_NIDT_EUI64_LEN = 8, - NVME_NIDT_NGUID_LEN = 16, - NVME_NIDT_UUID_LEN = 16, +enum NvmeNsIdentifierLength { + NVME_NIDL_EUI64 = 8, + NVME_NIDL_NGUID = 16, + NVME_NIDL_UUID = 16, + NVME_NIDL_CSI = 1, }; enum NvmeNsIdentifierType { - NVME_NIDT_EUI64 = 0x1, - NVME_NIDT_NGUID = 0x2, - NVME_NIDT_UUID = 0x3, + NVME_NIDT_EUI64 = 0x01, + NVME_NIDT_NGUID = 0x02, + NVME_NIDT_UUID = 0x03, + NVME_NIDT_CSI = 0x04, }; +enum NvmeCsi { + NVME_CSI_NVM = 0x00, +}; + +#define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat) ((dlfeat) & 0x08) @@ -1079,6 +1115,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); From patchwork Mon Oct 19 02:17:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E16514B7 for ; Mon, 19 Oct 2020 02:19:50 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B1ABB21D7B for ; Mon, 19 Oct 2020 02:19:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="eQKZ1IUQ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B1ABB21D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:37820 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKm4-0001rd-Jm for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:19:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56202) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk9-0008BO-6r; Sun, 18 Oct 2020 22:17:49 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44109) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk4-0004HF-3f; Sun, 18 Oct 2020 22:17:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073864; x=1634609864; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RZDQA7e6KniA9T/iOCXCMWEhuU3w4q61BBDAMvjGMtQ=; b=eQKZ1IUQmh2GqqpCqGSoXrGMwfscR2gTUhT5/zM+ZptteH08zrX0WRKx e0B5B8UHPAXTJefm5Rs4FjcOp9yW+DxyUfq72h0R0H9oE/RHZrzDbBy/C S204syweVWpwtVL1jtb7Xm6yNKRiN7sDTKJu7q7v39pK4ITLXpXtYsN8D EyibIW7nA1svfmOW+7cajRcw/hUm7tXUFZzxxfwxr9hLHUIot6nXOi6jm j7/3Pw3Z4KvaN69iVdFc8OFcJR82QXs+xniIbOENycQjb1ljN+n4wytAJ yvHjSkp5wB4Rrh0sWlj5h2NjVETuqTN3ncCo0UHWp3C9ptrvRp6OpnE5J Q==; IronPort-SDR: f30/2/lDyFiMSXhZIdea3pDz2fZzhtw6mVTOh87CGWSPYJTxJr9pGr2o4f59iJbPuHjGe5JsRF vVHaSV9uqe3qSSpZYdBNgzsEpNAwunLiTMFtdWvSIaSGa3r1BSGecUksq8xtYivvKcMjR266/9 LHxkaJupxC1mmKoUTrjirL72TvH5XexM6J7hf1SN8Uv58ZjQOAyI9tZJ1rFm+agdqrt6Nc8hK4 o2vfkaHMzseA4UlIk4W2W5GWfSLUfrjo7WjNO2SlXi6eHTgf3DrLdUHhEybznKwqAcULvBtScn TH8= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207959" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:41 +0800 IronPort-SDR: rvXkC2jFw5s42vE9Af79MrFvH5ZDQ6uAFN9nDEtwycQs09pdE7WXI8oEb9URBhznYdM4Tr9q/E 4t3YVGY9YNrgMzmPFDyi373pNgp4lLjnoEj72xHy7JVkDk2QeRxjZTiO8kF0UCruRDJ22V2Nb0 b+U9cQFMcpTTKMPf3fxAh6jzE0JQXLP5Wj5NTlQcsWhKPNlGBU5HMG85WR1RMjokyby6Hj+btZ d9IYenVdBVNiw5YOdpWaUZ9rkwTnLx+Vw4icqT3c4xQ/INgXPcXR2PLrsu7qzESVX5AZ4XAYyE ypCe3FJZSawb8tWXHsnZ+cag Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:17 -0700 IronPort-SDR: OBPIZGCIGXFRU/9++ZqNodIEPvQ6Du16CDiW9XP3NfwV/AhqDVNFgZ+2lxugjxfnUFjeFO7KYC jAZxbOb24QvzCsBvrgqmGe09s04lmit5P3Uh6xd8/wsAHhlqPTDBi8vUwBw4QEAldR3wWVH1dX oaMDPcUiXibaJ1h2EHYdG0ELH+vL6eS261siOKBdNQMNsbHodSsUCvklGOUYtuk1nfp2lbSQuc vLun6Fu0BqgJrCRPVqiu0on0ESSZ+Nw4I/ruG1B+mFackb1OIAFQ0DLMSGoL1Az63zT7nefbVS oig= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:39 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 04/11] hw/block/nvme: Support allocated CNS command variants Date: Mon, 19 Oct 2020 11:17:19 +0900 Message-Id: <20201019021726.12048-5-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" From: Niklas Cassel Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated, that is a namespace is included regardless if it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. In NVMe, a namespace is active if it exists and is attached to the controller. CAP.CSS (together with the I/O Command Set data structure) defines what command sets are supported by the controller. CC.CSS (together with Set Profile) can be set to enable a subset of the available command sets. Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces will still be attached (and thus marked as active). Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces will still be attached (and thus marked as active). However, any operation from a disabled command set will result in a Invalid Command Opcode. Add a new Boolean namespace property, "attached", to provide the most basic namespace attachment support. The default value for this new property is true. Also, implement the logic in the new CNS values to include/exclude namespaces based on this new property. The only thing missing is hooking up the actual Namespace Attachment command opcode, which will allow a user to toggle the "attached" flag per namespace. The reason for not hooking up this command completely is because the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev Reviewed-by: Keith Busch --- hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 68 ++++++++++++++++++++++++++++++++++++-------- include/block/nvme.h | 20 +++++++------ 4 files changed, 70 insertions(+), 20 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c0362426cc..974aea33f7 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -132,6 +132,7 @@ static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), + DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d795e44bab..d6b2808b97 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + bool attached; QemuUUID uuid; } NvmeNamespaceParams; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index ca0d0abf5c..93728e51b3 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1062,6 +1062,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) if (unlikely(!req->ns)) { return NVME_INVALID_FIELD | NVME_DNR; } + if (!req->ns->params.attached) { + return NVME_INVALID_FIELD | NVME_DNR; + } if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { trace_pci_nvme_err_invalid_opc(req->cmd.opcode); @@ -1222,6 +1225,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, uint32_t trans_len; NvmeNamespace *ns; time_t current_ms; + int i; if (off >= sizeof(smart)) { return NVME_INVALID_FIELD | NVME_DNR; @@ -1232,15 +1236,18 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, if (!ns) { return NVME_INVALID_NSID | NVME_DNR; } - nvme_set_blk_stats(ns, &stats); + if (ns->params.attached) { + nvme_set_blk_stats(ns, &stats); + } } else { - int i; - for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { continue; } + if (!ns->params.attached) { + continue; + } nvme_set_blk_stats(ns, &stats); } } @@ -1531,7 +1538,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1548,11 +1556,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->params.attached) { + return nvme_rpt_empty_id_struct(n, req); + } + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1569,6 +1582,10 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->params.attached) { + return nvme_rpt_empty_id_struct(n, req); + } + if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); } @@ -1576,7 +1593,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1606,6 +1624,9 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid < min_nsid) { continue; } + if (only_active && !ns->params.attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1615,7 +1636,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1639,6 +1661,9 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid < min_nsid) { continue; } + if (only_active && !ns->params.attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1712,17 +1737,25 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) switch (le32_to_cpu(c->cns)) { case NVME_ID_CNS_NS: - return nvme_identify_ns(n, req); + return nvme_identify_ns(n, req, true); case NVME_ID_CNS_CS_NS: - return nvme_identify_ns_csi(n, req); + return nvme_identify_ns_csi(n, req, true); + case NVME_ID_CNS_NS_PRESENT: + return nvme_identify_ns(n, req, false); + case NVME_ID_CNS_CS_NS_PRESENT: + return nvme_identify_ns_csi(n, req, false); case NVME_ID_CNS_CTRL: return nvme_identify_ctrl(n, req); case NVME_ID_CNS_CS_CTRL: return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: - return nvme_identify_nslist(n, req); + return nvme_identify_nslist(n, req, true); case NVME_ID_CNS_CS_NS_ACTIVE_LIST: - return nvme_identify_nslist_csi(n, req); + return nvme_identify_nslist_csi(n, req, true); + case NVME_ID_CNS_NS_PRESENT_LIST: + return nvme_identify_nslist(n, req, false); + case NVME_ID_CNS_CS_NS_PRESENT_LIST: + return nvme_identify_nslist_csi(n, req, false); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); case NVME_ID_CNS_IO_COMMAND_SET: @@ -1795,6 +1828,7 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeCmd *cmd = &req->cmd; uint32_t dw10 = le32_to_cpu(cmd->cdw10); uint32_t dw11 = le32_to_cpu(cmd->cdw11); @@ -1826,7 +1860,11 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - if (!nvme_ns(n, nsid)) { + ns = nvme_ns(n, nsid); + if (!ns) { + return NVME_INVALID_FIELD | NVME_DNR; + } + if (!ns->params.attached) { return NVME_INVALID_FIELD | NVME_DNR; } } @@ -1968,6 +2006,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } + if (!ns->params.attached) { + return NVME_INVALID_FIELD | NVME_DNR; + } } } else if (nsid && nsid != NVME_NSID_BROADCAST) { if (!nvme_nsid_valid(n, nsid)) { @@ -2015,6 +2056,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) if (!ns) { continue; } + if (!ns->params.attached) { + continue; + } if (!(dw11 & 0x1) && blk_enable_write_cache(ns->blkconf.blk)) { blk_flush(ns->blkconf.blk); diff --git a/include/block/nvme.h b/include/block/nvme.h index f5ac9143c4..27125c9d28 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -805,14 +805,18 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 enum NvmeIdCns { - NVME_ID_CNS_NS = 0x00, - NVME_ID_CNS_CTRL = 0x01, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, - NVME_ID_CNS_NS_DESCR_LIST = 0x03, - NVME_ID_CNS_CS_NS = 0x05, - NVME_ID_CNS_CS_CTRL = 0x06, - NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, - NVME_ID_CNS_IO_COMMAND_SET = 0x1c, + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_CS_NS = 0x05, + NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, + NVME_ID_CNS_NS_PRESENT_LIST = 0x10, + NVME_ID_CNS_NS_PRESENT = 0x11, + NVME_ID_CNS_CS_NS_PRESENT_LIST = 0x1a, + NVME_ID_CNS_CS_NS_PRESENT = 0x1b, + NVME_ID_CNS_IO_COMMAND_SET = 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { From patchwork Mon Oct 19 02:17:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843509 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9499A1580 for ; Mon, 19 Oct 2020 02:25:54 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1FDD222268 for ; Mon, 19 Oct 2020 02:25:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="VJ5vODNM" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1FDD222268 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:56368 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKrx-000164-2u for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:25:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56214) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkA-0008Ee-DY; Sun, 18 Oct 2020 22:17:50 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44104) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk4-0004Gy-SF; Sun, 18 Oct 2020 22:17:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073864; x=1634609864; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ww/utn4s0ZVdwIMi7p0SGJkcpeLx8EULfhX3wnCOf+4=; b=VJ5vODNMRH0SeAh/STz+lazSK5uO6BM4bH+WV3zDMyhqZePG/Y6KlWsr faIeeuHt+LO2svm1GnR2NVm835avZVLHeKFJ2DfUJ+EOKR25MdgGPP2lW XHiO6PTgltYBnAZ5g+gsTLy+qxl8EAQr5Z44aIIFozEtfrICx/mGMAg4F OK13AACYouHni6/Ub6pGUdbyV2PgnTMohB4Cx+tmU/n7uLS4ARGzpKxjF U0KoM2kpoG/Qms9W6BEbEbZMqIDNFITCeUZRXL55RPe/mFAUnSgBbOzyu DekrZjTuyNlUDAlwEl/p3kULk68/Ry+f07gM2wAmRA5i2cLQnPnldUNcy w==; IronPort-SDR: wSd0w0aeoaQZJ0hKWnwiZ3LDVBBk25T9imsiUk6oo6hvHilU7tH7Wkwrwx3Zwvs707ztxbsZ6W FpvmsG2sKwv7DelayXS3BkyF4wqQ9wh4FmM5yFwZNkReW2SpaziKxUM0L2y80yWsRnzZuHfSxJ mYDeJqe4l7UpZ+ehSvw7csBG7gwGrPdEqxtFp7EJDBB60wsx1r0il/gDvbGTjFXUeJ8JEKw76R Uy5/f48n05Zvzl8T32oNJaLuQ/dDTM05K2kTDWoNMeJPfzEzx6WIQuaoe24ZyicqpY7x0Ek0h3 eTE= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207964" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:43 +0800 IronPort-SDR: sRvblzhMaW438sLHgpNxX7Zvekj968y9Z0ABrx7RfDSxeKLlE6vuD9Q7nSNF0VLe9yf6ql2cjm rv6K26mRbIP+/OxV+2GGl5duAwhS9Oo36XcU8eDuseAVuv3UiwOaZkAkGHn3mXX3jeaLIZQ6fB CsWO1uDIo6av4UqND/fm2uJgv6RW57/u5YVUaEi8+ZqA5QaZHk4v/GOZb+QCFChlZN6BG5np6B SnP9DzphAYUHdnaerb9nMSDXf1CHQpU34on8rsL2yJ/ePmkCfazBkeHR5GvsCC4UwS+CaE0Rfo kZOA25FA7uFMgiQphKeyTV8t Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:20 -0700 IronPort-SDR: JFVyze0A2Yogj6B9znDomc4lWiUm69Z4r1nBk4Rpm9zjCKqg1aVU2enquvgfLctG45ZBBEHrHh TqPrhsjXH8yTqbPndLK7mliwhE/RTxRVwH2ivlNn/InGcOpJDf8f6kyjbxKi5sxoYjwgoFpy83 cu4rRd+vAVnZK3I10VjS1klUzX9nihxwlhFv0IzrhskORQBBrkY5+NmEOnOuB0P75CxI2TtzqN 2oTFzfbM0464s2dxaec0CLCyAcHEedQTbfqR9kx1YTOSL/7sCknITfwMvxJWQ8ZD0+ljvEDIcI vx0= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:41 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 05/11] hw/block/nvme: Support Zoned Namespace Command Set Date: Mon, 19 Oct 2020 11:17:20 +0900 Message-Id: <20201019021726.12048-6-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" The emulation code has been changed to advertise NVM Command Set when "zoned" device property is not set (default) and Zoned Namespace Command Set otherwise. Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator. Define trace events where needed in newly introduced code. In order to improve scalability, all open, closed and full zones are organized in separate linked lists. Consequently, almost all zone operations don't require scanning of the entire zone array (which potentially can be quite large) - it is only necessary to enumerate one or more zone lists. Handlers for three new NVMe commands introduced in Zoned Namespace Command Set specification are added, namely for Zone Management Receive, Zone Management Send and Zone Append. Device initialization code has been extended to create a proper configuration for zoned operation using device properties. Read/Write command handler is modified to only allow writes at the write pointer if the namespace is zoned. For Zone Append command, writes implicitly happen at the write pointer and the starting write pointer value is returned as the result of the command. Write Zeroes handler is modified to add zoned checks that are identical to those done as a part of Write flow. Subsequent commits in this series add ZDE support and checks for active and open zone limits. Signed-off-by: Niklas Cassel Signed-off-by: Hans Holmberg Signed-off-by: Ajay Joshi Signed-off-by: Chaitanya Kulkarni Signed-off-by: Matias Bjorling Signed-off-by: Aravind Ramesh Signed-off-by: Shin'ichiro Kawasaki Signed-off-by: Adam Manzanares Signed-off-by: Dmitry Fomichev --- block/nvme.c | 2 +- hw/block/nvme-ns.c | 193 +++++++++ hw/block/nvme-ns.h | 54 +++ hw/block/nvme.c | 975 ++++++++++++++++++++++++++++++++++++++++-- hw/block/nvme.h | 9 + hw/block/trace-events | 21 + include/block/nvme.h | 113 ++++- 7 files changed, 1339 insertions(+), 28 deletions(-) diff --git a/block/nvme.c b/block/nvme.c index 05485fdd11..7a513c9a17 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c) { uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF; if (status) { - trace_nvme_error(le32_to_cpu(c->result), + trace_nvme_error(le32_to_cpu(c->result32), le16_to_cpu(c->sq_head), le16_to_cpu(c->sq_id), le16_to_cpu(c->cid), diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 974aea33f7..fedfad595c 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -25,6 +25,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-core.h" +#include "trace.h" #include "nvme.h" #include "nvme-ns.h" @@ -76,6 +77,171 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) return 0; } +static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) +{ + uint64_t zone_size, zone_cap; + uint32_t nz, lbasz = ns->blkconf.logical_block_size; + + if (ns->params.zone_size_bs) { + zone_size = ns->params.zone_size_bs; + } else { + zone_size = NVME_DEFAULT_ZONE_SIZE; + } + if (ns->params.zone_cap_bs) { + zone_cap = ns->params.zone_cap_bs; + } else { + zone_cap = zone_size; + } + if (zone_cap > zone_size) { + error_setg(errp, "zone capacity %luB exceeds zone size %luB", + zone_cap, zone_size); + return -1; + } + if (zone_size < lbasz) { + error_setg(errp, "zone size %luB too small, must be at least %uB", + zone_size, lbasz); + return -1; + } + if (zone_cap < lbasz) { + error_setg(errp, "zone capacity %luB too small, must be at least %uB", + zone_cap, lbasz); + return -1; + } + ns->zone_size = zone_size / lbasz; + ns->zone_capacity = zone_cap / lbasz; + + nz = DIV_ROUND_UP(ns->size / lbasz, ns->zone_size); + ns->num_zones = nz; + ns->zone_array_size = sizeof(NvmeZone) * nz; + ns->zone_size_log2 = 0; + if (is_power_of_2(ns->zone_size)) { + ns->zone_size_log2 = 63 - clz64(ns->zone_size); + } + + return 0; +} + +static void nvme_init_zone_state(NvmeNamespace *ns) +{ + uint64_t start = 0, zone_size = ns->zone_size; + uint64_t capacity = ns->num_zones * zone_size; + NvmeZone *zone; + int i; + + ns->zone_array = g_malloc0(ns->zone_array_size); + + QTAILQ_INIT(&ns->exp_open_zones); + QTAILQ_INIT(&ns->imp_open_zones); + QTAILQ_INIT(&ns->closed_zones); + QTAILQ_INIT(&ns->full_zones); + + zone = ns->zone_array; + for (i = 0; i < ns->num_zones; i++, zone++) { + if (start + zone_size > capacity) { + zone_size = capacity - start; + } + zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE; + nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); + zone->d.za = 0; + zone->d.zcap = ns->zone_capacity; + zone->d.zslba = start; + zone->d.wp = start; + zone->w_ptr = start; + start += zone_size; + } +} + +static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, + Error **errp) +{ + NvmeIdNsZoned *id_ns_z; + + if (n->params.fill_pattern == 0xff) { + ns->id_ns.dlfeat |= 0x02; + } + if (n->params.fill_pattern != 0x00) { + ns->id_ns.dlfeat &= ~0x01; + } + + if (nvme_calc_zone_geometry(ns, errp) != 0) { + return -1; + } + + nvme_init_zone_state(ns); + + id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned)); + + /* MAR/MOR are zeroes-based, 0xffffffff means no limit */ + id_ns_z->mar = 0xffffffff; + id_ns_z->mor = 0xffffffff; + id_ns_z->zoc = 0; + id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; + + id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); + id_ns_z->lbafe[lba_index].zdes = 0; + + ns->csi = NVME_CSI_ZONED; + ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); + ns->id_ns.ncap = cpu_to_le64(ns->zone_capacity * ns->num_zones); + ns->id_ns.nuse = ns->id_ns.ncap; + + ns->id_ns_zoned = id_ns_z; + + return 0; +} + +/* + * Close or finish all the zones that are currently open. + */ +static void nvme_zoned_clear_ns(NvmeNamespace *ns) +{ + NvmeZone *zone; + uint32_t set_state; + int i; + + zone = ns->zone_array; + for (i = 0; i < ns->num_zones; i++, zone++) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + /* fall through */ + default: + continue; + } + + if (zone->d.wp == zone->d.zslba) { + set_state = NVME_ZONE_STATE_EMPTY; + } else { + set_state = NVME_ZONE_STATE_CLOSED; + } + + switch (set_state) { + case NVME_ZONE_STATE_CLOSED: + trace_pci_nvme_clear_ns_close(nvme_get_zone_state(zone), + zone->d.zslba); + QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_EMPTY: + trace_pci_nvme_clear_ns_reset(nvme_get_zone_state(zone), + zone->d.zslba); + break; + case NVME_ZONE_STATE_FULL: + trace_pci_nvme_clear_ns_full(nvme_get_zone_state(zone), + zone->d.zslba); + zone->d.wp = nvme_zone_wr_boundary(zone); + QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); + } + + zone->w_ptr = zone->d.wp; + nvme_set_zone_state(zone, set_state); + } +} + static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { if (!ns->blkconf.blk) { @@ -97,6 +263,12 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) } nvme_ns_init(ns); + if (ns->params.zoned) { + if (nvme_zoned_init_ns(n, ns, 0, errp) != 0) { + return -1; + } + } + if (nvme_register_namespace(n, ns, errp)) { return -1; } @@ -114,6 +286,21 @@ void nvme_ns_flush(NvmeNamespace *ns) blk_flush(ns->blkconf.blk); } +void nvme_ns_clear(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + nvme_zoned_clear_ns(ns); + } +} + +void nvme_ns_cleanup(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + g_free(ns->id_ns_zoned); + g_free(ns->zone_array); + } +} + static void nvme_ns_realize(DeviceState *dev, Error **errp) { NvmeNamespace *ns = NVME_NS(dev); @@ -133,6 +320,12 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true), + DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), + DEFINE_PROP_SIZE("zone_size", NvmeNamespace, params.zone_size_bs, + NVME_DEFAULT_ZONE_SIZE), + DEFINE_PROP_SIZE("zone_capacity", NvmeNamespace, params.zone_cap_bs, 0), + DEFINE_PROP_BOOL("cross_zone_read", NvmeNamespace, + params.cross_zone_read, false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d6b2808b97..170cbb8cdc 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -19,10 +19,21 @@ #define NVME_NS(obj) \ OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS) +typedef struct NvmeZone { + NvmeZoneDescr d; + uint64_t w_ptr; + QTAILQ_ENTRY(NvmeZone) entry; +} NvmeZone; + typedef struct NvmeNamespaceParams { uint32_t nsid; bool attached; QemuUUID uuid; + + bool zoned; + bool cross_zone_read; + uint64_t zone_size_bs; + uint64_t zone_cap_bs; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -34,6 +45,18 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; + NvmeIdNsZoned *id_ns_zoned; + NvmeZone *zone_array; + QTAILQ_HEAD(, NvmeZone) exp_open_zones; + QTAILQ_HEAD(, NvmeZone) imp_open_zones; + QTAILQ_HEAD(, NvmeZone) closed_zones; + QTAILQ_HEAD(, NvmeZone) full_zones; + uint32_t num_zones; + uint64_t zone_size; + uint64_t zone_capacity; + uint64_t zone_array_size; + uint32_t zone_size_log2; + NvmeNamespaceParams params; } NvmeNamespace; @@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba) typedef struct NvmeCtrl NvmeCtrl; +static inline uint8_t nvme_get_zone_state(NvmeZone *zone) +{ + return zone->d.zs >> 4; +} + +static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState state) +{ + zone->d.zs = state << 4; +} + +static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone) +{ + return zone->d.zslba + ns->zone_size; +} + +static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone) +{ + return zone->d.zslba + zone->d.zcap; +} + +static inline bool nvme_wp_is_valid(NvmeZone *zone) +{ + uint8_t st = nvme_get_zone_state(zone); + + return st != NVME_ZONE_STATE_FULL && + st != NVME_ZONE_STATE_READ_ONLY && + st != NVME_ZONE_STATE_OFFLINE; +} + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_flush(NvmeNamespace *ns); +void nvme_ns_clear(NvmeNamespace *ns); +void nvme_ns_cleanup(NvmeNamespace *ns); #endif /* NVME_NS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 93728e51b3..34d0d0250d 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -133,6 +133,16 @@ static const uint32_t nvme_cse_iocs_nvm[256] = { [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, }; +static const uint32_t nvme_cse_iocs_zoned[256] = { + [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_APPEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_ZONE_MGMT_SEND] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_MGMT_RECV] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -149,6 +159,46 @@ static uint16_t nvme_sqid(NvmeRequest *req) return le16_to_cpu(req->sq->sqid); } +static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + if (QTAILQ_IN_USE(zone, entry)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_REMOVE(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_REMOVE(&ns->full_zones, zone, entry); + } + } + + nvme_set_zone_state(zone, state); + + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); + case NVME_ZONE_STATE_READ_ONLY: + break; + default: + zone->d.za = 0; + } +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr low = n->ctrl_mem.addr; @@ -841,7 +891,7 @@ static void nvme_process_aers(void *opaque) req = n->aer_reqs[n->outstanding_aers]; - result = (NvmeAerResult *) &req->cqe.result; + result = (NvmeAerResult *) &req->cqe.result32; result->event_type = event->result.event_type; result->event_info = event->result.event_info; result->log_page = event->result.log_page; @@ -910,6 +960,326 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns, return NVME_SUCCESS; } +static void nvme_fill_read_data(NvmeRequest *req, uint64_t offset, + uint32_t max_len, uint8_t pattern) +{ + QEMUSGList *qsg = &req->qsg; + QEMUIOVector *iov = &req->iov; + ScatterGatherEntry *entry; + uint32_t len, ent_len; + + if (qsg->nsg > 0) { + entry = qsg->sg; + len = qsg->size; + if (max_len) { + len = MIN(len, max_len); + } + for (; len > 0; len -= ent_len) { + ent_len = MIN(len, entry->len); + if (offset > ent_len) { + offset -= ent_len; + } else if (offset != 0) { + dma_memory_set(qsg->as, entry->base + offset, + pattern, ent_len - offset); + offset = 0; + } else { + dma_memory_set(qsg->as, entry->base, pattern, ent_len); + } + entry++; + } + } else if (iov->iov) { + len = iov_size(iov->iov, iov->niov); + if (max_len) { + len = MIN(len, max_len); + } + qemu_iovec_memset(iov, offset, pattern, len - offset); + } +} + +static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba) +{ + return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 : + slba / ns->zone_size; +} + +static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba) +{ + uint32_t zone_idx = nvme_zone_idx(ns, slba); + + assert(zone_idx < ns->num_zones); + return &ns->zone_array[zone_idx]; +} + +static uint16_t nvme_zone_state_ok_to_write(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_FULL: + status = NVME_ZONE_FULL; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE; + break; + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_ZONE_READ_ONLY; + break; + default: + assert(false); + } + + return status; +} + +static uint16_t nvme_check_zone_write(NvmeCtrl *n, NvmeNamespace *ns, + NvmeZone *zone, uint64_t slba, + uint32_t nlb, bool append) +{ + uint16_t status; + + if (unlikely((slba + nlb) > nvme_zone_wr_boundary(zone))) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + status = nvme_zone_state_ok_to_write(zone); + } + + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_write_not_ok(slba, nlb, status); + } else { + assert(nvme_wp_is_valid(zone)); + if (append) { + if (unlikely(slba != zone->d.zslba)) { + trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba); + status = NVME_ZONE_INVALID_WRITE; + } + if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) { + trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl); + status = NVME_INVALID_FIELD; + } + } else if (unlikely(slba != zone->w_ptr)) { + trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba, + zone->w_ptr); + status = NVME_ZONE_INVALID_WRITE; + } + } + + return status; +} + +static uint16_t nvme_zone_state_ok_to_read(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_FULL: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE | NVME_DNR; + break; + default: + assert(false); + } + + return status; +} + +typedef struct NvmeReadFillCtx { + uint64_t pre_rd_fill_slba; + uint64_t read_slba; + uint64_t post_rd_fill_slba; + + uint32_t pre_rd_fill_nlb; + uint32_t read_nlb; + uint32_t post_rd_fill_nlb; +} NvmeReadFillCtx; + +static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, + uint64_t slba, uint32_t nlb, + NvmeReadFillCtx *rfc) +{ + NvmeZone *next_zone; + uint64_t bndry = nvme_zone_rd_boundary(ns, zone); + uint64_t end = slba + nlb, wp1, wp2; + uint16_t status; + + rfc->pre_rd_fill_slba = ~0ULL; + rfc->pre_rd_fill_nlb = 0; + rfc->read_slba = slba; + rfc->read_nlb = nlb; + rfc->post_rd_fill_slba = ~0ULL; + rfc->post_rd_fill_nlb = 0; + + status = nvme_zone_state_ok_to_read(zone); + if (status != NVME_SUCCESS) { + ; + } else if (likely(end <= bndry)) { + if (end > zone->w_ptr) { + wp1 = zone->w_ptr; + if (slba >= wp1) { + /* No i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } + } else if (!ns->params.cross_zone_read) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + /* + * Read across zone boundary, look at the next zone. + * Earlier bounds checks ensure that the current zone + * is not the last one. + */ + next_zone = zone + 1; + status = nvme_zone_state_ok_to_read(next_zone); + if (status != NVME_SUCCESS) { + ; + } else if (end > nvme_zone_rd_boundary(ns, next_zone)) { + /* + * As zone size is much larger than a typical maximum + * i/o size in real hardware, allow the i/o range + * to span no more than one pair of zones. + */ + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + wp1 = zone->w_ptr; + wp2 = next_zone->w_ptr; + if (wp2 == bndry) { + if (slba >= wp1) { + /* Again, no i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } else if (slba < wp1) { + if (end > wp2) { + if (wp1 == bndry) { + rfc->post_rd_fill_slba = wp2; + rfc->post_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + } else { + rfc->pre_rd_fill_slba = wp2; + rfc->pre_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + if (end > wp2) { + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = end - slba; + rfc->read_slba = bndry; + rfc->read_nlb = wp2 - bndry; + } else { + rfc->read_slba = bndry; + rfc->read_nlb = end - bndry; + rfc->post_rd_fill_slba = slba; + rfc->post_rd_fill_nlb = bndry - slba; + } + } + } + } + + return status; +} + +static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, + bool failed) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeZone *zone; + uint64_t slba, start_wp = req->cqe.result64; + uint32_t nlb; + + if (rw->opcode != NVME_CMD_WRITE && + rw->opcode != NVME_CMD_ZONE_APPEND && + rw->opcode != NVME_CMD_WRITE_ZEROES) { + return false; + } + + slba = le64_to_cpu(rw->slba); + nlb = le16_to_cpu(rw->nlb) + 1; + zone = nvme_get_zone_by_slba(ns, slba); + + if (!failed && zone->w_ptr < start_wp + nlb) { + /* + * A preceding queued write to the zone has failed, + * now this write is not at the WP, fail it too. + */ + failed = true; + } + + if (failed) { + if (zone->w_ptr > start_wp) { + zone->w_ptr = start_wp; + zone->d.wp = start_wp; + } + req->cqe.result64 = 0; + } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + break; + default: + assert(false); + } + zone->d.wp = zone->w_ptr; + } else { + zone->d.wp += nlb; + } + + return failed; +} + +static uint64_t nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, + uint32_t nlb) +{ + uint64_t result = zone->w_ptr; + uint8_t zs; + + zone->w_ptr += nlb; + + if (zone->w_ptr < nvme_zone_wr_boundary(zone)) { + zs = nvme_get_zone_state(zone); + switch (zs) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); + } + } + + return result; +} + static void nvme_rw_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -924,10 +1294,27 @@ static void nvme_rw_cb(void *opaque, int ret) trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); if (!ret) { - block_acct_done(stats, acct); + if (ns->params.zoned) { + if (nvme_finalize_zoned_write(ns, req, false)) { + ret = EIO; + block_acct_failed(stats, acct); + req->status = NVME_ZONE_INVALID_WRITE; + } else if (req->fill_len) { + nvme_fill_read_data(req, req->fill_ofs, req->fill_len, + nvme_ctrl(req)->params.fill_pattern); + req->fill_len = 0; + } + } + if (!ret) { + block_acct_done(stats, acct); + } } else { uint16_t status; + if (ns->params.zoned) { + nvme_finalize_zoned_write(ns, req, true); + } + block_acct_failed(stats, acct); switch (req->cmd.opcode) { @@ -969,8 +1356,10 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) NvmeNamespace *ns = req->ns; uint64_t slba = le64_to_cpu(rw->slba); uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + NvmeZone *zone; uint64_t offset = nvme_l2b(ns, slba); uint32_t count = nvme_l2b(ns, nlb); + BlockBackend *blk = ns->blkconf.blk; uint16_t status; trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); @@ -981,24 +1370,41 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return status; } - block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0, - BLOCK_ACCT_WRITE); - req->aiocb = blk_aio_pwrite_zeroes(req->ns->blkconf.blk, offset, count, + if (ns->params.zoned) { + zone = nvme_get_zone_by_slba(ns, slba); + + status = nvme_check_zone_write(n, ns, zone, slba, nlb, false); + if (status != NVME_SUCCESS) { + goto invalid; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } + + block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(blk, offset, count, BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); + return status | NVME_DNR; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t slba = le64_to_cpu(rw->slba); - uint64_t data_size = nvme_l2b(ns, nlb); - uint64_t data_offset = nvme_l2b(ns, slba); - enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ? - BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; + uint64_t data_offset, fill_ofs; + + NvmeZone *zone; + uint32_t fill_len; + NvmeReadFillCtx rfc; + bool is_write = rw->opcode == NVME_CMD_WRITE || append; + enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -1017,14 +1423,71 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) goto invalid; } + if (ns->params.zoned) { + zone = nvme_get_zone_by_slba(ns, slba); + + if (is_write) { + status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); + if (status != NVME_SUCCESS) { + goto invalid; + } + + if (append) { + slba = zone->w_ptr; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } else { + status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); + goto invalid; + } + } + } else if (append) { + trace_pci_nvme_err_invalid_opc(rw->opcode); + status = NVME_INVALID_OPCODE; + goto invalid; + } + + data_offset = nvme_l2b(ns, slba); + status = nvme_map_dptr(n, data_size, req); if (status) { goto invalid; } + if (ns->params.zoned) { + if (is_write) { + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } else { + if (rfc.pre_rd_fill_nlb) { + fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); + fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); + nvme_fill_read_data(req, fill_ofs, fill_len, + n->params.fill_pattern); + } + if (!rfc.read_nlb) { + /* No backend I/O necessary, only needed to fill the buffer */ + req->status = NVME_SUCCESS; + return NVME_SUCCESS; + } + if (rfc.post_rd_fill_nlb) { + req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); + req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); + } else { + req->fill_len = 0; + } + slba = rfc.read_slba; + data_size = nvme_l2b(ns, rfc.read_nlb); + } + } + + data_offset = nvme_l2b(ns, slba); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct); if (req->qsg.sg) { - if (acct == BLOCK_ACCT_WRITE) { + if (is_write) { req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req); } else { @@ -1032,7 +1495,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) BDRV_SECTOR_SIZE, nvme_rw_cb, req); } } else { - if (acct == BLOCK_ACCT_WRITE) { + if (is_write) { req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, nvme_rw_cb, req); } else { @@ -1043,10 +1506,383 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; invalid: - block_acct_invalid(blk_get_stats(ns->blkconf.blk), acct); + block_acct_invalid(blk_get_stats(blk), acct); + return status | NVME_DNR; +} + +static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, + uint64_t *slba, uint32_t *zone_idx) +{ + uint32_t dw10 = le32_to_cpu(c->cdw10); + uint32_t dw11 = le32_to_cpu(c->cdw11); + + if (!ns->params.zoned) { + trace_pci_nvme_err_invalid_opc(c->opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + + *slba = ((uint64_t)dw11) << 32 | dw10; + if (unlikely(*slba >= ns->id_ns.nsze)) { + trace_pci_nvme_err_invalid_lba_range(*slba, 0, ns->id_ns.nsze); + *slba = 0; + return NVME_LBA_RANGE | NVME_DNR; + } + + *zone_idx = nvme_zone_idx(ns, *slba); + assert(*zone_idx < ns->num_zones); + + return NVME_SUCCESS; +} + +static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); + /* fall through */ + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_open_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_CLOSED; +} + +static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + /* fall through */ + case NVME_ZONE_STATE_CLOSED: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_close_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN; +} + +static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + zone->w_ptr = nvme_zone_wr_boundary(zone); + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_finish_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN || + state == NVME_ZONE_STATE_CLOSED; +} + +static uint16_t nvme_reset_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_FULL: + zone->w_ptr = zone->d.zslba; + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY); + /* fall through */ + case NVME_ZONE_STATE_EMPTY: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_reset_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN || + state == NVME_ZONE_STATE_CLOSED || + state == NVME_ZONE_STATE_FULL; +} + +static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_READ_ONLY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE); + /* fall through */ + case NVME_ZONE_STATE_OFFLINE: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_offline_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_READ_ONLY; +} + +typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, + uint8_t); +typedef bool (*need_to_proc_zone_t)(uint8_t); + +static uint16_t name_do_zone_op(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state, bool all, + op_handler_t op_hndlr, + need_to_proc_zone_t proc_zone) +{ + int i; + uint16_t status = 0; + + if (!all) { + status = op_hndlr(ns, zone, state); + } else { + for (i = 0; i < ns->num_zones; i++, zone++) { + state = nvme_get_zone_state(zone); + if (proc_zone(state)) { + status = op_hndlr(ns, zone, state); + if (status != NVME_SUCCESS) { + break; + } + } + } + } + return status; } +static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint64_t slba = 0; + uint32_t zone_idx = 0; + uint16_t status; + uint8_t action, state; + bool all; + NvmeZone *zone; + + action = dw13 & 0xff; + all = dw13 & 0x100; + + req->status = NVME_SUCCESS; + + if (!all) { + status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (status) { + return status; + } + } + + zone = &ns->zone_array[zone_idx]; + if (slba != zone->d.zslba) { + trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + state = nvme_get_zone_state(zone); + + switch (action) { + + case NVME_ZONE_ACTION_OPEN: + trace_pci_nvme_open_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_open_zone, nvme_cond_open_all); + break; + + case NVME_ZONE_ACTION_CLOSE: + trace_pci_nvme_close_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_close_zone, nvme_cond_close_all); + break; + + case NVME_ZONE_ACTION_FINISH: + trace_pci_nvme_finish_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_finish_zone, nvme_cond_finish_all); + break; + + case NVME_ZONE_ACTION_RESET: + trace_pci_nvme_reset_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_reset_zone, nvme_cond_reset_all); + break; + + case NVME_ZONE_ACTION_OFFLINE: + trace_pci_nvme_offline_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_offline_zone, nvme_cond_offline_all); + break; + + case NVME_ZONE_ACTION_SET_ZD_EXT: + trace_pci_nvme_set_descriptor_extension(slba, zone_idx); + return NVME_INVALID_FIELD | NVME_DNR; + break; + + default: + trace_pci_nvme_err_invalid_mgmt_action(action); + status = NVME_INVALID_FIELD; + } + + if (status == NVME_ZONE_INVAL_TRANSITION) { + trace_pci_nvme_err_invalid_zone_state_transition(state, action, slba, + zone->d.za); + } + if (status) { + status |= NVME_DNR; + } + + return status; +} + +static bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl) +{ + int zs = nvme_get_zone_state(zl); + + switch (zafs) { + case NVME_ZONE_REPORT_ALL: + return true; + case NVME_ZONE_REPORT_EMPTY: + return zs == NVME_ZONE_STATE_EMPTY; + case NVME_ZONE_REPORT_IMPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN; + case NVME_ZONE_REPORT_EXPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN; + case NVME_ZONE_REPORT_CLOSED: + return zs == NVME_ZONE_STATE_CLOSED; + case NVME_ZONE_REPORT_FULL: + return zs == NVME_ZONE_STATE_FULL; + case NVME_ZONE_REPORT_READ_ONLY: + return zs == NVME_ZONE_STATE_READ_ONLY; + case NVME_ZONE_REPORT_OFFLINE: + return zs == NVME_ZONE_STATE_OFFLINE; + default: + return false; + } +} + +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + /* cdw12 is zero-based number of dwords to return. Convert to bytes */ + uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint32_t zone_idx, zra, zrasf, partial; + uint64_t max_zones, nr_zones = 0; + uint16_t ret; + uint64_t slba; + NvmeZoneDescr *z; + NvmeZone *zs; + NvmeZoneReportHeader *header; + void *buf, *buf_p; + size_t zone_entry_sz; + + req->status = NVME_SUCCESS; + + ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (ret) { + return ret; + } + + if (len < sizeof(NvmeZoneReportHeader)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + zra = dw13 & 0xff; + if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (zra == NVME_ZONE_REPORT_EXTENDED) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + zrasf = (dw13 >> 8) & 0xff; + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + partial = (dw13 >> 16) & 0x01; + + zone_entry_sz = sizeof(NvmeZoneDescr); + + max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; + buf = g_malloc0(len); + + header = (NvmeZoneReportHeader *)buf; + buf_p = buf + sizeof(NvmeZoneReportHeader); + + while (zone_idx < ns->num_zones && nr_zones < max_zones) { + zs = &ns->zone_array[zone_idx]; + + if (!nvme_zone_matches_filter(zrasf, zs)) { + zone_idx++; + continue; + } + + z = (NvmeZoneDescr *)buf_p; + buf_p += sizeof(NvmeZoneDescr); + nr_zones++; + + z->zt = zs->d.zt; + z->zs = zs->d.zs; + z->zcap = cpu_to_le64(zs->d.zcap); + z->zslba = cpu_to_le64(zs->d.zslba); + z->za = zs->d.za; + + if (nvme_wp_is_valid(zs)) { + z->wp = cpu_to_le64(zs->d.wp); + } else { + z->wp = cpu_to_le64(~0ULL); + } + + zone_idx++; + } + + if (!partial) { + for (; zone_idx < ns->num_zones; zone_idx++) { + zs = &ns->zone_array[zone_idx]; + if (nvme_zone_matches_filter(zrasf, zs)) { + nr_zones++; + } + } + } + header->nr_zones = cpu_to_le64(nr_zones); + + ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req); + + g_free(buf); + + return ret; +} + static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { uint32_t nsid = le32_to_cpu(req->cmd.nsid); @@ -1076,9 +1912,15 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: return nvme_write_zeroes(n, req); + case NVME_CMD_ZONE_APPEND: + return nvme_rw(n, req, true); case NVME_CMD_WRITE: case NVME_CMD_READ: - return nvme_rw(n, req); + return nvme_rw(n, req, false); + case NVME_CMD_ZONE_MGMT_SEND: + return nvme_zone_mgmt_send(n, req); + case NVME_CMD_ZONE_MGMT_RECV: + return nvme_zone_mgmt_recv(n, req); default: assert(false); } @@ -1320,7 +2162,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, +static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len, uint64_t off, NvmeRequest *req) { NvmeEffectsLog log = {}; @@ -1339,6 +2181,15 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, src_iocs = nvme_cse_iocs_nvm; case NVME_CC_CSS_ADMIN_ONLY: break; + case NVME_CC_CSS_CSI: + switch (csi) { + case NVME_CSI_NVM: + src_iocs = nvme_cse_iocs_nvm; + break; + case NVME_CSI_ZONED: + src_iocs = nvme_cse_iocs_zoned; + break; + } } memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs)); @@ -1364,6 +2215,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) uint8_t lid = dw10 & 0xff; uint8_t lsp = (dw10 >> 8) & 0xf; uint8_t rae = (dw10 >> 15) & 0x1; + uint8_t csi = le32_to_cpu(cmd->cdw14) >> 24; uint32_t numdl, numdu; uint64_t off, lpol, lpou; size_t len; @@ -1397,7 +2249,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) case NVME_LOG_FW_SLOT_INFO: return nvme_fw_log_info(n, len, off, req); case NVME_LOG_CMD_EFFECTS: - return nvme_cmd_effects(n, len, off, req); + return nvme_cmd_effects(n, csi, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -1517,6 +2369,16 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req); } +static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns) +{ + switch (ns->csi) { + case NVME_CSI_NVM: + case NVME_CSI_ZONED: + return true; + } + return false; +} + static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_identify_ctrl(); @@ -1528,11 +2390,16 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) { NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + NvmeIdCtrlZoned id = {}; trace_pci_nvme_identify_ctrl_csi(c->csi); if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED) { + id.zasl = n->zasl; + return nvme_dma(n, (uint8_t *)&id, sizeof(id), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1560,8 +2427,12 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), - DMA_DIRECTION_FROM_DEVICE, req); + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), + DMA_DIRECTION_FROM_DEVICE, req); + } + + return NVME_INVALID_CMD_SET | NVME_DNR; } static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, @@ -1586,8 +2457,11 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - if (c->csi == NVME_CSI_NVM) { + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) { + return nvme_dma(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1649,7 +2523,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, trace_pci_nvme_identify_nslist_csi(min_nsid, c->csi); - if (c->csi != NVME_CSI_NVM) { + if (c->csi != NVME_CSI_NVM && c->csi != NVME_CSI_ZONED) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1658,7 +2532,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, if (!ns) { continue; } - if (ns->params.nsid < min_nsid) { + if (ns->params.nsid < min_nsid || c->csi != ns->csi) { continue; } if (only_active && !ns->params.attached) { @@ -1728,6 +2602,8 @@ static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_identify_cmd_set(); NVME_SET_CSI(*list, NVME_CSI_NVM); + NVME_SET_CSI(*list, NVME_CSI_ZONED); + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } @@ -1770,7 +2646,7 @@ static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req) { uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff; - req->cqe.result = 1; + req->cqe.result32 = 1; if (nvme_check_sqid(n, sqid)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1955,7 +2831,7 @@ defaults: } out: - req->cqe.result = cpu_to_le32(result); + req->cqe.result32 = cpu_to_le32(result); return NVME_SUCCESS; } @@ -2086,8 +2962,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) ((dw11 >> 16) & 0xFFFF) + 1, n->params.max_ioqpairs, n->params.max_ioqpairs); - req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) | - ((n->params.max_ioqpairs - 1) << 16)); + req->cqe.result32 = cpu_to_le32((n->params.max_ioqpairs - 1) | + ((n->params.max_ioqpairs - 1) << 16)); break; case NVME_ASYNCHRONOUS_EVENT_CONF: n->features.async_config = dw11; @@ -2242,6 +3118,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n) nvme_ns_flush(ns); } + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_clear(ns); + } + n->bar.cc = 0; } @@ -2262,6 +3147,13 @@ static void nvme_select_ns_iocs(NvmeCtrl *n) ns->iocs = nvme_cse_iocs_nvm; } break; + case NVME_CSI_ZONED: + if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) { + ns->iocs = nvme_cse_iocs_zoned; + } else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) { + ns->iocs = nvme_cse_iocs_nvm; + } + break; } } } @@ -2360,6 +3252,17 @@ static int nvme_start_ctrl(NvmeCtrl *n) nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0, NVME_AQA_ASQS(n->bar.aqa) + 1); + if (!n->params.zasl_bs) { + n->zasl = n->params.mdts; + } else { + if (n->params.zasl_bs < n->page_size) { + trace_pci_nvme_err_startfail_zasl_too_small(n->params.zasl_bs, + n->page_size); + return -1; + } + n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size); + } + nvme_set_timestamp(n, 0ULL); QTAILQ_INIT(&n->aer_queue); @@ -2784,6 +3687,13 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) host_memory_backend_set_mapped(n->pmrdev, true); } + + if (n->params.zasl_bs) { + if (!is_power_of_2(n->params.zasl_bs)) { + error_setg(errp, "zone append size limit has to be a power of 2"); + return; + } + } } static void nvme_init_state(NvmeCtrl *n) @@ -3049,9 +3959,21 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) static void nvme_exit(PCIDevice *pci_dev) { NvmeCtrl *n = NVME(pci_dev); + NvmeNamespace *ns; + int i; nvme_clear_ctrl(n); + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_cleanup(ns); + } g_free(n->namespaces); + g_free(n->cq); g_free(n->sq); g_free(n->aer_reqs); @@ -3079,6 +4001,9 @@ static Property nvme_props[] = { DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64), DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7), DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false), + DEFINE_PROP_UINT8("fill_pattern", NvmeCtrl, params.fill_pattern, 0), + DEFINE_PROP_SIZE32("zone_append_size_limit", NvmeCtrl, params.zasl_bs, + NVME_DEFAULT_MAX_ZA_SIZE), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index e080a2318a..c406cb1c65 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -6,6 +6,9 @@ #define NVME_MAX_NAMESPACES 256 +#define NVME_DEFAULT_ZONE_SIZE (128 * MiB) +#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) + typedef struct NvmeParams { char *serial; uint32_t num_queues; /* deprecated since 5.1 */ @@ -16,6 +19,8 @@ typedef struct NvmeParams { uint32_t aer_max_queued; uint8_t mdts; bool use_intel_id; + uint8_t fill_pattern; + uint32_t zasl_bs; } NvmeParams; typedef struct NvmeAsyncEvent { @@ -28,6 +33,8 @@ typedef struct NvmeRequest { struct NvmeNamespace *ns; BlockAIOCB *aiocb; uint16_t status; + uint64_t fill_ofs; + uint32_t fill_len; NvmeCqe cqe; NvmeCmd cmd; BlockAcctCookie acct; @@ -147,6 +154,8 @@ typedef struct NvmeCtrl { QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue; int aer_queued; + uint8_t zasl; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; diff --git a/hw/block/trace-events b/hw/block/trace-events index 65b964c894..af53e31fcb 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -90,6 +90,15 @@ pci_nvme_mmio_stopped(void) "cleared controller enable bit" pci_nvme_mmio_shutdown_set(void) "shutdown bit set" pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared" pci_nvme_cmd_supp_and_effects_log_read(void) "commands supported and effects log read" +pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" +pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" +pci_nvme_clear_ns_full(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Full state" # nvme traces for error conditions pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu" @@ -109,8 +118,18 @@ pci_nvme_err_invalid_prp(void) "invalid PRP" pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" +pci_nvme_err_unaligned_zone_cmd(uint8_t action, uint64_t slba, uint64_t zslba) "unaligned zone op 0x%"PRIx32", got slba=%"PRIu64", zslba=%"PRIu64"" +pci_nvme_err_invalid_zone_state_transition(uint8_t state, uint8_t action, uint64_t slba, uint8_t attrs) "0x%"PRIx32"->0x%"PRIx32", slba=%"PRIu64", attrs=0x%"PRIx32"" +pci_nvme_err_write_not_at_wp(uint64_t slba, uint64_t zone, uint64_t wp) "writing at slba=%"PRIu64", zone=%"PRIu64", but wp=%"PRIu64"" +pci_nvme_err_append_not_at_start(uint64_t slba, uint64_t zone) "appending at slba=%"PRIu64", but zone=%"PRIu64"" +pci_nvme_err_zone_write_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" +pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded" +pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" +pci_nvme_err_only_zoned_cmd_set_avail(void) "setting 001b CC.CSS, but only ZONED+NVM command set is enabled" pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" @@ -144,7 +163,9 @@ pci_nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_ pci_nvme_err_startfail_css(uint8_t css) "nvme_start_ctrl failed because invalid command set selected:%u" pci_nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero" pci_nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero" +pci_nvme_err_startfail_zasl_too_small(uint32_t zasl, uint32_t pagesz) "nvme_start_ctrl failed because zone append size limit %"PRIu32" is too small, needs to be >= %"PRIu32"" pci_nvme_err_startfail(void) "setting controller enable bit failed" +pci_nvme_err_invalid_mgmt_action(int action) "action=0x%"PRIx8"" # Traces for undefined behavior pci_nvme_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 27125c9d28..54bc93b6ab 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -489,6 +489,9 @@ enum NvmeIoCommands { NVME_CMD_COMPARE = 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM = 0x09, + NVME_CMD_ZONE_MGMT_SEND = 0x79, + NVME_CMD_ZONE_MGMT_RECV = 0x7a, + NVME_CMD_ZONE_APPEND = 0x7d, }; typedef struct QEMU_PACKED NvmeDeleteQ { @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult { } NvmeAerResult; typedef struct QEMU_PACKED NvmeCqe { - uint32_t result; - uint32_t rsvd; + union { + uint64_t result64; + uint32_t result32; + }; uint16_t sq_head; uint16_t sq_id; uint16_t cid; @@ -678,6 +683,7 @@ enum NvmeStatusCodes { NVME_SGL_DESCR_TYPE_INVALID = 0x0011, NVME_INVALID_USE_OF_CMB = 0x0012, NVME_CMD_SET_CMB_REJECTED = 0x002b, + NVME_INVALID_CMD_SET = 0x002c, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -702,6 +708,14 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO = 0x0182, + NVME_ZONE_BOUNDARY_ERROR = 0x01b8, + NVME_ZONE_FULL = 0x01b9, + NVME_ZONE_READ_ONLY = 0x01ba, + NVME_ZONE_OFFLINE = 0x01bb, + NVME_ZONE_INVALID_WRITE = 0x01bc, + NVME_ZONE_TOO_MANY_ACTIVE = 0x01bd, + NVME_ZONE_TOO_MANY_OPEN = 0x01be, + NVME_ZONE_INVAL_TRANSITION = 0x01bf, NVME_WRITE_FAULT = 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR = 0x0282, @@ -886,6 +900,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl { uint8_t vs[1024]; } NvmeIdCtrl; +typedef struct NvmeIdCtrlZoned { + uint8_t zasl; + uint8_t rsvd1[4095]; +} NvmeIdCtrlZoned; + enum NvmeIdCtrlOacs { NVME_OACS_SECURITY = 1 << 0, NVME_OACS_FORMAT = 1 << 1, @@ -1011,6 +1030,12 @@ typedef struct QEMU_PACKED NvmeLBAF { uint8_t rp; } NvmeLBAF; +typedef struct QEMU_PACKED NvmeLBAFE { + uint64_t zsze; + uint8_t zdes; + uint8_t rsvd9[7]; +} NvmeLBAFE; + #define NVME_NSID_BROADCAST 0xffffffff typedef struct QEMU_PACKED NvmeIdNs { @@ -1065,10 +1090,24 @@ enum NvmeNsIdentifierType { enum NvmeCsi { NVME_CSI_NVM = 0x00, + NVME_CSI_ZONED = 0x02, }; #define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) +typedef struct QEMU_PACKED NvmeIdNsZoned { + uint16_t zoc; + uint16_t ozcs; + uint32_t mar; + uint32_t mor; + uint32_t rrl; + uint32_t frl; + uint8_t rsvd20[2796]; + NvmeLBAFE lbafe[16]; + uint8_t rsvd3072[768]; + uint8_t vs[256]; +} NvmeIdNsZoned; + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat) ((dlfeat) & 0x08) @@ -1100,6 +1139,71 @@ enum NvmeIdNsDps { DPS_FIRST_EIGHT = 8, }; +enum NvmeZoneAttr { + NVME_ZA_FINISHED_BY_CTLR = 1 << 0, + NVME_ZA_FINISH_RECOMMENDED = 1 << 1, + NVME_ZA_RESET_RECOMMENDED = 1 << 2, + NVME_ZA_ZD_EXT_VALID = 1 << 7, +}; + +typedef struct QEMU_PACKED NvmeZoneReportHeader { + uint64_t nr_zones; + uint8_t rsvd[56]; +} NvmeZoneReportHeader; + +enum NvmeZoneReceiveAction { + NVME_ZONE_REPORT = 0, + NVME_ZONE_REPORT_EXTENDED = 1, +}; + +enum NvmeZoneReportType { + NVME_ZONE_REPORT_ALL = 0, + NVME_ZONE_REPORT_EMPTY = 1, + NVME_ZONE_REPORT_IMPLICITLY_OPEN = 2, + NVME_ZONE_REPORT_EXPLICITLY_OPEN = 3, + NVME_ZONE_REPORT_CLOSED = 4, + NVME_ZONE_REPORT_FULL = 5, + NVME_ZONE_REPORT_READ_ONLY = 6, + NVME_ZONE_REPORT_OFFLINE = 7, +}; + +enum NvmeZoneType { + NVME_ZONE_TYPE_RESERVED = 0x00, + NVME_ZONE_TYPE_SEQ_WRITE = 0x02, +}; + +enum NvmeZoneSendAction { + NVME_ZONE_ACTION_RSD = 0x00, + NVME_ZONE_ACTION_CLOSE = 0x01, + NVME_ZONE_ACTION_FINISH = 0x02, + NVME_ZONE_ACTION_OPEN = 0x03, + NVME_ZONE_ACTION_RESET = 0x04, + NVME_ZONE_ACTION_OFFLINE = 0x05, + NVME_ZONE_ACTION_SET_ZD_EXT = 0x10, +}; + +typedef struct QEMU_PACKED NvmeZoneDescr { + uint8_t zt; + uint8_t zs; + uint8_t za; + uint8_t rsvd3[5]; + uint64_t zcap; + uint64_t zslba; + uint64_t wp; + uint8_t rsvd32[32]; +} NvmeZoneDescr; + +enum NvmeZoneState { + NVME_ZONE_STATE_RESERVED = 0x00, + NVME_ZONE_STATE_EMPTY = 0x01, + NVME_ZONE_STATE_IMPLICITLY_OPEN = 0x02, + NVME_ZONE_STATE_EXPLICITLY_OPEN = 0x03, + NVME_ZONE_STATE_CLOSED = 0x04, + NVME_ZONE_STATE_READ_ONLY = 0x0D, + NVME_ZONE_STATE_FULL = 0x0E, + NVME_ZONE_STATE_OFFLINE = 0x0F, +}; + static inline void _nvme_check_size(void) { QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096); @@ -1119,9 +1223,14 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescr) != 64); } #endif From patchwork Mon Oct 19 02:17:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D16B14B7 for ; Mon, 19 Oct 2020 02:23:33 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C2D6722268 for ; Mon, 19 Oct 2020 02:23:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Q7CJJRJ+" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C2D6722268 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:51676 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKpf-0007W3-SJ for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:23:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56208) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk9-0008Cm-PW; Sun, 18 Oct 2020 22:17:49 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44111) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk7-0004HN-46; Sun, 18 Oct 2020 22:17:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073866; x=1634609866; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZInA5bc05tdFpX0RBH8fz4YxYzhOl0To+LlACckGMJw=; b=Q7CJJRJ+bxNPpcGBzkp09iHwtLO+lpMjM7LYGd4nFwUUXTQDpnTGnpq5 lib2HMD1QCin9EEfO0gF18nSVnA+vHyL9ZqZCgxyK30WGtFKiiWVZNL5+ lbxNgqqraN9K1guPBgjFyMLwLKfskozhHPwiWO7MP8RB+JhAMruFHs/2u 3Bt4F+qBnZdY6dLfQuxcUZcfCEbmpGByS7O2ewg52VY0Y4ZiZmIBp6soW zfnOeSD+MC4vw8DUuJTHh7f+NmJoJQkItSedh+Ytdrw2GcVNuSlscBkWz +PycojppzbCWFPFgHj7WNvj6lhw4CZWX9JuzWBHfoaytdiA7eh4tJZZbk A==; IronPort-SDR: JzmqYAvZGsRqwp8odk/M4ttAOiBkuitZ4vdpyt4hDdvgZ+FgKer/2eh4USXbVEwMAdpX8uAqCK ZNIPhK9qs/LfUJynlLlMJih/KC7vk7Fjy8t+YiK9paVIuixVjwANFgmJozOi8VX4CnnmvqoSID DVw54o+uZZJm3W3Ly2Ku7WWbffWNVi8PJW1fNkA0071QYx9XwGY3VSJppu7nvRmWs/nuQeW1ac xIo05aMlUlcocKfZNb6VBrd+EU9hCMKuacF/3fPQmE30mq3Y1S/Pk88F9ZTACQY0FZqnAWmtCx /Uc= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207968" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:45 +0800 IronPort-SDR: fJy5E9WJh8ZcpCuimt498pL8/u7ax86wHDPvPhWfluXyOokme85isdMvuHN2RFs0uZ/iPaNnqQ HZeWb/1mT1TFcUH3klOSo1ZyR/4hUP2D4+f38COB0pPQilpg/N/CdYaGbCGROY7Rqgs6z/ZRhE hQtReLclQT4bCD+Ap634J69Z3rzZoGP8YTLlU5F0x28JdrQuMwe2nWhiFYlWkrqi5xIOZfqKWH HRtamk7IQyjtc3La+g6I0ICfnsXQxwH9+gAXr7Gb4NdSu3FN4HhwVlxEzRsH2hk76/LvEYnYEw KpJPZa1SWHPn/MZ7dOCaN5gD Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:22 -0700 IronPort-SDR: 5uS793SbGhehkBZVYXnBpf5l/oQrJhdI3MSJkvI4+ZedfZuowlAYU7P4oCdgA+f+PtCLU/YMXt 3h+3Bh0pPHnTIR7kW3HbdFVgbYBq5x5ffZeWwlseOE349ci+l7cVDvQTrKRCPGDHpyUE5pIpN+ kSC0vz90edamH/GN8MwMmnbkcWWTRe+Qzrd5hnT2rlvJJNs0LutwT93RhsEuuY68/Reus6G+fR Ln1oCiYlGTmsC8WQgzGmj7+w93xOoZQCGuWzy0GCq4M9XDvyVya9h8LcKnps0lelkqfm0duGlr B0Q= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:44 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 06/11] hw/block/nvme: Introduce max active and open zone limits Date: Mon, 19 Oct 2020 11:17:21 +0900 Message-Id: <20201019021726.12048-7-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Add two module properties, "max_active" and "max_open" to control the maximum number of zones that can be active or open. Once these variables are set to non-default values, these limits are checked during I/O and Too Many Active or Too Many Open command status is returned if they are exceeded. Signed-off-by: Hans Holmberg Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 28 ++++++++++++- hw/block/nvme-ns.h | 41 +++++++++++++++++++ hw/block/nvme.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 166 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index fedfad595c..8d9e11eef2 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -118,6 +118,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) ns->zone_size_log2 = 63 - clz64(ns->zone_size); } + /* Make sure that the values of all ZNS properties are sane */ + if (ns->params.max_open_zones > nz) { + error_setg(errp, + "max_open_zones value %u exceeds the number of zones %u", + ns->params.max_open_zones, nz); + return -1; + } + if (ns->params.max_active_zones > nz) { + error_setg(errp, + "max_active_zones value %u exceeds the number of zones %u", + ns->params.max_active_zones, nz); + return -1; + } + return 0; } @@ -172,8 +186,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned)); /* MAR/MOR are zeroes-based, 0xffffffff means no limit */ - id_ns_z->mar = 0xffffffff; - id_ns_z->mor = 0xffffffff; + id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1); + id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1); id_ns_z->zoc = 0; id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; @@ -199,6 +213,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) uint32_t set_state; int i; + ns->nr_active_zones = 0; + ns->nr_open_zones = 0; + zone = ns->zone_array; for (i = 0; i < ns->num_zones; i++, zone++) { switch (nvme_get_zone_state(zone)) { @@ -209,6 +226,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); break; case NVME_ZONE_STATE_CLOSED: + nvme_aor_inc_active(ns); /* fall through */ default: continue; @@ -216,6 +234,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) if (zone->d.wp == zone->d.zslba) { set_state = NVME_ZONE_STATE_EMPTY; + } else if (ns->params.max_active_zones == 0 || + ns->nr_active_zones < ns->params.max_active_zones) { + set_state = NVME_ZONE_STATE_CLOSED; } else { set_state = NVME_ZONE_STATE_CLOSED; } @@ -224,6 +245,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) case NVME_ZONE_STATE_CLOSED: trace_pci_nvme_clear_ns_close(nvme_get_zone_state(zone), zone->d.zslba); + nvme_aor_inc_active(ns); QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); break; case NVME_ZONE_STATE_EMPTY: @@ -326,6 +348,8 @@ static Property nvme_ns_props[] = { DEFINE_PROP_SIZE("zone_capacity", NvmeNamespace, params.zone_cap_bs, 0), DEFINE_PROP_BOOL("cross_zone_read", NvmeNamespace, params.cross_zone_read, false), + DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 0), + DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 170cbb8cdc..b0633d0def 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -34,6 +34,8 @@ typedef struct NvmeNamespaceParams { bool cross_zone_read; uint64_t zone_size_bs; uint64_t zone_cap_bs; + uint32_t max_active_zones; + uint32_t max_open_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -56,6 +58,8 @@ typedef struct NvmeNamespace { uint64_t zone_capacity; uint64_t zone_array_size; uint32_t zone_size_log2; + int32_t nr_open_zones; + int32_t nr_active_zones; NvmeNamespaceParams params; } NvmeNamespace; @@ -123,6 +127,43 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } +static inline void nvme_aor_inc_open(NvmeNamespace *ns) +{ + assert(ns->nr_open_zones >= 0); + if (ns->params.max_open_zones) { + ns->nr_open_zones++; + assert(ns->nr_open_zones <= ns->params.max_open_zones); + } +} + +static inline void nvme_aor_dec_open(NvmeNamespace *ns) +{ + if (ns->params.max_open_zones) { + assert(ns->nr_open_zones > 0); + ns->nr_open_zones--; + } + assert(ns->nr_open_zones >= 0); +} + +static inline void nvme_aor_inc_active(NvmeNamespace *ns) +{ + assert(ns->nr_active_zones >= 0); + if (ns->params.max_active_zones) { + ns->nr_active_zones++; + assert(ns->nr_active_zones <= ns->params.max_active_zones); + } +} + +static inline void nvme_aor_dec_active(NvmeNamespace *ns) +{ + if (ns->params.max_active_zones) { + assert(ns->nr_active_zones > 0); + ns->nr_active_zones--; + assert(ns->nr_active_zones >= ns->nr_open_zones); + } + assert(ns->nr_active_zones >= 0); +} + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_flush(NvmeNamespace *ns); diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 34d0d0250d..b3cdfccdfb 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -199,6 +199,26 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, } } +/* + * Check if we can open a zone without exceeding open/active limits. + * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5). + */ +static int nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn) +{ + if (ns->params.max_active_zones != 0 && + ns->nr_active_zones + act > ns->params.max_active_zones) { + trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones); + return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR; + } + if (ns->params.max_open_zones != 0 && + ns->nr_open_zones + opn > ns->params.max_open_zones) { + trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones); + return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR; + } + + return NVME_SUCCESS; +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr low = n->ctrl_mem.addr; @@ -1207,6 +1227,41 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, return status; } +static void nvme_auto_transition_zone(NvmeNamespace *ns, bool implicit, + bool adding_active) +{ + NvmeZone *zone; + + if (implicit && ns->params.max_open_zones && + ns->nr_open_zones == ns->params.max_open_zones) { + zone = QTAILQ_FIRST(&ns->imp_open_zones); + if (zone) { + /* + * Automatically close this implicitly open zone. + */ + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + nvme_aor_dec_open(ns); + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + } + } +} + +static uint16_t nvme_auto_open_zone(NvmeNamespace *ns, NvmeZone *zone) +{ + uint16_t status = NVME_SUCCESS; + uint8_t zs = nvme_get_zone_state(zone); + + if (zs == NVME_ZONE_STATE_EMPTY) { + nvme_auto_transition_zone(ns, true, true); + status = nvme_aor_check(ns, 1, 1); + } else if (zs == NVME_ZONE_STATE_CLOSED) { + nvme_auto_transition_zone(ns, true, false); + status = nvme_aor_check(ns, 0, 1); + } + + return status; +} + static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, bool failed) { @@ -1243,7 +1298,11 @@ static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, switch (nvme_get_zone_state(zone)) { case NVME_ZONE_STATE_IMPLICITLY_OPEN: case NVME_ZONE_STATE_EXPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_EMPTY: nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); /* fall through */ @@ -1272,7 +1331,10 @@ static uint64_t nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, zs = nvme_get_zone_state(zone); switch (zs) { case NVME_ZONE_STATE_EMPTY: + nvme_aor_inc_active(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_inc_open(ns); nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); } } @@ -1378,6 +1440,11 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) goto invalid; } + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } @@ -1436,6 +1503,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) slba = zone->w_ptr; } + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } else { status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); @@ -1537,9 +1609,27 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, uint8_t state) { + uint16_t status; + switch (state) { case NVME_ZONE_STATE_EMPTY: + nvme_auto_transition_zone(ns, false, true); + status = nvme_aor_check(ns, 1, 0); + if (status != NVME_SUCCESS) { + return status; + } + nvme_aor_inc_active(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + status = nvme_aor_check(ns, 0, 1); + if (status != NVME_SUCCESS) { + if (state == NVME_ZONE_STATE_EMPTY) { + nvme_aor_dec_active(ns); + } + return status; + } + nvme_aor_inc_open(ns); + /* fall through */ case NVME_ZONE_STATE_IMPLICITLY_OPEN: nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); /* fall through */ @@ -1561,6 +1651,7 @@ static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); /* fall through */ case NVME_ZONE_STATE_CLOSED: @@ -1582,7 +1673,11 @@ static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_EMPTY: zone->w_ptr = nvme_zone_wr_boundary(zone); zone->d.wp = zone->w_ptr; @@ -1608,7 +1703,11 @@ static uint16_t nvme_reset_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_FULL: zone->w_ptr = zone->d.zslba; zone->d.wp = zone->w_ptr; From patchwork Mon Oct 19 02:17:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83B4814B7 for ; Mon, 19 Oct 2020 02:22:16 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0D0D621D7B for ; Mon, 19 Oct 2020 02:22:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="rmOL0o+i" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0D0D621D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:47168 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKoR-0005iN-2V for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:22:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56224) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkC-0008JN-7R; Sun, 18 Oct 2020 22:17:52 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44109) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKk9-0004HF-Tv; Sun, 18 Oct 2020 22:17:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073869; x=1634609869; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=T4CrKO+e3ZlxyxNPGbeWS9PvNKXYmH5Bee10DPq9190=; b=rmOL0o+i6jFbCZVV3ry3CQKAaM47U7maG8tOQA3qgVstsSHot7Vbw4Ad 2HuTuM3I8nBBvl6vPBWZGsLWCP81Us0tB9IKpqwmGZeHMhh7DTHKboLr1 RVqrlkvPPVoP/rZORzp5pCIH9r+vyEL5BZOmDwMtNxkDTprGOqkz5rN0E y3LHvn3/C0QsfARXUVXp5dYtPFjNMVvg4gdH95ZYt+YRL4upbQbRezuIc TL74FN0bplFwq6VQVNDPoq+e/rJiRaZT0iJStd0Brpiw78v13DV3u2Fzh W6sM6VM6mH/2cTSr3i8WuimWX9k89M3Xc9f1rHyCyZat6JrO1UnbX7+sk Q==; IronPort-SDR: bi1zG66lKYjVQ2GZdoPPfJHbdVLOEhHTGN0dabT4U8DWBugYJCvjdEYXd+R/mVS9urfS0T+Els cPdXpzcvYMjPrU570ObnFM054cAk04ZXTo+cph9Q1K2yJiQCrhN6DbAnmqWCSFssTEKvc48gae 8XCqXEKJViJDWMaYZQ+vQyhw/p5lQlHj3khbCWm7lIzWYVLOl5C/Rx0PKnJ2lQCwK08a+dVjbD hMcKQxo9Q3eYvFgCsqPRPzGUQ8qobmAbzz4r1OncP6GMgysVsYxo1hawsL2/GWVRLYih3SHaSP oSE= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207973" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:48 +0800 IronPort-SDR: e/5MNID6nxw8OPbCHj7/FYYg0PWE3Zrh8qYZt1rPL6n01PXzgWEUR6AOoddTVDexIRK59vF5uB 9pO3SC5909up/kIpdzMlAFE0LK9t5mE9CtKFx/94Py7QKcA4aVVupBIqH0cNhPcY3prrULKC59 vbF0IzGyJIrukIFxBfx+QGGpNalFssOVM/FvgG20Y/gVUpnh5nn0vjFrgvuILhhjaS1yK4J1ML xGnp2Bwqis/WkGny7xm0h9BSl6Dw2sFSG1oPWmcNnfGoh0gTHRAswzjkw3ChNn0EpUvypdje/4 GGjWYED9WKQhc0L48Na9NGhz Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:24 -0700 IronPort-SDR: uHbjkCIfguYYXDUNvzOYGho3IZmFNul6Chgdcs5Iuk7ObNve3Io5Z0cKE2UhBZBkeb7KAy6mI/ CLHf6HQGBO0tnQtE5cBu6EnxY+gRh/PAEHPz3l78V7xMGSI7VFpdAbMZj03Pm313y1zHYHYpCz Qhi1ymk/9IYYvjfS9oiZKFBRXofI2RlPHhi00wkFRYEs7Tz3LyJYyx0t9fBFq4GHDK15ob7rh/ GJtBn35Ypz4yd9BUaWhMa6Ic81lJ5JhnnJnXOXTm7Zu8MvrVW1zCxbDwEhGQmMXW6+CSEydIOJ 5eE= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:46 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 07/11] hw/block/nvme: Support Zone Descriptor Extensions Date: Mon, 19 Oct 2020 11:17:22 +0900 Message-Id: <20201019021726.12048-8-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Zone Descriptor Extension is a label that can be assigned to a zone. It can be set to an Empty zone and it stays assigned until the zone is reset. This commit adds a new optional module property, "zone_descr_ext_size". Its value must be a multiple of 64 bytes. If this value is non-zero, it becomes possible to assign extensions of that size to any Empty zones. The default value for this property is 0, therefore setting extensions is disabled by default. Signed-off-by: Hans Holmberg Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jensen --- hw/block/nvme-ns.c | 14 ++++++++++-- hw/block/nvme-ns.h | 8 +++++++ hw/block/nvme.c | 51 +++++++++++++++++++++++++++++++++++++++++-- hw/block/trace-events | 2 ++ 4 files changed, 71 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 8d9e11eef2..255ded2b43 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -143,6 +143,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns) int i; ns->zone_array = g_malloc0(ns->zone_array_size); + if (ns->params.zd_extension_size) { + ns->zd_extensions = g_malloc0(ns->params.zd_extension_size * + ns->num_zones); + } QTAILQ_INIT(&ns->exp_open_zones); QTAILQ_INIT(&ns->imp_open_zones); @@ -192,7 +196,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); - id_ns_z->lbafe[lba_index].zdes = 0; + id_ns_z->lbafe[lba_index].zdes = + ns->params.zd_extension_size >> 6; /* Units of 64B */ ns->csi = NVME_CSI_ZONED; ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); @@ -232,7 +237,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) continue; } - if (zone->d.wp == zone->d.zslba) { + if (zone->d.za & NVME_ZA_ZD_EXT_VALID) { + set_state = NVME_ZONE_STATE_CLOSED; + } else if (zone->d.wp == zone->d.zslba) { set_state = NVME_ZONE_STATE_EMPTY; } else if (ns->params.max_active_zones == 0 || ns->nr_active_zones < ns->params.max_active_zones) { @@ -320,6 +327,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns) if (ns->params.zoned) { g_free(ns->id_ns_zoned); g_free(ns->zone_array); + g_free(ns->zd_extensions); } } @@ -350,6 +358,8 @@ static Property nvme_ns_props[] = { params.cross_zone_read, false), DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 0), DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), + DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace, + params.zd_extension_size, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index b0633d0def..2d70a13701 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -36,6 +36,7 @@ typedef struct NvmeNamespaceParams { uint64_t zone_cap_bs; uint32_t max_active_zones; uint32_t max_open_zones; + uint32_t zd_extension_size; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -58,6 +59,7 @@ typedef struct NvmeNamespace { uint64_t zone_capacity; uint64_t zone_array_size; uint32_t zone_size_log2; + uint8_t *zd_extensions; int32_t nr_open_zones; int32_t nr_active_zones; @@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } +static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, + uint32_t zone_idx) +{ + return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size]; +} + static inline void nvme_aor_inc_open(NvmeNamespace *ns) { assert(ns->nr_open_zones >= 0); diff --git a/hw/block/nvme.c b/hw/block/nvme.c index b3cdfccdfb..fbf27a5098 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1747,6 +1747,26 @@ static bool nvme_cond_offline_all(uint8_t state) return state == NVME_ZONE_STATE_READ_ONLY; } +static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + uint16_t status; + + if (state == NVME_ZONE_STATE_EMPTY) { + nvme_auto_transition_zone(ns, false, true); + status = nvme_aor_check(ns, 1, 0); + if (status != NVME_SUCCESS) { + return status; + } + nvme_aor_inc_active(ns); + zone->d.za |= NVME_ZA_ZD_EXT_VALID; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, uint8_t); typedef bool (*need_to_proc_zone_t)(uint8_t); @@ -1787,6 +1807,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) uint8_t action, state; bool all; NvmeZone *zone; + uint8_t *zd_ext; action = dw13 & 0xff; all = dw13 & 0x100; @@ -1841,7 +1862,22 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) case NVME_ZONE_ACTION_SET_ZD_EXT: trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - return NVME_INVALID_FIELD | NVME_DNR; + if (all || !ns->params.zd_extension_size) { + return NVME_INVALID_FIELD | NVME_DNR; + } + zd_ext = nvme_get_zd_extension(ns, zone_idx); + status = nvme_dma(n, zd_ext, ns->params.zd_extension_size, + DMA_DIRECTION_TO_DEVICE, req); + if (status) { + trace_pci_nvme_err_zd_extension_map_error(zone_idx); + return status; + } + + status = nvme_set_zd_ext(ns, zone, state); + if (status == NVME_SUCCESS) { + trace_pci_nvme_zd_extension_set(zone_idx); + return status; + } break; default: @@ -1919,7 +1955,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - if (zra == NVME_ZONE_REPORT_EXTENDED) { + if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1931,6 +1967,9 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) partial = (dw13 >> 16) & 0x01; zone_entry_sz = sizeof(NvmeZoneDescr); + if (zra == NVME_ZONE_REPORT_EXTENDED) { + zone_entry_sz += ns->params.zd_extension_size; + } max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; buf = g_malloc0(len); @@ -1962,6 +2001,14 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) z->wp = cpu_to_le64(~0ULL); } + if (zra == NVME_ZONE_REPORT_EXTENDED) { + if (zs->d.za & NVME_ZA_ZD_EXT_VALID) { + memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), + ns->params.zd_extension_size); + } + buf_p += ns->params.zd_extension_size; + } + zone_idx++; } diff --git a/hw/block/trace-events b/hw/block/trace-events index af53e31fcb..962084e40c 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -96,6 +96,7 @@ pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, sl pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_zd_extension_set(uint32_t zone_idx) "set descriptor extension for zone_idx=%"PRIu32"" pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" pci_nvme_clear_ns_full(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Full state" @@ -127,6 +128,7 @@ pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slb pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded" pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded" +pci_nvme_err_zd_extension_map_error(uint32_t zone_idx) "can't map descriptor extension for zone_idx=%"PRIu32"" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" pci_nvme_err_only_zoned_cmd_set_avail(void) "setting 001b CC.CSS, but only ZONED+NVM command set is enabled" From patchwork Mon Oct 19 02:17:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843507 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F7AB14B7 for ; Mon, 19 Oct 2020 02:23:57 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D340D22268 for ; Mon, 19 Oct 2020 02:23:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="msJms2Iz" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D340D22268 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:52394 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKq3-0007oE-V1 for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:23:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56240) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkE-0008On-7h; Sun, 18 Oct 2020 22:17:54 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44142) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkC-0004J9-3M; Sun, 18 Oct 2020 22:17:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073871; x=1634609871; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HYiq0JK4dRwQV6ig2IjA1t2BEXwoUc+F9PcNVyweUws=; b=msJms2Izm925U7bqSu+3Lazb8y7xDCW2VCnlTurd6D7p1dZNUAWafPd5 U227NkQWoIq+jSVuQQCeDlQQMolwS9aB/YrHhVNygJp2lh5LMdVIStCqE N9ezna07gwLchrsLFLrlt8getAJX3nTbO9kGKFpTjiFgv5E/0yD+sQo7w XIuttrPm/g/jwNsD2a+5UUjtuHwf1DAKdTiHHF8rHGv9bm0Q81jJ+f/Tv YxwCGl+JmFfuMH/g4qoAU6KkZLgfkjLdoQDV8Vd4C471Xhljn5sQ7rvoK 9LjLsDuiE64nPubMxoB/WD/WwkKmU4GnaDlD5+B3+I7tusxgaTmfnvOTA w==; IronPort-SDR: FsmNeb69ZFxZYKL3Z2Z9yzuLuTsDrv+JvN+zJs1gpZijrRPlspDrZcar8bmkorGZujH9/OkbWE 8QjquNVmiFwvscs5RD1m8uHclUHQ97Yu2yAwLmlEz+dQy1blUlfynL2WNlurF45UnJhwJL+hQ+ ZR267oCGd/aR+wL6Fri/RGe0asVu5pm57qAnl+R/cSgXvMPjn7e6Vsoq/+omlR3yQRQ231cq6c 7nSxDZfmpPIKa7a7OBGnQmUep3ShUt2hFFpD4C4zAu+f3rRBHz2kaagtsI6GVPOLmC+YfE4Do7 1ao= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207976" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:50 +0800 IronPort-SDR: wav4xIBMmHyvArpl5pgdNYHT/kL3nCPO4ECZgNev8AEbFyRAnmyy3xABCrABqAZkp0xaXTgATj 55gjhQ+y1PksQABjnSaj/xTkpigVEmOLcntKUiMALHDeDRjI9vRoyzWxNESp73jqrC7i4bLdFq zAs6x84mdvfos1c2cv42DZJUIyniQ9et7ZLahVEAmCAcElvmERlrTNa4Y6Cq3M6VQjr03A6w1k 1JKI3/0RYHNDSgPQPqUeiy4J7PsLp1N0GNixGpq7jWFRQ0oXyiw1l/E1peDHxfs9WgNpDbdeeA otq9j0dnbVGHvOluUg01E+Ib Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:26 -0700 IronPort-SDR: atd1TVTUxlUIzz5o8ef3FqSty+xO3GodPySPuxgeHyFfFgVR4Ro5o1wZPaf5smfhydpT1/UnB0 Ha34WpkhpNOJJIRS26vwlE0dvW12/Xb6CiaAvlDRANjZWWNlkfCpj8kBmk/Ub9AFiQub2OSkSz 0lgBdqUVhcmB/ikinJruQzcwln0yd86JHbUkTJb76WWBnOvuStDonkMraItTRPw27YxkLYG90C mleirg659U+cXe3D9JOVqXfX043l6DOGNpKo0OoyuUefey4vZTq2hI0lmwz5YXQrFeKSfnFmLj PS0= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:48 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 08/11] hw/block/nvme: Add injection of Offline/Read-Only zones Date: Mon, 19 Oct 2020 11:17:23 +0900 Message-Id: <20201019021726.12048-9-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" ZNS specification defines two zone conditions for the zones that no longer can function properly, possibly because of flash wear or other internal fault. It is useful to be able to "inject" a small number of such zones for testing purposes. This commit defines two optional device properties, "offline_zones" and "rdonly_zones". Users can assign non-zero values to these variables to specify the number of zones to be initialized as Offline or Read-Only. The actual number of injected zones may be smaller than the requested amount - Read-Only and Offline counts are expected to be much smaller than the total number of zones on a drive. Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme-ns.h | 2 ++ 2 files changed, 66 insertions(+) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 255ded2b43..d050f97909 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -21,6 +21,7 @@ #include "sysemu/sysemu.h" #include "sysemu/block-backend.h" #include "qapi/error.h" +#include "crypto/random.h" #include "hw/qdev-properties.h" #include "hw/qdev-core.h" @@ -132,6 +133,32 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) return -1; } + if (ns->params.zd_extension_size) { + if (ns->params.zd_extension_size & 0x3f) { + error_setg(errp, + "zone descriptor extension size must be a multiple of 64B"); + return -1; + } + if ((ns->params.zd_extension_size >> 6) > 0xff) { + error_setg(errp, "zone descriptor extension size is too large"); + return -1; + } + } + + if (ns->params.max_open_zones < nz) { + if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) { + error_setg(errp, "offline_zones value %u is too large", + ns->params.nr_offline_zones); + return -1; + } + if (ns->params.nr_rdonly_zones > + nz - ns->params.max_open_zones - ns->params.nr_offline_zones) { + error_setg(errp, "rdonly_zones value %u is too large", + ns->params.nr_rdonly_zones); + return -1; + } + } + return 0; } @@ -140,7 +167,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns) uint64_t start = 0, zone_size = ns->zone_size; uint64_t capacity = ns->num_zones * zone_size; NvmeZone *zone; + uint32_t rnd; int i; + uint16_t zs; ns->zone_array = g_malloc0(ns->zone_array_size); if (ns->params.zd_extension_size) { @@ -167,6 +196,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns) zone->w_ptr = start; start += zone_size; } + + /* If required, make some zones Offline or Read Only */ + + for (i = 0; i < ns->params.nr_offline_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE); + } else { + i--; + } + } + + for (i = 0; i < ns->params.nr_rdonly_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE && + zs != NVME_ZONE_STATE_READ_ONLY) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY); + } else { + i--; + } + } } static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, @@ -360,6 +420,10 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace, params.zd_extension_size, 0), + DEFINE_PROP_UINT32("offline_zones", NvmeNamespace, + params.nr_offline_zones, 0), + DEFINE_PROP_UINT32("rdonly_zones", NvmeNamespace, + params.nr_rdonly_zones, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 2d70a13701..d65d8b0930 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -37,6 +37,8 @@ typedef struct NvmeNamespaceParams { uint32_t max_active_zones; uint32_t max_open_zones; uint32_t zd_extension_size; + uint32_t nr_offline_zones; + uint32_t nr_rdonly_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { From patchwork Mon Oct 19 02:17:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E9E514B7 for ; Mon, 19 Oct 2020 02:22:14 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 31CA621D7B for ; Mon, 19 Oct 2020 02:22:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="g2I4UgIq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31CA621D7B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:47010 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKoP-0005eS-8w for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:22:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56276) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkG-0008Uj-Ha; Sun, 18 Oct 2020 22:17:56 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44147) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkE-0004JR-EV; Sun, 18 Oct 2020 22:17:56 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073874; x=1634609874; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Fhgn/f0fIuWoNzgTIkxKels/TGJIM6EgvI+Hvc5k32w=; b=g2I4UgIqimfhEL8hI2TwLryGH5qMIpTAWTScDjSrkXQgE6le85JFhrOt wfZBgsAP/Nsj8EBYP8pmqSpM1oMxROqRQIZ7pZCA/tRogXxaZXFqhWElX eB8hSKNynlO3WZdx4gibY5XFJFDCUpFZxjo0Gylb6t3llM5hZBhJtT131 G+yZ/YSyM4h0qi5E9v02DkzZX8n/UzD9dgTB6LvVHgUil2y6svHIX1gMG P34YLfy8ewuF2/dQTh9uP0KVobMl0JMvDnNju2VqdJyHtqq8UckriUtch SnfYotYMw6256jHQ8b15pjeVozvuuAdSZGhi/HxRQMTTGvNBjaQa+0WIQ A==; IronPort-SDR: AsiOGIzF/MDhQT9TNZR61Prnavanx31LqVyCDAUuZbukJBl9nC5VFS/TQ7KuPsbulalLR79/xX lc6odLD5YqrxeE1Jd0izoKrWbbQSIkx91bBz4UVDjTt6VtmmFGzgtr3/2DBiJB3Tty0t1IrwbF yEpnNask9NXwNnb+LkxC54ELg1pPaL740DCBVf7cAU2usNqbi1NacCqkg2ojy0cCGG5BOkiXU6 5dtCLKFhju+S2AeA9PQOYLGZqdZLNz7881ld2fkyf9CBUgkpE8fTnlkJg7IFI6JB4wELqhBlVK 3aI= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207980" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:52 +0800 IronPort-SDR: Q/s5K2Js/xqsphVUyGylsoMTBb1z8BPF7KUsqI8Qlv2oPhgS5yg3Y7j6ysYUHG2o/f2qkAiLic nyB545/k1kLi3ZySHARCgjQVxN2OZ8PFENhvmEHTCGvhaFYDtIwsgSot+6MydHQWUCW2xmPgoS lhytAk/zXVjaYP40IcNDl4/XSSVRyZxfPADeSqgAb2A/n71CMPawBgtlyXeCjTgh+FTSosdCFe cBv924bIDHGncUXAfPIzfcy5Ipz/qP4fH+zZ6agQun5MiL9btAXkDMySfeBAjMtN3hkTxzNYHD clWGEJpKChbla9SbTe91z8AP Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:29 -0700 IronPort-SDR: BiiDpf5Up/++2AB4PhO6jA3lOft1bv8Zoh5uPDOYuTaEcGLEA4o9qqJ8IYC6mGMivLwXFHGVHH ybHU0G1d9/8htnDgtxWvvqZMOlSFr8pEwcNqhwiXwToNHWmBmfRe3KGt6fhz95qUGVLSA2Ftis LUtM/vBhVVOARmiiJzC9mdSAfSvt9qMRAxeMEf8Bkit23dtVaZ1C8P3zEzel0JQdL9uRVHr36w XAee4M1IXVuSwh/8iT+X2syfPNRc+mqyWTV74OwVyRDFnKYaSJiBqIeeLIJz9wcGd4Er/GyUhC /SY= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:50 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 09/11] hw/block/nvme: Document zoned parameters in usage text Date: Mon, 19 Oct 2020 11:17:24 +0900 Message-Id: <20201019021726.12048-10-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Added brief descriptions of the new device properties that are now available to users to configure features of Zoned Namespace Command Set in the emulator. This patch is for documentation only, no functionality change. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 41 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index fbf27a5098..3b9ea326d7 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -9,7 +9,7 @@ */ /** - * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e + * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e * * https://nvmexpress.org/developers/nvme-specification/ */ @@ -23,7 +23,8 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts= - * -device nvme-ns,drive=,bus=bus_name,nsid= + * -device nvme-ns,drive=,bus=,nsid=, \ + * zoned= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. @@ -49,6 +50,42 @@ * completion when there are no oustanding AERs. When the maximum number of * enqueued events are reached, subsequent events will be dropped. * + * Setting `zoned` to true selects Zoned Command Set at the namespace. + * In this case, the following options are available to configure zoned + * operation: + * zone_size= + * The number may be followed by K, M, G as in kilo-, mega- or giga. + * + * zone_capacity= + * The value 0 (default) forces zone capacity to be the same as zone + * size. The value of this property may not exceed zone size. + * + * zone_descr_ext_size= + * This value needs to be specified in 64B units. If it is zero, + * namespace(s) will not support zone descriptor extensions. + * + * max_active= + * + * max_open= + * + * zone_append_size_limit= + * The maximum I/O size that can be supported by Zone Append + * command. Since internally this this value is maintained as + * ZASL = log2( / ), some + * values assigned to this property may be rounded down and + * result in a lower maximum ZA data size being in effect. + * By setting this property to 0, user can make ZASL to be + * equial to MDTS. + * + * offline_zones= + * + * rdonly_zones= + * + * cross_zone_read= + * + * fill_pattern= + * The byte pattern to return for any portions of unwritten data + * during read. */ #include "qemu/osdep.h" From patchwork Mon Oct 19 02:17:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843513 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A17441580 for ; Mon, 19 Oct 2020 02:27:24 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 51BE222268 for ; Mon, 19 Oct 2020 02:27:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="quN6jdAT" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51BE222268 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:33018 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKtP-00035O-7v for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:27:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56316) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkL-000087-LV; Sun, 18 Oct 2020 22:18:01 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44142) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkF-0004J9-Tg; Sun, 18 Oct 2020 22:18:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073875; x=1634609875; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1iCaLcRtJJI6ux6xIcpSpsgXa0rjaAlQdhmLZREtcPs=; b=quN6jdATFX5XJ9dTETrD3hXJg4fbIPH4eq1mYiO1AFTo4nf7lZg6ix9Z XjNp3GzkPc6MtqWhxOq3ybiT8/l0MZrqf6xwseKhu58tyLGBrOtVxkoX7 2wGqcWvVcaqU0LDsIWsGcZs9Xe+hkhL1dU01+5RNQhr48kGWMnnslzIVx gQhOjTNJfCgyw68lfUHafy5Sof5NtjA+c8jiZLNMlzSJ0pJL7jVM+LpB5 ZeYzs0BG0nMVigF3C6x/gYTIKeetjD0nfgtOwW18N+7a6d6SX3RjoZs2n rGnj2pp7T1P86wid400o8aDFltcA0lBW+tCma1W6utHUxYcbiXUyyeU0X w==; IronPort-SDR: ov/+7RVXszbCz0ni4vRNjtFLp6ATgThWbbMHuTa7tkfF17hCVdvOpQvQdIDXaeMppPCYIVCOrD HEodd3QzJmdB6uvoem0z5Dn/jkJLxI6p6tlQ8als4NziJRgD0om93b8Mf3+bN/UswhGJqjQ24N lyfslSkY0+AWN0YMdMtDdcx7RlAGbu3f2rb2bOgCpbPUUX4bDTxf0SO5rA+0OYd4oBNbZBGAGw uOT+WNocCVZtxvvQEF06HET716KL/6lrrWyiJ3nzMahF3VryPMz3KMBKQ284hl+n0QbjGXavx6 6Es= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207982" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:54 +0800 IronPort-SDR: nhFKUpOnvI/jImUYuc/nAhZtrtHuuW5sUktlf6wT9DCagxth1UzmlzuEq2o58eNywVnkEzXYIr DiGNwsoOsiy71OzlR/PNcEIPB5EAAJ18PIv7V5n4Ep7eipu5Tt5LbiNAwkRECR9KKP5MDQxeJ2 6US8oRyXuZF1jH+T8menYCK1qzAnYC+Q3s0nQvnf6/wlpm2rkHmzOUi6y1VcQdEtZw6VvcW8gT k0uadzExPniKTHMip5kofoKPUXlg3j0PxhrM16oviI9GrpVSd9hrgKaUBmwj+U7FMSG41tr6jC 0HBBQZIsyt66CleKaxugiUiS Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:31 -0700 IronPort-SDR: oGppLOyS0TD5KXnNeT1gRs8w5G9X8rhHeIwI7STcV1Y1aPoq8qc6DKLgSNMJI5mL91+48PgPXe 7SIes05Rabb4cHuuGRxFHRXtWc/WXM0gEQH84rAZqN6LsmdRdHQXIX3tH1lyzEAUegM+Y3ejgD xClZ3HT5QpW/z2QrJbWd64Qk3aD0umoffzzZwFBAkcS/QslgwxycAlvobuZVXsL11Nw4ZAKipt kEVGKBfVOyjOxhEWOdmUxuIi1+COKd1PpDodnQnvSLqLS/wlYl96MNrbuJ1uL8LMu2aCIIYVYR 0g8= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:52 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 10/11] hw/block/nvme: Separate read and write handlers Date: Mon, 19 Oct 2020 11:17:25 +0900 Message-Id: <20201019021726.12048-11-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" With ZNS support in place, the majority of code in nvme_rw() has become read- or write-specific. Move these parts to two separate handlers, nvme_read() and nvme_write() to make the code more readable and to remove multiple is_write checks that so far existed in the i/o path. This is a refactoring patch, no change in functionality. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 191 +++++++++++++++++++++++++----------------- hw/block/trace-events | 3 +- 2 files changed, 114 insertions(+), 80 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 3b9ea326d7..5ec4ce5e28 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1162,10 +1162,10 @@ typedef struct NvmeReadFillCtx { uint32_t post_rd_fill_nlb; } NvmeReadFillCtx; -static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, - uint64_t slba, uint32_t nlb, - NvmeReadFillCtx *rfc) +static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba, + uint32_t nlb, NvmeReadFillCtx *rfc) { + NvmeZone *zone = nvme_get_zone_by_slba(ns, slba); NvmeZone *next_zone; uint64_t bndry = nvme_zone_rd_boundary(ns, zone); uint64_t end = slba + nlb, wp1, wp2; @@ -1449,6 +1449,86 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + uint64_t slba = le64_to_cpu(rw->slba); + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + uint32_t fill_len; + uint64_t data_size = nvme_l2b(ns, nlb); + uint64_t data_offset, fill_ofs; + NvmeReadFillCtx rfc; + BlockBackend *blk = ns->blkconf.blk; + uint16_t status; + + trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba); + + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + goto invalid; + } + + status = nvme_check_bounds(n, ns, slba, nlb); + if (status) { + trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); + goto invalid; + } + + if (ns->params.zoned) { + status = nvme_check_zone_read(ns, slba, nlb, &rfc); + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); + goto invalid; + } + } + + status = nvme_map_dptr(n, data_size, req); + if (status) { + goto invalid; + } + + if (ns->params.zoned) { + if (rfc.pre_rd_fill_nlb) { + fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); + fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); + nvme_fill_read_data(req, fill_ofs, fill_len, + n->params.fill_pattern); + } + if (!rfc.read_nlb) { + /* No backend I/O necessary, only needed to fill the buffer */ + req->status = NVME_SUCCESS; + return NVME_SUCCESS; + } + if (rfc.post_rd_fill_nlb) { + req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); + req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); + } else { + req->fill_len = 0; + } + slba = rfc.read_slba; + data_size = nvme_l2b(ns, rfc.read_nlb); + } + + data_offset = nvme_l2b(ns, slba); + + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_READ); + if (req->qsg.sg) { + req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } else { + req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); + } + return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); + return status | NVME_DNR; +} + static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; @@ -1495,25 +1575,20 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; - uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t slba = le64_to_cpu(rw->slba); + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t data_size = nvme_l2b(ns, nlb); - uint64_t data_offset, fill_ofs; - + uint64_t data_offset; NvmeZone *zone; - uint32_t fill_len; - NvmeReadFillCtx rfc; - bool is_write = rw->opcode == NVME_CMD_WRITE || append; - enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; BlockBackend *blk = ns->blkconf.blk; uint16_t status; - trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode), - nvme_nsid(ns), nlb, data_size, slba); + trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), + nvme_nsid(ns), nlb, data_size, slba); status = nvme_check_mdts(n, data_size); if (status) { @@ -1530,29 +1605,21 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) if (ns->params.zoned) { zone = nvme_get_zone_by_slba(ns, slba); - if (is_write) { - status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); - if (status != NVME_SUCCESS) { - goto invalid; - } - - if (append) { - slba = zone->w_ptr; - } - - status = nvme_auto_open_zone(ns, zone); - if (status != NVME_SUCCESS) { - goto invalid; - } - - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } else { - status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); - if (status != NVME_SUCCESS) { - trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); - goto invalid; - } + status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); + if (status != NVME_SUCCESS) { + goto invalid; } + + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + + if (append) { + slba = zone->w_ptr; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } else if (append) { trace_pci_nvme_err_invalid_opc(rw->opcode); status = NVME_INVALID_OPCODE; @@ -1566,56 +1633,21 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) goto invalid; } - if (ns->params.zoned) { - if (is_write) { - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } else { - if (rfc.pre_rd_fill_nlb) { - fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); - fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); - nvme_fill_read_data(req, fill_ofs, fill_len, - n->params.fill_pattern); - } - if (!rfc.read_nlb) { - /* No backend I/O necessary, only needed to fill the buffer */ - req->status = NVME_SUCCESS; - return NVME_SUCCESS; - } - if (rfc.post_rd_fill_nlb) { - req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); - req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); - } else { - req->fill_len = 0; - } - slba = rfc.read_slba; - data_size = nvme_l2b(ns, rfc.read_nlb); - } - } - data_offset = nvme_l2b(ns, slba); - block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); if (req->qsg.sg) { - if (is_write) { - req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); - } else { - req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); - } + req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); } else { - if (is_write) { - req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); - } else { - req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); - } + req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); } return NVME_NO_COMPLETE; invalid: - block_acct_invalid(blk_get_stats(blk), acct); + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); return status | NVME_DNR; } @@ -2096,10 +2128,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_WRITE_ZEROES: return nvme_write_zeroes(n, req); case NVME_CMD_ZONE_APPEND: - return nvme_rw(n, req, true); + return nvme_write(n, req, true); case NVME_CMD_WRITE: + return nvme_write(n, req, false); case NVME_CMD_READ: - return nvme_rw(n, req, false); + return nvme_read(n, req); case NVME_CMD_ZONE_MGMT_SEND: return nvme_zone_mgmt_send(n, req); case NVME_CMD_ZONE_MGMT_RECV: diff --git a/hw/block/trace-events b/hw/block/trace-events index 962084e40c..7ee90a50c3 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -40,7 +40,8 @@ pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, pci_nvme_map_sgl(uint16_t cid, uint8_t typ, uint64_t len) "cid %"PRIu16" type 0x%"PRIx8" len %"PRIu64"" pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'" pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'" -pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" +pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" +pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'" pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" From patchwork Mon Oct 19 02:17:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11843511 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36E511580 for ; Mon, 19 Oct 2020 02:26:00 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D5B5322268 for ; Mon, 19 Oct 2020 02:25:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="auwWx2XW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D5B5322268 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:56742 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kUKs2-0001FC-Q2 for patchwork-qemu-devel@patchwork.kernel.org; Sun, 18 Oct 2020 22:25:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56352) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkN-0000D5-Cw; Sun, 18 Oct 2020 22:18:03 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:44147) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kUKkI-0004JR-7i; Sun, 18 Oct 2020 22:18:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1603073878; x=1634609878; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fuuX0wK/clVu0dfU7G8Ccc31uXLe2PXHzQgnXocAqvc=; b=auwWx2XW/0wdLHPz74djJ1C5DzCTZSi3tUBQrfAgJIU2joM2QMRFwE4a AwOhdeV+NcfvwYuiWtK66PewsTDkLn7ykPVvJCz+7lhEbFVnQPBCLEtzD 6ssOg6qGtmVx76jKNbh12lCg1zZ+kHOhcCCP0ThEnMyX2f9wAjHY/qix1 v0Qm78kxOnS5F+GEe/dOCBG+tCZC0BCuVVi7wIuhQYJ4PZKH2tfZyuf64 ryFulz8dWj0izc4fuWwDtjOPjrogCcrMew+dDeRs+M0PybyNNDMDPmOqx H2Ui+4Yoq+jjWghS31UTBvOO+jvMsnWuqVBtbT0e1WDLsJ9iJIEI9oUSu w==; IronPort-SDR: rNIabVQcA/2g0lN5ineLvbPoIJI8Y86Bcblpj8W586GfG38Xv5bS8IEY6cGFNMQ/8fgt2WT+B1 IvThZSoizW5miD2Ouhrb0Z1hK+X0fcA2YMuXnGNrYauQkcWH5MK+nXYwThdp/AzoGAmkoGtfE4 ZO7UhxTN5EU4fO+NAjkPvPrI9/SutAd7N7zyy+3dPnP2pbgkadhArGmf9fU/51As3Wu5mkWqlk txJn132O50XA2JSYXenEYD29FJJYU1wDVsC5PLRAogx7zFUJEQMRNSuBVji2lfnocPIYhgf025 ruI= X-IronPort-AV: E=Sophos;i="5.77,393,1596470400"; d="scan'208";a="150207989" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 19 Oct 2020 10:17:56 +0800 IronPort-SDR: B+0C1gidDlU6cOvx4JD/Yg9rc+Al8q+XO7Ty+k8NmRyWk848gs9RGYiTzWYDNGJRbfRSSG/QCq iBB0CKl1zz4nmuFTTAc1WUEEgJh2imfeWQs2/OEHvCQJJr5qD8w6zEPlJWBB2PiGObr5mAgx5h GRACu7v2wku0ZHJkKNPTRO2HmZszfUMl1SGwgG7D0FRUEh0Ozn1PH4cjPiXCeU+4Se1qdUytJJ iCUn59uSm5e6fY/J73OR/PJ9PJFMoSEnJTSNs5g2DQY15ftdCombA5byndsr1kHyfZm8MlaudI CewLBFAtxDRPbqem+nwu7LIB Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Oct 2020 19:03:33 -0700 IronPort-SDR: 5R869P/8XumV35KmKMYAuaAvKoD4zmNMbekQXwW4N3g9xzFXKJ5eTlS+mRtZ2YA/twOKvdUDBH GH97WweMYH7hY2IBS46FrV2f9Rfn4/E59eRhK9qP1TLFoIDUh6/HN4wXJt2T3RPqs8GTr9UCfJ yxasPjEKMJIpwUCPm4wl2zLfzrVwVWXq/6zCJtyio3Zyp8aUEq9ToFeYQcsdwAmTn+f9DADa0A v3DICXTEyECdCmbBDXy5pwjtYaXSIeI97jos0CU3tqMgKT8r+XwCGhl6xk6QKK7UFryx0KYqOH RwI= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 18 Oct 2020 19:17:54 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v7 11/11] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write() Date: Mon, 19 Oct 2020 11:17:26 +0900 Message-Id: <20201019021726.12048-12-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201019021726.12048-1-dmitry.fomichev@wdc.com> References: <20201019021726.12048-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.42; envelope-from=prvs=5541069a6=dmitry.fomichev@wdc.com; helo=esa4.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/18 22:17:33 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 95 +++++++++++++------------------------------ hw/block/trace-events | 1 - 2 files changed, 28 insertions(+), 68 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5ec4ce5e28..aa929d1edf 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1529,53 +1529,7 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) -{ - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - NvmeNamespace *ns = req->ns; - uint64_t slba = le64_to_cpu(rw->slba); - uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; - NvmeZone *zone; - uint64_t offset = nvme_l2b(ns, slba); - uint32_t count = nvme_l2b(ns, nlb); - BlockBackend *blk = ns->blkconf.blk; - uint16_t status; - - trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); - - status = nvme_check_bounds(n, ns, slba, nlb); - if (status) { - trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); - return status; - } - - if (ns->params.zoned) { - zone = nvme_get_zone_by_slba(ns, slba); - - status = nvme_check_zone_write(n, ns, zone, slba, nlb, false); - if (status != NVME_SUCCESS) { - goto invalid; - } - - status = nvme_auto_open_zone(ns, zone); - if (status != NVME_SUCCESS) { - goto invalid; - } - - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } - - block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); - req->aiocb = blk_aio_pwrite_zeroes(blk, offset, count, - BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); - return NVME_NO_COMPLETE; - -invalid: - block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); - return status | NVME_DNR; -} - -static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append, bool wrz) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; @@ -1590,10 +1544,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), nvme_nsid(ns), nlb, data_size, slba); - status = nvme_check_mdts(n, data_size); - if (status) { - trace_pci_nvme_err_mdts(nvme_cid(req), data_size); - goto invalid; + if (!wrz) { + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + goto invalid; + } } status = nvme_check_bounds(n, ns, slba, nlb); @@ -1628,21 +1584,26 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) data_offset = nvme_l2b(ns, slba); - status = nvme_map_dptr(n, data_size, req); - if (status) { - goto invalid; - } + if (!wrz) { + status = nvme_map_dptr(n, data_size, req); + if (status) { + goto invalid; + } - data_offset = nvme_l2b(ns, slba); - - block_acct_start(blk_get_stats(blk), &req->acct, data_size, - BLOCK_ACCT_WRITE); - if (req->qsg.sg) { - req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); + if (req->qsg.sg) { + req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } else { + req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); + } } else { - req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); + block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size, + BDRV_REQ_MAY_UNMAP, nvme_rw_cb, + req); } return NVME_NO_COMPLETE; @@ -2126,11 +2087,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_FLUSH: return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: - return nvme_write_zeroes(n, req); + return nvme_write(n, req, false, true); case NVME_CMD_ZONE_APPEND: - return nvme_write(n, req, true); + return nvme_write(n, req, true, false); case NVME_CMD_WRITE: - return nvme_write(n, req, false); + return nvme_write(n, req, false, false); case NVME_CMD_READ: return nvme_read(n, req); case NVME_CMD_ZONE_MGMT_SEND: diff --git a/hw/block/trace-events b/hw/block/trace-events index 7ee90a50c3..5a3cd4c5dc 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -43,7 +43,6 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opna pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'" -pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""