From patchwork Tue Oct 13 21:42:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836115 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0EF7921 for ; Tue, 13 Oct 2020 21:44:16 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 679722080A for ; Tue, 13 Oct 2020 21:44:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="RfaH+b0n" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 679722080A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:55084 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS5f-0007Wh-D9 for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:44:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47062) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3w-0005WG-Tk; Tue, 13 Oct 2020 17:42:28 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45635) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3u-0001mO-4a; Tue, 13 Oct 2020 17:42:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625346; x=1634161346; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=arJwAnMe+oHKbPPXbn2bBhhukcRGt46N6Mp5iO7GAYU=; b=RfaH+b0niZdJGZC2r1KN88B6CTaJnP/allsBNWsf723CutTi2zzeegVX P5F4Dbz7zA8iW8j2ssAi19ffAN2fhpS6Xi61U69+NFLZZ1jYn9P3bU7Su fKe9ZT1bnmtF3KC/s/3MfEnNlQZ60th1iMzGln5vtaT3URh5FIMG1xslg Udl+h6usYNPcr/0XlkLuemhn2n+PKetNh5kr1IjOv0q0/sN/YEoIsu/7e xh/Ju5nUsr3hVz4aDGiWLMAKAnDqJ+Gq0OH3GxhBjs8k0q4ssw3X9d7Pp nFjr253CykMjMOEktCaAF20uRqGAIa4TaFtrY8Pt1FSunAaYo9KHOaj7d A==; IronPort-SDR: l9uN31mqx9CGUmOyXV/Q8i3y0g54r5r2b8DwDsa3a6SAEZflLs0B7YiWXTwsKSKXR7os6WJqoX nvaJhVKvvenlsL/E9+Eh9k2OKdjljdZ8BqnNKZfm2N4aOY5V2O0K6kj71yBeBdchg+5cL6oEn6 /ou+CqBWQrgGpQYz+d/SG7PuApPC0Sa2ciUL0j0vHfvS7iwBjjIM4xpGDa6/21a9yQjJh4Pgyv bJu92JrdVCkil7zflbDr+u5u7fgIwhFpGt/zKs0IUntN6lvbUlPKZh+rcZQiuVmpl/s+Q2UV91 E5w= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185921" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:21 +0800 IronPort-SDR: ncfT4/Qfohjm/KGNDbqa83POmjj4KNYYSmbxyLUo6/wlqXfJHTpeSrMX0KwOGZXtvKl1wdA6vE gs5jv9WgcCSR/b9jVz7mqv2BpWPhv5wjE2WiFMiDAFc2NEmhLiGQvF55fG7VK1f8SuHVqNMQHa AIm9trH4ZckJj36qM4okrMovEVvOt0htP8oilMj5sEyFFMDI4dAfoP0x8aJz3Me3+3axNnRjxV xWFQtKVfhtEtKbOBVhQtwZJqDIGZ4DncgLdWXFd0mbl3wffKyMTDCj4BO/WCpS/DuPAajqcBvO gPFbJdKtld8dPB4WUPEOZGFL Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:28:55 -0700 IronPort-SDR: vA0T1cos6PXGBIfNjg4aSFRmZ+lK96FQyeWjSLtlgDZHMSeXj7o4JJk29THHVnEU0Bd2WPRolS pWafd6oKG4bQPTBxMkfyJoPHCNKSml1I0Ew3QpICQWrgedyHpc/VqkZjJk6nqV51KZ/pK8RdiV 7HOn/xsH2aZRSAYH1K/0ceSuV3nH+526ibCdAi2GJ0VUdgklStLtFiYA0zYgQt/08UzA1kvkpV lcYMu3O6Ym1OdYNZIRsI29TX+IXJDalIgDgb9akIcKIP1L98QJRn0rNE7IHBcwljhfuKvNbp3z nTY= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:19 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 01/11] hw/block/nvme: Add Commands Supported and Effects log Date: Wed, 14 Oct 2020 06:42:02 +0900 Message-Id: <20201013214212.2152-2-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" This log page becomes necessary to implement to allow checking for Zone Append command support in Zoned Namespace Command Set. This commit adds the code to report this log page for NVM Command Set only. The parts that are specific to zoned operation will be added later in the series. All incoming admin and i/o commands are now only processed if their corresponding support bits are set in this log. This provides an easy way to control what commands to support and what not to depending on set CC.CSS. Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 69 +++++++++++++++++++++++++++++++++++++++---- hw/block/trace-events | 2 ++ include/block/nvme.h | 19 ++++++++++++ 4 files changed, 86 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 83734f4606..ea8c2f785d 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -29,6 +29,7 @@ typedef struct NvmeNamespace { int32_t bootindex; int64_t size; NvmeIdNs id_ns; + const uint32_t *iocs; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 9d30ca69dc..ee0eff98da 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -111,6 +111,25 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = { [NVME_TIMESTAMP] = NVME_FEAT_CAP_CHANGE, }; +static const uint32_t nvme_cse_acs[256] = { + [NVME_ADM_CMD_DELETE_SQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_CREATE_SQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_DELETE_CQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_CREATE_CQ] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_IDENTIFY] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, +}; + +static const uint32_t nvme_cse_iocs_nvm[256] = { + [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -1045,6 +1064,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } + if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { + trace_pci_nvme_err_invalid_opc(req->cmd.opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + switch (req->cmd.opcode) { case NVME_CMD_FLUSH: return nvme_flush(n, req); @@ -1054,8 +1078,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_READ: return nvme_rw(n, req); default: - trace_pci_nvme_err_invalid_opc(req->cmd.opcode); - return NVME_INVALID_OPCODE | NVME_DNR; + assert(false); } } @@ -1291,6 +1314,35 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, + uint64_t off, NvmeRequest *req) +{ + NvmeEffectsLog log = {}; + uint32_t *dst_acs = log.acs, *dst_iocs = log.iocs; + uint32_t trans_len; + int i; + + trace_pci_nvme_cmd_supp_and_effects_log_read(); + + if (off >= sizeof(log)) { + trace_pci_nvme_err_invalid_effects_log_offset(off); + return NVME_INVALID_FIELD | NVME_DNR; + } + + for (i = 0; i < 256; i++) { + dst_acs[i] = nvme_cse_acs[i]; + } + + for (i = 0; i < 256; i++) { + dst_iocs[i] = nvme_cse_iocs_nvm[i]; + } + + trans_len = MIN(sizeof(log) - off, buf_len); + + return nvme_dma(n, ((uint8_t *)&log) + off, trans_len, + DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = &req->cmd; @@ -1334,6 +1386,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) return nvme_smart_info(n, rae, len, off, req); case NVME_LOG_FW_SLOT_INFO: return nvme_fw_log_info(n, len, off, req); + case NVME_LOG_CMD_EFFECTS: + return nvme_cmd_effects(n, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -1920,6 +1974,11 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode, nvme_adm_opc_str(req->cmd.opcode)); + if (!(nvme_cse_acs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { + trace_pci_nvme_err_invalid_admin_opc(req->cmd.opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + switch (req->cmd.opcode) { case NVME_ADM_CMD_DELETE_SQ: return nvme_del_sq(n, req); @@ -1942,8 +2001,7 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); default: - trace_pci_nvme_err_invalid_admin_opc(req->cmd.opcode); - return NVME_INVALID_OPCODE | NVME_DNR; + assert(false); } } @@ -2596,6 +2654,7 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) trace_pci_nvme_register_namespace(nsid); n->namespaces[nsid - 1] = ns; + ns->iocs = nvme_cse_iocs_nvm; return 0; } @@ -2737,7 +2796,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->acl = 3; id->aerl = n->params.aerl; id->frmw = (NVME_NUM_FW_SLOTS << 1) | NVME_FRMW_SLOT1_RO; - id->lpa = NVME_LPA_NS_SMART | NVME_LPA_EXTENDED; + id->lpa = NVME_LPA_NS_SMART | NVME_LPA_CSE | NVME_LPA_EXTENDED; /* recommended default value (~70 C) */ id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING); diff --git a/hw/block/trace-events b/hw/block/trace-events index fac5995d94..0ae9cb0d35 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -85,6 +85,7 @@ pci_nvme_mmio_start_success(void) "setting controller enable bit succeeded" pci_nvme_mmio_stopped(void) "cleared controller enable bit" pci_nvme_mmio_shutdown_set(void) "shutdown bit set" pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared" +pci_nvme_cmd_supp_and_effects_log_read(void) "commands supported and effects log read" # nvme traces for error conditions pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu" @@ -104,6 +105,7 @@ pci_nvme_err_invalid_prp(void) "invalid PRP" pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" +pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" pci_nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 6de2d5aa75..4779495b7d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -744,10 +744,27 @@ enum NvmeSmartWarn { NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4, }; +typedef struct NvmeEffectsLog { + uint32_t acs[256]; + uint32_t iocs[256]; + uint8_t resv[2048]; +} NvmeEffectsLog; + +enum { + NVME_CMD_EFF_CSUPP = 1 << 0, + NVME_CMD_EFF_LBCC = 1 << 1, + NVME_CMD_EFF_NCC = 1 << 2, + NVME_CMD_EFF_NIC = 1 << 3, + NVME_CMD_EFF_CCC = 1 << 4, + NVME_CMD_EFF_CSE_MASK = 3 << 16, + NVME_CMD_EFF_UUID_SEL = 1 << 19, +}; + enum NvmeLogIdentifier { NVME_LOG_ERROR_INFO = 0x01, NVME_LOG_SMART_INFO = 0x02, NVME_LOG_FW_SLOT_INFO = 0x03, + NVME_LOG_CMD_EFFECTS = 0x05, }; typedef struct QEMU_PACKED NvmePSD { @@ -860,6 +877,7 @@ enum NvmeIdCtrlFrmw { enum NvmeIdCtrlLpa { NVME_LPA_NS_SMART = 1 << 0, + NVME_LPA_CSE = 1 << 1, NVME_LPA_EXTENDED = 1 << 2, }; @@ -1059,6 +1077,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); From patchwork Tue Oct 13 21:42:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836121 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57789139F for ; Tue, 13 Oct 2020 21:47:02 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E387220848 for ; Tue, 13 Oct 2020 21:47:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="SCP0lha7" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E387220848 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:35872 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS8K-00033R-VX for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:47:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47088) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3y-0005X7-9P; Tue, 13 Oct 2020 17:42:30 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45639) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3v-0001mW-BN; Tue, 13 Oct 2020 17:42:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625347; x=1634161347; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1HggLFcC+uXBTV/4bp7bcM6T4bLVE6zSM5nuTBrcaa8=; b=SCP0lha7OAaVf3uf0M+l7qBblIDr8Mg7n5NqwXBitZdrsGySBytb43V7 2kg01BZX8pwI1p44LS7XvHF+bVVO/vdUXK6w/8SZ8tMbB4nAjUnUVRmAn Er1uUEdnjNBE1ztMd7Y0GeVJdI5roRa9i+gZ8zVa1nauzNeAeCLSCGOB9 fza47TgcAvWbrEewu0mjf54i/WPVIuwYxQJSlSBmFxavbFn6Hpmbir29w KBLymPi55sS3lFDDcXax/NaHiTm2tzedq8+xLnj6j4ChQjdTy4MLdJqCU HTd4AeKp1MBDkln8f04FXMqtbrQ8yHVRaZo+iGJcRq0KqLZO21SKuAYKS g==; IronPort-SDR: sfOsuv29Dpw3O1T3xyDdl3PnEPZNQioRlWZ3LjGr/7bPYGV75JTObNyTvUnMonXPkWLlo/vQj7 oJPOHPkiMnC/3tBLSpD3bRb8l73ZujJs0LTlx6YTH0FafHsJfLyQQ9o4HbuV9Xra+masklHM98 efgX8Ec87zMHwomvqfmza+F3F/hvgKiIxIUokdMq9kojHZiTZzq9t+AzL9NwfP4XPzXbjjTJnK fyCxbZGzFgUfB4PIzOn+TVmR0MyF59YOh9OZ4w7BdD2K4hp4BapTrpWdtpF+78CI6gmMAHLH/o EkI= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185924" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:23 +0800 IronPort-SDR: kDqHx6EC0SKaRJfuEhc2r0opICq5sQum/9tyegJq7sJP6V1j7WxFlmhAX8sq8Z+YaGGfga+ZSe LNQFm/MO5RbeA+c8b3E8fNup3pQtLuofRa6qzPgeXJ13rjnvdTOqGvG0HWPoK9Q1rrGejhFi9Y BBaU+wSYKOmeXT83xh7dFNgW6HeByKUqrBK0nLurxd9BDy+ZvtU0WBtyyeinYjoF0Vcqknqjac smKJoGcOf41y6aYRW/PzOlGtG3QlIufUPmTiFBcrSOToNTW6BD9gqW2M1OxeXGB+Gw/Bqw/1nZ 5r4rgTyb9LsNP3vAetDpWMaF Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:28:57 -0700 IronPort-SDR: 5uISiDgT/ilw94GAzgQB8OvORvKVj4Aj+BN0+EixZ5WSqaK2LmWwGbyBgr7qGhyDqrCWvPKQBV f9p2j/8oSolXteKBRlvkmJRHvrR1zZuyvq8WDh2mpYYKfuieYa/mLtqMNFB15uqoJUGDjilo7I ax3vmkYB4fNd7hJAuDpb5JFyO+2k9MqNM1dhZofITRGie3ZmFs9m9SoECLrVpR4pAROIKT7A/i SJUHVMj4pST1OrkVyv0YtKWvqemWnI+hPMp8HI0fzGH3jCAJeN6OIW/msuQxxXBybITHmjVD8/ VuA= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:21 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 02/11] hw/block/nvme: Generate namespace UUIDs Date: Wed, 14 Oct 2020 06:42:03 +0900 Message-Id: <20201013214212.2152-3-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" In NVMe 1.4, a namespace must report an ID descriptor of UUID type if it doesn't support EUI64 or NGUID. Add a new namespace property, "uuid", that provides the user the option to either specify the UUID explicitly or have a UUID generated automatically every time a namespace is initialized. Suggested-by: Klaus Jansen Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jensen --- hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 9 +++++---- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index b69cdaf27e..de735eb9f3 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), + DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index ea8c2f785d..a38071884a 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + QemuUUID uuid; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme.c b/hw/block/nvme.c index ee0eff98da..89d91926d9 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1571,6 +1571,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; uint32_t nsid = le32_to_cpu(c->nsid); uint8_t list[NVME_IDENTIFY_DATA_SIZE]; @@ -1590,7 +1591,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - if (unlikely(!nvme_ns(n, nsid))) { + ns = nvme_ns(n, nsid); + if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1599,12 +1601,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) /* * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data * structure, a Namespace UUID (nidt = 0x3) must be reported in the - * Namespace Identification Descriptor. Add a very basic Namespace UUID - * here. + * Namespace Identification Descriptor. Add the namespace UUID here. */ ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID; ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; - stl_be_p(&ns_descrs->uuid.v, nsid); + memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN); return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, DMA_DIRECTION_FROM_DEVICE, req); From patchwork Tue Oct 13 21:42:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836117 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 73FAD61C for ; Tue, 13 Oct 2020 21:44:23 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 11AA620848 for ; Tue, 13 Oct 2020 21:44:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="BbfZ9UIE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 11AA620848 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:55620 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS5l-0007lY-Sq for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:44:21 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47116) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS40-0005bC-Ej; Tue, 13 Oct 2020 17:42:32 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45635) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3x-0001mO-Gf; Tue, 13 Oct 2020 17:42:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625350; x=1634161350; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8+gmpzYQVTnMqTfQ59StdgrL9cmDcc0NrxclPpZy/Kg=; b=BbfZ9UIEsmO+IGEoUmmFELn74YV1guqlv3XtqFaFRslI37ccGw6s7XlK 25CwVn+58iAHm+mLj8YUkAZ+rEayqqsOv54XGJx57ppPmqPYhZhkrIQD5 oS2OvdQ4eZsSgFBLzueb8VrVPRDOGH8LfwqSMWHdS12ojaCt9TwvzpE/d 5SFVHaS3oaYb+olSAinm+dvBEuCKocZY0tmx97K5flmbyaCl+kKheYayT 4+wVH6ee8Mz/WOycYHXJSzpDW/h2vLij8JQ8iml3mp70DpSoCSY+HzRvl l7BOAcyLu2zKUYGWo1vQ+4hdkYJ0AVmKOyOGk13RlnQkvnGm97d9ljEmy g==; IronPort-SDR: AUzNr+bKHpQCfLOQm8vyksRGwyqOmyJWQ3B0eVPm1HFiFas9p9lKchL4a/Xf5CgGE93ajSV44x QVQzjN2iYI3THt2FdY9f24aWJ+QoHhBVmN6jUFQ6r63WCxVJrlMan7SZo6pPgEI4h5HnWb9jo+ Ad1ARANUh7oeMGN4kv6BWndip2xWf/c4T93Adb9lp3s8i3aiCITbplAwNrjG8W0AW80SJNZNX8 1Slqcx9P8MissVU29M06/S8WGO6cZCpo4bSdhz+1sgPFXzB67fftXQ7DyytjGcMJntFWHB76AG NSI= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185927" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:26 +0800 IronPort-SDR: WfzsX3AND936B5Xn/zuH1YKFLauMRR/4lefwUxWD92Woi53K4ldwbTNB3FNsBkts2j98+u+s1n tkRlEc1ud/YS5Z4ZBs5+QCq8FmW34Dy12BjwlujlYlBCWNWKciY9JdZQw3mkPPXcG25MRhbOER CmmDgnyTgo9ID7rcAb8KThSpru+M44ykMpQewB/dmS9LR9fwD2kb0N6qV8YRUFs1KlDuItlku+ GVonn1cB4FfjJgr2SVeGXRlPDLoB4jHrJp6zTZ9JgefPP/XgtgGmZ3sEtwSPt7ssgCedv+3DyD SNfk3pvZPS13fugZ0zgkM0t8 Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:28:59 -0700 IronPort-SDR: s2e3Nj96SQlj7ggbuT5lL9rg/PamhI5jUcl/+BJ8m59OtrFlKdpYJ3TqDyDmS2bRg/grLu2fjP v1cqD1m+bGgfT0mAuKLFg723/lD3UxhIx6jbu0FIAbETg2ivaPinCmozJ0gkYEPRtyQghJbjlc d+bzfkJ95loIHpAtOFAG0ZeaarN83ZFwGLJuIatUkjxFkJlPB7xdESRiLnzURg8I269CpaA8Y4 lgnVp8ah0GhNDCLg0CaEv+PJIfoCVdcm5yPWUIWf+Xx2g46UUuncEU0NsvJgFebEs5ws1//gpN MjI= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:23 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 03/11] hw/block/nvme: Add support for Namespace Types Date: Wed, 14 Oct 2020 06:42:04 +0900 Message-Id: <20201013214212.2152-4-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" From: Niklas Cassel Define the structures and constants required to implement Namespace Types support. Namespace Types introduce a new command set, "I/O Command Sets", that allows the host to retrieve the command sets associated with a namespace. Introduce support for the command set and enable detection for the NVM Command Set. The new workflows for identify commands rely heavily on zero-filled identify structs. E.g., certain CNS commands are defined to return a zero-filled identify struct when an inactive namespace NSID is supplied. Add a helper function in order to avoid code duplication when reporting zero-filled identify structures. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 2 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 198 +++++++++++++++++++++++++++++++++++------- hw/block/trace-events | 7 ++ include/block/nvme.h | 65 +++++++++++--- 5 files changed, 226 insertions(+), 47 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index de735eb9f3..c0362426cc 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -41,6 +41,8 @@ static void nvme_ns_init(NvmeNamespace *ns) id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); + ns->csi = NVME_CSI_NVM; + /* no thin provisioning */ id_ns->ncap = id_ns->nsze; id_ns->nuse = id_ns->ncap; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index a38071884a..d795e44bab 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,7 @@ typedef struct NvmeNamespace { int64_t size; NvmeIdNs id_ns; const uint32_t *iocs; + uint8_t csi; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 89d91926d9..af46448234 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -123,6 +123,9 @@ static const uint32_t nvme_cse_acs[256] = { [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, }; +static const uint32_t nvme_cse_iocs_none[256] = { +}; + static const uint32_t nvme_cse_iocs_nvm[256] = { [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, @@ -1051,10 +1054,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req), req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode)); - if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) { - return NVME_INVALID_OPCODE | NVME_DNR; - } - if (!nvme_nsid_valid(n, nsid)) { return NVME_INVALID_NSID | NVME_DNR; } @@ -1333,8 +1332,10 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, dst_acs[i] = nvme_cse_acs[i]; } - for (i = 0; i < 256; i++) { - dst_iocs[i] = nvme_cse_iocs_nvm[i]; + if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { + for (i = 0; i < 256; i++) { + dst_iocs[i] = nvme_cse_iocs_nvm[i]; + } } trans_len = MIN(sizeof(log) - off, buf_len); @@ -1500,6 +1501,13 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req) return NVME_SUCCESS; } +static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req) +{ + uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {}; + + return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_identify_ctrl(); @@ -1508,11 +1516,23 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + + trace_pci_nvme_identify_ctrl_csi(c->csi); + + if (c->csi == NVME_CSI_NVM) { + return nvme_rpt_empty_id_struct(n, req); + } + + return NVME_INVALID_FIELD | NVME_DNR; +} + static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - NvmeIdNs *id_ns, inactive = { 0 }; uint32_t nsid = le32_to_cpu(c->nsid); trace_pci_nvme_identify_ns(nsid); @@ -1523,23 +1543,46 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) ns = nvme_ns(n, nsid); if (unlikely(!ns)) { - id_ns = &inactive; - } else { - id_ns = &ns->id_ns; + return nvme_rpt_empty_id_struct(n, req); } - return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs), + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeNamespace *ns; + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + uint32_t nsid = le32_to_cpu(c->nsid); + + trace_pci_nvme_identify_ns_csi(nsid, c->csi); + + if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) { + return NVME_INVALID_NSID | NVME_DNR; + } + + ns = nvme_ns(n, nsid); + if (unlikely(!ns)) { + return nvme_rpt_empty_id_struct(n, req); + } + + if (c->csi == NVME_CSI_NVM) { + return nvme_rpt_empty_id_struct(n, req); + } + + return NVME_INVALID_FIELD | NVME_DNR; +} + static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - static const int data_len = NVME_IDENTIFY_DATA_SIZE; uint32_t min_nsid = le32_to_cpu(c->nsid); - uint32_t *list; - uint16_t ret; - int j = 0; + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + uint32_t *list_ptr = (uint32_t *)list; + int i, j = 0; trace_pci_nvme_identify_nslist(min_nsid); @@ -1553,20 +1596,54 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - list = g_malloc0(data_len); - for (int i = 1; i <= n->num_namespaces; i++) { - if (i <= min_nsid || !nvme_ns(n, i)) { + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { continue; } - list[j++] = cpu_to_le32(i); + if (ns->params.nsid < min_nsid) { + continue; + } + list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; } } - ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE, - req); - g_free(list); - return ret; + + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); +} + +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeNamespace *ns; + NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + uint32_t min_nsid = le32_to_cpu(c->nsid); + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + uint32_t *list_ptr = (uint32_t *)list; + int i, j = 0; + + trace_pci_nvme_identify_nslist_csi(min_nsid, c->csi); + + if (c->csi != NVME_CSI_NVM) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + if (ns->params.nsid < min_nsid) { + continue; + } + list_ptr[j++] = cpu_to_le32(ns->params.nsid); + if (j == data_len / sizeof(uint32_t)) { + break; + } + } + + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) @@ -1574,13 +1651,17 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; uint32_t nsid = le32_to_cpu(c->nsid); - uint8_t list[NVME_IDENTIFY_DATA_SIZE]; + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; struct data { struct { NvmeIdNsDescr hdr; - uint8_t v[16]; + uint8_t v[NVME_NIDL_UUID]; } uuid; + struct { + NvmeIdNsDescr hdr; + uint8_t v; + } csi; }; struct data *ns_descrs = (struct data *)list; @@ -1596,19 +1677,31 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - memset(list, 0x0, sizeof(list)); - /* * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data * structure, a Namespace UUID (nidt = 0x3) must be reported in the * Namespace Identification Descriptor. Add the namespace UUID here. */ ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID; - ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; - memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN); + ns_descrs->uuid.hdr.nidl = NVME_NIDL_UUID; + memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDL_UUID); - return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, - DMA_DIRECTION_FROM_DEVICE, req); + ns_descrs->csi.hdr.nidt = NVME_NIDT_CSI; + ns_descrs->csi.hdr.nidl = NVME_NIDL_CSI; + ns_descrs->csi.v = ns->csi; + + return nvme_dma(n, list, sizeof(list), DMA_DIRECTION_FROM_DEVICE, req); +} + +static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req) +{ + uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {}; + static const int data_len = sizeof(list); + + trace_pci_nvme_identify_cmd_set(); + + NVME_SET_CSI(*list, NVME_CSI_NVM); + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) @@ -1618,12 +1711,20 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) switch (le32_to_cpu(c->cns)) { case NVME_ID_CNS_NS: return nvme_identify_ns(n, req); + case NVME_ID_CNS_CS_NS: + return nvme_identify_ns_csi(n, req); case NVME_ID_CNS_CTRL: return nvme_identify_ctrl(n, req); + case NVME_ID_CNS_CS_CTRL: + return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: return nvme_identify_nslist(n, req); + case NVME_ID_CNS_CS_NS_ACTIVE_LIST: + return nvme_identify_nslist_csi(n, req); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); + case NVME_ID_CNS_IO_COMMAND_SET: + return nvme_identify_cmd_set(n, req); default: trace_pci_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns)); return NVME_INVALID_FIELD | NVME_DNR; @@ -1804,7 +1905,9 @@ defaults: if (iv == n->admin_cq.vector) { result |= NVME_INTVC_NOCOALESCING; } - + break; + case NVME_COMMAND_SET_PROFILE: + result = 0; break; default: result = nvme_feature_default[fid]; @@ -1945,6 +2048,12 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) break; case NVME_TIMESTAMP: return nvme_set_feature_timestamp(n, req); + case NVME_COMMAND_SET_PROFILE: + if (dw11 & 0x1ff) { + trace_pci_nvme_err_invalid_iocsci(dw11 & 0x1ff); + return NVME_CMD_SET_CMB_REJECTED | NVME_DNR; + } + break; default: return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR; } @@ -2090,6 +2199,27 @@ static void nvme_clear_ctrl(NvmeCtrl *n) n->bar.cc = 0; } +static void nvme_select_ns_iocs(NvmeCtrl *n) +{ + NvmeNamespace *ns; + int i; + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + ns->iocs = nvme_cse_iocs_none; + switch (ns->csi) { + case NVME_CSI_NVM: + if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { + ns->iocs = nvme_cse_iocs_nvm; + } + break; + } + } +} + static int nvme_start_ctrl(NvmeCtrl *n) { uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12; @@ -2188,6 +2318,8 @@ static int nvme_start_ctrl(NvmeCtrl *n) QTAILQ_INIT(&n->aer_queue); + nvme_select_ns_iocs(n); + return 0; } @@ -2655,7 +2787,6 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) trace_pci_nvme_register_namespace(nsid); n->namespaces[nsid - 1] = ns; - ns->iocs = nvme_cse_iocs_nvm; return 0; } @@ -2826,6 +2957,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM); + NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_CSI_SUPP); NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_ADMIN_ONLY); NVME_CAP_SET_MPSMAX(n->bar.cap, 4); diff --git a/hw/block/trace-events b/hw/block/trace-events index 0ae9cb0d35..65b964c894 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -48,8 +48,12 @@ pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16"" pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16"" pci_nvme_identify_ctrl(void) "identify controller" +pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8"" pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", csi=0x%"PRIx8"" pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", csi=0x%"PRIx8"" +pci_nvme_identify_cmd_set(void) "identify i/o command set" pci_nvme_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32"" pci_nvme_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64"" pci_nvme_getfeat(uint16_t cid, uint32_t nsid, uint8_t fid, uint8_t sel, uint32_t cdw11) "cid %"PRIu16" nsid 0x%"PRIx32" fid 0x%"PRIx8" sel 0x%"PRIx8" cdw11 0x%"PRIx32"" @@ -106,6 +110,8 @@ pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" +pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" +pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" pci_nvme_err_invalid_create_sq_sqid(uint16_t sqid) "failed creating submission queue, invalid sqid=%"PRIu16"" @@ -162,6 +168,7 @@ pci_nvme_ub_db_wr_invalid_cq(uint32_t qid) "completion queue doorbell write for pci_nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t new_head) "completion queue doorbell write value beyond queue size, cqid=%"PRIu32", new_head=%"PRIu16", ignoring" pci_nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write for nonexistent queue, sqid=%"PRIu32", ignoring" pci_nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission queue doorbell write value beyond queue size, sqid=%"PRIu32", new_head=%"PRIu16", ignoring" +pci_nvme_ub_unknown_css_value(void) "unknown value in cc.css field" # xen-block.c xen_block_realize(const char *type, uint32_t disk, uint32_t partition) "%s d%up%u" diff --git a/include/block/nvme.h b/include/block/nvme.h index 4779495b7d..f5ac9143c4 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -84,6 +84,7 @@ enum NvmeCapMask { enum NvmeCapCss { NVME_CAP_CSS_NVM = 1 << 0, + NVME_CAP_CSS_CSI_SUPP = 1 << 6, NVME_CAP_CSS_ADMIN_ONLY = 1 << 7, }; @@ -117,9 +118,25 @@ enum NvmeCcMask { enum NvmeCcCss { NVME_CC_CSS_NVM = 0x0, + NVME_CC_CSS_CSI = 0x6, NVME_CC_CSS_ADMIN_ONLY = 0x7, }; +#define NVME_SET_CC_EN(cc, val) \ + (cc |= (uint32_t)((val) & CC_EN_MASK) << CC_EN_SHIFT) +#define NVME_SET_CC_CSS(cc, val) \ + (cc |= (uint32_t)((val) & CC_CSS_MASK) << CC_CSS_SHIFT) +#define NVME_SET_CC_MPS(cc, val) \ + (cc |= (uint32_t)((val) & CC_MPS_MASK) << CC_MPS_SHIFT) +#define NVME_SET_CC_AMS(cc, val) \ + (cc |= (uint32_t)((val) & CC_AMS_MASK) << CC_AMS_SHIFT) +#define NVME_SET_CC_SHN(cc, val) \ + (cc |= (uint32_t)((val) & CC_SHN_MASK) << CC_SHN_SHIFT) +#define NVME_SET_CC_IOSQES(cc, val) \ + (cc |= (uint32_t)((val) & CC_IOSQES_MASK) << CC_IOSQES_SHIFT) +#define NVME_SET_CC_IOCQES(cc, val) \ + (cc |= (uint32_t)((val) & CC_IOCQES_MASK) << CC_IOCQES_SHIFT) + enum NvmeCstsShift { CSTS_RDY_SHIFT = 0, CSTS_CFS_SHIFT = 1, @@ -534,8 +551,13 @@ typedef struct QEMU_PACKED NvmeIdentify { uint64_t rsvd2[2]; uint64_t prp1; uint64_t prp2; - uint32_t cns; - uint32_t rsvd11[5]; + uint8_t cns; + uint8_t rsvd10; + uint16_t ctrlid; + uint16_t nvmsetid; + uint8_t rsvd11; + uint8_t csi; + uint32_t rsvd12[4]; } NvmeIdentify; typedef struct QEMU_PACKED NvmeRwCmd { @@ -655,6 +677,7 @@ enum NvmeStatusCodes { NVME_MD_SGL_LEN_INVALID = 0x0010, NVME_SGL_DESCR_TYPE_INVALID = 0x0011, NVME_INVALID_USE_OF_CMB = 0x0012, + NVME_CMD_SET_CMB_REJECTED = 0x002b, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -781,11 +804,15 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 -enum { - NVME_ID_CNS_NS = 0x0, - NVME_ID_CNS_CTRL = 0x1, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x2, - NVME_ID_CNS_NS_DESCR_LIST = 0x3, +enum NvmeIdCns { + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_CS_NS = 0x05, + NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, + NVME_ID_CNS_IO_COMMAND_SET = 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { @@ -933,6 +960,7 @@ enum NvmeFeatureIds { NVME_WRITE_ATOMICITY = 0xa, NVME_ASYNCHRONOUS_EVENT_CONF = 0xb, NVME_TIMESTAMP = 0xe, + NVME_COMMAND_SET_PROFILE = 0x19, NVME_SOFTWARE_PROGRESS_MARKER = 0x80, NVME_FID_MAX = 0x100, }; @@ -1017,18 +1045,26 @@ typedef struct QEMU_PACKED NvmeIdNsDescr { uint8_t rsvd2[2]; } NvmeIdNsDescr; -enum { - NVME_NIDT_EUI64_LEN = 8, - NVME_NIDT_NGUID_LEN = 16, - NVME_NIDT_UUID_LEN = 16, +enum NvmeNsIdentifierLength { + NVME_NIDL_EUI64 = 8, + NVME_NIDL_NGUID = 16, + NVME_NIDL_UUID = 16, + NVME_NIDL_CSI = 1, }; enum NvmeNsIdentifierType { - NVME_NIDT_EUI64 = 0x1, - NVME_NIDT_NGUID = 0x2, - NVME_NIDT_UUID = 0x3, + NVME_NIDT_EUI64 = 0x01, + NVME_NIDT_NGUID = 0x02, + NVME_NIDT_UUID = 0x03, + NVME_NIDT_CSI = 0x04, }; +enum NvmeCsi { + NVME_CSI_NVM = 0x00, +}; + +#define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat) ((dlfeat) & 0x08) @@ -1079,6 +1115,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); From patchwork Tue Oct 13 21:42:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836129 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D68B661C for ; Tue, 13 Oct 2020 21:50:19 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6B29420872 for ; Tue, 13 Oct 2020 21:50:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="ni9eIYRo" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6B29420872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:44110 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSSBW-0006dx-1R for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:50:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47118) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS41-0005bt-52; Tue, 13 Oct 2020 17:42:33 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45631) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3y-0001lt-Gk; Tue, 13 Oct 2020 17:42:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625351; x=1634161351; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=z8ywIN4xHa2aS5EQzMSE87Q0kEDTBZUzbEGWaFVnwu0=; b=ni9eIYRoIq/icjYGYvTAXlwrZK+gXaUIqG4nWEXaLAGboP485UUdFC2x RuafUvMcOrxbpiJPks7+0pE4ICrsNcLJ6CA+mrRE5TMXretUue2/SjsFP Jt5G+hYvazAo/OfVzjidmvkBuXGng3wI+WcHBaIkfCY5WtsPABgrhed1G Vs9BTRxWyKbJD06bhch58t0eysF2jD1UVOpx03UV8T1pvKPgBvEscwG/e 4uyyq5XCpm1uoGWCuLGyuJymFDco//QGIAVBPgDuAykqD721gRrZ31ZAn Mr+DVCvI+MVGIexE7hurTOTFRwexlR6XP8ijsBbL4Wda+B1qmSIEMB4MG w==; IronPort-SDR: Eq6q8SpgotTyyVwvYVEhKmBt/yyTZrKbA+9I63UabzEiu5wIT2wCOvWuDTodl7Foc7Eqs2Qo6j aS5r4vEqCpOTxDBDxVDPBUXu9AuZFfu9rXjTdjdsgaFvHoGSJ97SKR/oIyAUOcSXk4wdzmkcX4 oJYHBm5J6MzPonqjDLrkEpDgc6jJBIOCvSykYbE1gZiYb3/6SLrovGFW9ml863agL/G0uFwqnC 714AO4Vj8voxFxG1WMjhbuZKIDXlVzg9SZC+M+qqDYhJXcPLToKm9vwtT6s6bu73j6T8h6XQIt NSk= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185929" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:28 +0800 IronPort-SDR: aH0FIzx33LwQmhCOTX4p0qIIjj93ACX4W/kTH0F4cTA3SWyVmu/Ilw4+aXNYRvZnoVkHY3GLlu Y/KnwZPWBO6Zw7VFBXmSSEJevAZ6nzjrJucwOrAc4GTRuHJnzfPV29m6jPo/xlnWppDlW3qGUx jIYYacOqMp3Uv6ntnEoCcUw5ewNOwMr+ESR5LoCunDXhhK/8dwAWMaFOWux817hWX6LblFhbo+ uHELu207V9PaKl9EhOUov6g+hD8a2XzAvTGtz7mtF8tTgrHrhqiCsYPRFU9xxa516tlPpmOek+ 1IUrvJn3HKWHj7HdN/lBMabN Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:01 -0700 IronPort-SDR: BGCoLbw8SWskBwMrDfybZzm6IiCwITquax1WPVgAq6ACdey2kEvKdB9MdrhQbv0VAzry4ihcCC omVoOStECNsZo0lSgQmDoVU3mcFj27NqOuwEhNfuWd/icH4wZ8aA8cMzy35A+MsMq52shxYiHP 20pzwfArEVxEuYKxqn2V7T8aIg6dgefjZpCd87qLdRvNd7UkMl8WPLvguEKJTmAXzySBP6Ui8f HPl6iEKhD0nZdK3r9hZjfEZjSVm6+GSEk+Nro3sf75hil8W/9A2zz5VomOQbaxE2OSSDtqE2uT C80= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:26 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 04/11] hw/block/nvme: Support allocated CNS command variants Date: Wed, 14 Oct 2020 06:42:05 +0900 Message-Id: <20201013214212.2152-5-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" From: Niklas Cassel Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated, that is a namespace is included regardless if it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. In NVMe, a namespace is active if it exists and is attached to the controller. CAP.CSS (together with the I/O Command Set data structure) defines what command sets are supported by the controller. CC.CSS (together with Set Profile) can be set to enable a subset of the available command sets. Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces will still be attached (and thus marked as active). Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces will still be attached (and thus marked as active). However, any operation from a disabled command set will result in a Invalid Command Opcode. Add a new Boolean namespace property, "attached", to provide the most basic namespace attachment support. The default value for this new property is true. Also, implement the logic in the new CNS values to include/exclude namespaces based on this new property. The only thing missing is hooking up the actual Namespace Attachment command opcode, which will allow a user to toggle the "attached" flag per namespace. The reason for not hooking up this command completely is because the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 1 + hw/block/nvme-ns.h | 1 + hw/block/nvme.c | 68 ++++++++++++++++++++++++++++++++++++-------- include/block/nvme.h | 20 +++++++------ 4 files changed, 70 insertions(+), 20 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c0362426cc..974aea33f7 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -132,6 +132,7 @@ static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), + DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d795e44bab..d6b2808b97 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + bool attached; QemuUUID uuid; } NvmeNamespaceParams; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index af46448234..2b4488575b 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1062,6 +1062,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) if (unlikely(!req->ns)) { return NVME_INVALID_FIELD | NVME_DNR; } + if (!req->ns->params.attached) { + return NVME_INVALID_FIELD | NVME_DNR; + } if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { trace_pci_nvme_err_invalid_opc(req->cmd.opcode); @@ -1222,6 +1225,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, uint32_t trans_len; NvmeNamespace *ns; time_t current_ms; + int i; if (off >= sizeof(smart)) { return NVME_INVALID_FIELD | NVME_DNR; @@ -1232,15 +1236,18 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, if (!ns) { return NVME_INVALID_NSID | NVME_DNR; } - nvme_set_blk_stats(ns, &stats); + if (ns->params.attached) { + nvme_set_blk_stats(ns, &stats); + } } else { - int i; - for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { continue; } + if (!ns->params.attached) { + continue; + } nvme_set_blk_stats(ns, &stats); } } @@ -1529,7 +1536,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1546,11 +1554,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->params.attached) { + return nvme_rpt_empty_id_struct(n, req); + } + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1567,6 +1580,10 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->params.attached) { + return nvme_rpt_empty_id_struct(n, req); + } + if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); } @@ -1574,7 +1591,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1604,6 +1622,9 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid < min_nsid) { continue; } + if (only_active && !ns->params.attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1613,7 +1634,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1637,6 +1659,9 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid < min_nsid) { continue; } + if (only_active && !ns->params.attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1710,17 +1735,25 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) switch (le32_to_cpu(c->cns)) { case NVME_ID_CNS_NS: - return nvme_identify_ns(n, req); + return nvme_identify_ns(n, req, true); case NVME_ID_CNS_CS_NS: - return nvme_identify_ns_csi(n, req); + return nvme_identify_ns_csi(n, req, true); + case NVME_ID_CNS_NS_PRESENT: + return nvme_identify_ns(n, req, false); + case NVME_ID_CNS_CS_NS_PRESENT: + return nvme_identify_ns_csi(n, req, false); case NVME_ID_CNS_CTRL: return nvme_identify_ctrl(n, req); case NVME_ID_CNS_CS_CTRL: return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: - return nvme_identify_nslist(n, req); + return nvme_identify_nslist(n, req, true); case NVME_ID_CNS_CS_NS_ACTIVE_LIST: - return nvme_identify_nslist_csi(n, req); + return nvme_identify_nslist_csi(n, req, true); + case NVME_ID_CNS_NS_PRESENT_LIST: + return nvme_identify_nslist(n, req, false); + case NVME_ID_CNS_CS_NS_PRESENT_LIST: + return nvme_identify_nslist_csi(n, req, false); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); case NVME_ID_CNS_IO_COMMAND_SET: @@ -1793,6 +1826,7 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; NvmeCmd *cmd = &req->cmd; uint32_t dw10 = le32_to_cpu(cmd->cdw10); uint32_t dw11 = le32_to_cpu(cmd->cdw11); @@ -1824,7 +1858,11 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - if (!nvme_ns(n, nsid)) { + ns = nvme_ns(n, nsid); + if (!ns) { + return NVME_INVALID_FIELD | NVME_DNR; + } + if (!ns->params.attached) { return NVME_INVALID_FIELD | NVME_DNR; } } @@ -1966,6 +2004,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } + if (!ns->params.attached) { + return NVME_INVALID_FIELD | NVME_DNR; + } } } else if (nsid && nsid != NVME_NSID_BROADCAST) { if (!nvme_nsid_valid(n, nsid)) { @@ -2013,6 +2054,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) if (!ns) { continue; } + if (!ns->params.attached) { + continue; + } if (!(dw11 & 0x1) && blk_enable_write_cache(ns->blkconf.blk)) { blk_flush(ns->blkconf.blk); diff --git a/include/block/nvme.h b/include/block/nvme.h index f5ac9143c4..27125c9d28 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -805,14 +805,18 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 enum NvmeIdCns { - NVME_ID_CNS_NS = 0x00, - NVME_ID_CNS_CTRL = 0x01, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, - NVME_ID_CNS_NS_DESCR_LIST = 0x03, - NVME_ID_CNS_CS_NS = 0x05, - NVME_ID_CNS_CS_CTRL = 0x06, - NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, - NVME_ID_CNS_IO_COMMAND_SET = 0x1c, + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_CS_NS = 0x05, + NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, + NVME_ID_CNS_NS_PRESENT_LIST = 0x10, + NVME_ID_CNS_NS_PRESENT = 0x11, + NVME_ID_CNS_CS_NS_PRESENT_LIST = 0x1a, + NVME_ID_CNS_CS_NS_PRESENT = 0x1b, + NVME_ID_CNS_IO_COMMAND_SET = 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { From patchwork Tue Oct 13 21:42:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836135 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 777C561C for ; Tue, 13 Oct 2020 21:52:53 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA23320872 for ; Tue, 13 Oct 2020 21:52:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="D52DlfWW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA23320872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:49350 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSSDz-0000f1-Uv for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:52:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47148) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS44-0005gg-5n; Tue, 13 Oct 2020 17:42:36 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45639) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS3z-0001mW-A3; Tue, 13 Oct 2020 17:42:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625351; x=1634161351; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Q4R1gq2Cc+FcQjP3e7IJtnv9eBGGx/IC3puV32dgdh4=; b=D52DlfWW34q03lJsoy/6t4HWuC9Z864WY1iisebiC9O07Nf7P8+r5MB1 Zr631a7k9eEmW+ZsiXLCHBI9z7YyEBFqraHyD8nuT4gBkGATPSiWLbuAh MLc0pCM7WY0MA3VSBLpNtUi6KnChlrGawZO3CES/tBm6yhLBbQA1h/Xa0 8VbmFEKRonyZZG2+wjyGGhMDHulYOcMWtnBOTWTHetLJ2+EEIxNHQrRmw j08ciOpOXlijLMUqaQ9nhtZ6b/Xyg7lU9y3kFYACGAo8otVBdRk6QGRFv iLSs3JODRojKq5KMFM8KogtNKcXVv9YanvQ9vtc8j6jeCbywZiphzN2XG A==; IronPort-SDR: EakAwplndvOdE+KT+qnNgjlE9UyquGknqK4bxvq9OccPHRWzImUdW6aXgPg5z8JdYGJzWxJJ2h 1x1k2Ftt9I1Ozn1t40VyG/dqL8mzXcw9LIP74nCgCLwEr0ILhyqrvMgkCZ0BhV2sIRIBxgEMww Kg694mXA3netcVnhDOmeeL3Lnfd/YUbxbxliQ4lA35q56VipSD62n1PkvxeQh4fnhf/+8jEbSe XQrolL58yVV6X1MQ6lAFDhIfwxiBj0t0FZ3jHDT8UOlMbP9nWjvcjbseSVEhNPsPn6L5AQALL+ U6M= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185930" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:30 +0800 IronPort-SDR: MupGGMB3+Yvv9JiBM3URvDiXcNfjp1S6PCe0VSPF/iCSN3rqA8jcLlS2967z1TMdQ/aoogGL70 MgyyK1mVV5FIBv3ozK2FQ2iV7ebu1nEfIpABT+KFFCIuL58XIavt+6wRIEQbmhTuSaQrrwcIVR UKg+q+BDjjkOt7S416w1OZNQETnF580YLcu0cKeImL/8CMvzAqoyPCLkXzZmiHS+v/Gdi0FwcS +v9aYLQEj2YjHenetew3k+FSsfCo2VdYomHt/DLQ117MHX45c2vNpLROuQ/jcOplG4AmTYn8rC eEvCx2ZdvCAQQvhUVN5FNqGU Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:03 -0700 IronPort-SDR: WOGLg85n/3aCxTP3s0vJzMX5Bk4MPX/UZTAeWOX2ggV/xK9AOq/X1xF6IWM+89BhlOwZPFYus0 iyvpjw+4V3uf9tdAxqlT+MIB7hx87cMwf/x6QgSzmLJmMa6th45L0NjapKGv1Ww0ruc8XPcopK nkHK6JrdKAS0i9zNAkyij5oE4/tMtJjXhq6V24xFHqPrkixr+o7Nyu8iSBqsl+gfLO5smVzO73 n7ZEvjmn4fEBzcvHyFWtc9S8Jl2q6rMNbz/OGHDAP2XttAZEmA+mE1UQsdyAcxAO7T6E140iYm hdU= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:28 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 05/11] hw/block/nvme: Support Zoned Namespace Command Set Date: Wed, 14 Oct 2020 06:42:06 +0900 Message-Id: <20201013214212.2152-6-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" The emulation code has been changed to advertise NVM Command Set when "zoned" device property is not set (default) and Zoned Namespace Command Set otherwise. Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator. Define trace events where needed in newly introduced code. In order to improve scalability, all open, closed and full zones are organized in separate linked lists. Consequently, almost all zone operations don't require scanning of the entire zone array (which potentially can be quite large) - it is only necessary to enumerate one or more zone lists. Handlers for three new NVMe commands introduced in Zoned Namespace Command Set specification are added, namely for Zone Management Receive, Zone Management Send and Zone Append. Device initialization code has been extended to create a proper configuration for zoned operation using device properties. Read/Write command handler is modified to only allow writes at the write pointer if the namespace is zoned. For Zone Append command, writes implicitly happen at the write pointer and the starting write pointer value is returned as the result of the command. Write Zeroes handler is modified to add zoned checks that are identical to those done as a part of Write flow. Subsequent commits in this series add ZDE support and checks for active and open zone limits. Signed-off-by: Niklas Cassel Signed-off-by: Hans Holmberg Signed-off-by: Ajay Joshi Signed-off-by: Chaitanya Kulkarni Signed-off-by: Matias Bjorling Signed-off-by: Aravind Ramesh Signed-off-by: Shin'ichiro Kawasaki Signed-off-by: Adam Manzanares Signed-off-by: Dmitry Fomichev --- block/nvme.c | 2 +- hw/block/nvme-ns.c | 193 +++++++++ hw/block/nvme-ns.h | 54 +++ hw/block/nvme.c | 987 ++++++++++++++++++++++++++++++++++++++++-- hw/block/nvme.h | 9 + hw/block/trace-events | 21 + include/block/nvme.h | 113 ++++- 7 files changed, 1349 insertions(+), 30 deletions(-) diff --git a/block/nvme.c b/block/nvme.c index 05485fdd11..7a513c9a17 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c) { uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF; if (status) { - trace_nvme_error(le32_to_cpu(c->result), + trace_nvme_error(le32_to_cpu(c->result32), le16_to_cpu(c->sq_head), le16_to_cpu(c->sq_id), le16_to_cpu(c->cid), diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 974aea33f7..fedfad595c 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -25,6 +25,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-core.h" +#include "trace.h" #include "nvme.h" #include "nvme-ns.h" @@ -76,6 +77,171 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) return 0; } +static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) +{ + uint64_t zone_size, zone_cap; + uint32_t nz, lbasz = ns->blkconf.logical_block_size; + + if (ns->params.zone_size_bs) { + zone_size = ns->params.zone_size_bs; + } else { + zone_size = NVME_DEFAULT_ZONE_SIZE; + } + if (ns->params.zone_cap_bs) { + zone_cap = ns->params.zone_cap_bs; + } else { + zone_cap = zone_size; + } + if (zone_cap > zone_size) { + error_setg(errp, "zone capacity %luB exceeds zone size %luB", + zone_cap, zone_size); + return -1; + } + if (zone_size < lbasz) { + error_setg(errp, "zone size %luB too small, must be at least %uB", + zone_size, lbasz); + return -1; + } + if (zone_cap < lbasz) { + error_setg(errp, "zone capacity %luB too small, must be at least %uB", + zone_cap, lbasz); + return -1; + } + ns->zone_size = zone_size / lbasz; + ns->zone_capacity = zone_cap / lbasz; + + nz = DIV_ROUND_UP(ns->size / lbasz, ns->zone_size); + ns->num_zones = nz; + ns->zone_array_size = sizeof(NvmeZone) * nz; + ns->zone_size_log2 = 0; + if (is_power_of_2(ns->zone_size)) { + ns->zone_size_log2 = 63 - clz64(ns->zone_size); + } + + return 0; +} + +static void nvme_init_zone_state(NvmeNamespace *ns) +{ + uint64_t start = 0, zone_size = ns->zone_size; + uint64_t capacity = ns->num_zones * zone_size; + NvmeZone *zone; + int i; + + ns->zone_array = g_malloc0(ns->zone_array_size); + + QTAILQ_INIT(&ns->exp_open_zones); + QTAILQ_INIT(&ns->imp_open_zones); + QTAILQ_INIT(&ns->closed_zones); + QTAILQ_INIT(&ns->full_zones); + + zone = ns->zone_array; + for (i = 0; i < ns->num_zones; i++, zone++) { + if (start + zone_size > capacity) { + zone_size = capacity - start; + } + zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE; + nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); + zone->d.za = 0; + zone->d.zcap = ns->zone_capacity; + zone->d.zslba = start; + zone->d.wp = start; + zone->w_ptr = start; + start += zone_size; + } +} + +static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, + Error **errp) +{ + NvmeIdNsZoned *id_ns_z; + + if (n->params.fill_pattern == 0xff) { + ns->id_ns.dlfeat |= 0x02; + } + if (n->params.fill_pattern != 0x00) { + ns->id_ns.dlfeat &= ~0x01; + } + + if (nvme_calc_zone_geometry(ns, errp) != 0) { + return -1; + } + + nvme_init_zone_state(ns); + + id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned)); + + /* MAR/MOR are zeroes-based, 0xffffffff means no limit */ + id_ns_z->mar = 0xffffffff; + id_ns_z->mor = 0xffffffff; + id_ns_z->zoc = 0; + id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; + + id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); + id_ns_z->lbafe[lba_index].zdes = 0; + + ns->csi = NVME_CSI_ZONED; + ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); + ns->id_ns.ncap = cpu_to_le64(ns->zone_capacity * ns->num_zones); + ns->id_ns.nuse = ns->id_ns.ncap; + + ns->id_ns_zoned = id_ns_z; + + return 0; +} + +/* + * Close or finish all the zones that are currently open. + */ +static void nvme_zoned_clear_ns(NvmeNamespace *ns) +{ + NvmeZone *zone; + uint32_t set_state; + int i; + + zone = ns->zone_array; + for (i = 0; i < ns->num_zones; i++, zone++) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + /* fall through */ + default: + continue; + } + + if (zone->d.wp == zone->d.zslba) { + set_state = NVME_ZONE_STATE_EMPTY; + } else { + set_state = NVME_ZONE_STATE_CLOSED; + } + + switch (set_state) { + case NVME_ZONE_STATE_CLOSED: + trace_pci_nvme_clear_ns_close(nvme_get_zone_state(zone), + zone->d.zslba); + QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_EMPTY: + trace_pci_nvme_clear_ns_reset(nvme_get_zone_state(zone), + zone->d.zslba); + break; + case NVME_ZONE_STATE_FULL: + trace_pci_nvme_clear_ns_full(nvme_get_zone_state(zone), + zone->d.zslba); + zone->d.wp = nvme_zone_wr_boundary(zone); + QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); + } + + zone->w_ptr = zone->d.wp; + nvme_set_zone_state(zone, set_state); + } +} + static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { if (!ns->blkconf.blk) { @@ -97,6 +263,12 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) } nvme_ns_init(ns); + if (ns->params.zoned) { + if (nvme_zoned_init_ns(n, ns, 0, errp) != 0) { + return -1; + } + } + if (nvme_register_namespace(n, ns, errp)) { return -1; } @@ -114,6 +286,21 @@ void nvme_ns_flush(NvmeNamespace *ns) blk_flush(ns->blkconf.blk); } +void nvme_ns_clear(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + nvme_zoned_clear_ns(ns); + } +} + +void nvme_ns_cleanup(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + g_free(ns->id_ns_zoned); + g_free(ns->zone_array); + } +} + static void nvme_ns_realize(DeviceState *dev, Error **errp) { NvmeNamespace *ns = NVME_NS(dev); @@ -133,6 +320,12 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true), + DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), + DEFINE_PROP_SIZE("zone_size", NvmeNamespace, params.zone_size_bs, + NVME_DEFAULT_ZONE_SIZE), + DEFINE_PROP_SIZE("zone_capacity", NvmeNamespace, params.zone_cap_bs, 0), + DEFINE_PROP_BOOL("cross_zone_read", NvmeNamespace, + params.cross_zone_read, false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d6b2808b97..170cbb8cdc 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -19,10 +19,21 @@ #define NVME_NS(obj) \ OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS) +typedef struct NvmeZone { + NvmeZoneDescr d; + uint64_t w_ptr; + QTAILQ_ENTRY(NvmeZone) entry; +} NvmeZone; + typedef struct NvmeNamespaceParams { uint32_t nsid; bool attached; QemuUUID uuid; + + bool zoned; + bool cross_zone_read; + uint64_t zone_size_bs; + uint64_t zone_cap_bs; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -34,6 +45,18 @@ typedef struct NvmeNamespace { const uint32_t *iocs; uint8_t csi; + NvmeIdNsZoned *id_ns_zoned; + NvmeZone *zone_array; + QTAILQ_HEAD(, NvmeZone) exp_open_zones; + QTAILQ_HEAD(, NvmeZone) imp_open_zones; + QTAILQ_HEAD(, NvmeZone) closed_zones; + QTAILQ_HEAD(, NvmeZone) full_zones; + uint32_t num_zones; + uint64_t zone_size; + uint64_t zone_capacity; + uint64_t zone_array_size; + uint32_t zone_size_log2; + NvmeNamespaceParams params; } NvmeNamespace; @@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba) typedef struct NvmeCtrl NvmeCtrl; +static inline uint8_t nvme_get_zone_state(NvmeZone *zone) +{ + return zone->d.zs >> 4; +} + +static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState state) +{ + zone->d.zs = state << 4; +} + +static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone) +{ + return zone->d.zslba + ns->zone_size; +} + +static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone) +{ + return zone->d.zslba + zone->d.zcap; +} + +static inline bool nvme_wp_is_valid(NvmeZone *zone) +{ + uint8_t st = nvme_get_zone_state(zone); + + return st != NVME_ZONE_STATE_FULL && + st != NVME_ZONE_STATE_READ_ONLY && + st != NVME_ZONE_STATE_OFFLINE; +} + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_flush(NvmeNamespace *ns); +void nvme_ns_clear(NvmeNamespace *ns); +void nvme_ns_cleanup(NvmeNamespace *ns); #endif /* NVME_NS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 2b4488575b..2e663713c7 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -133,6 +133,16 @@ static const uint32_t nvme_cse_iocs_nvm[256] = { [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, }; +static const uint32_t nvme_cse_iocs_zoned[256] = { + [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_APPEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_ZONE_MGMT_SEND] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_MGMT_RECV] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -149,6 +159,46 @@ static uint16_t nvme_sqid(NvmeRequest *req) return le16_to_cpu(req->sq->sqid); } +static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + if (QTAILQ_IN_USE(zone, entry)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_REMOVE(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_REMOVE(&ns->full_zones, zone, entry); + } + } + + nvme_set_zone_state(zone, state); + + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); + case NVME_ZONE_STATE_READ_ONLY: + break; + default: + zone->d.za = 0; + } +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr low = n->ctrl_mem.addr; @@ -841,7 +891,7 @@ static void nvme_process_aers(void *opaque) req = n->aer_reqs[n->outstanding_aers]; - result = (NvmeAerResult *) &req->cqe.result; + result = (NvmeAerResult *) &req->cqe.result32; result->event_type = event->result.event_type; result->event_info = event->result.event_info; result->log_page = event->result.log_page; @@ -910,6 +960,326 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns, return NVME_SUCCESS; } +static void nvme_fill_read_data(NvmeRequest *req, uint64_t offset, + uint32_t max_len, uint8_t pattern) +{ + QEMUSGList *qsg = &req->qsg; + QEMUIOVector *iov = &req->iov; + ScatterGatherEntry *entry; + uint32_t len, ent_len; + + if (qsg->nsg > 0) { + entry = qsg->sg; + len = qsg->size; + if (max_len) { + len = MIN(len, max_len); + } + for (; len > 0; len -= ent_len) { + ent_len = MIN(len, entry->len); + if (offset > ent_len) { + offset -= ent_len; + } else if (offset != 0) { + dma_memory_set(qsg->as, entry->base + offset, + pattern, ent_len - offset); + offset = 0; + } else { + dma_memory_set(qsg->as, entry->base, pattern, ent_len); + } + entry++; + } + } else if (iov->iov) { + len = iov_size(iov->iov, iov->niov); + if (max_len) { + len = MIN(len, max_len); + } + qemu_iovec_memset(iov, offset, pattern, len - offset); + } +} + +static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba) +{ + return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 : + slba / ns->zone_size; +} + +static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba) +{ + uint32_t zone_idx = nvme_zone_idx(ns, slba); + + assert(zone_idx < ns->num_zones); + return &ns->zone_array[zone_idx]; +} + +static uint16_t nvme_zone_state_ok_to_write(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_FULL: + status = NVME_ZONE_FULL; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE; + break; + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_ZONE_READ_ONLY; + break; + default: + assert(false); + } + + return status; +} + +static uint16_t nvme_check_zone_write(NvmeCtrl *n, NvmeNamespace *ns, + NvmeZone *zone, uint64_t slba, + uint32_t nlb, bool append) +{ + uint16_t status; + + if (unlikely((slba + nlb) > nvme_zone_wr_boundary(zone))) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + status = nvme_zone_state_ok_to_write(zone); + } + + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_write_not_ok(slba, nlb, status); + } else { + assert(nvme_wp_is_valid(zone)); + if (append) { + if (unlikely(slba != zone->d.zslba)) { + trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba); + status = NVME_ZONE_INVALID_WRITE; + } + if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) { + trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl); + status = NVME_INVALID_FIELD; + } + } else if (unlikely(slba != zone->w_ptr)) { + trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba, + zone->w_ptr); + status = NVME_ZONE_INVALID_WRITE; + } + } + + return status; +} + +static uint16_t nvme_zone_state_ok_to_read(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_FULL: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE | NVME_DNR; + break; + default: + assert(false); + } + + return status; +} + +typedef struct NvmeReadFillCtx { + uint64_t pre_rd_fill_slba; + uint64_t read_slba; + uint64_t post_rd_fill_slba; + + uint32_t pre_rd_fill_nlb; + uint32_t read_nlb; + uint32_t post_rd_fill_nlb; +} NvmeReadFillCtx; + +static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, + uint64_t slba, uint32_t nlb, + NvmeReadFillCtx *rfc) +{ + NvmeZone *next_zone; + uint64_t bndry = nvme_zone_rd_boundary(ns, zone); + uint64_t end = slba + nlb, wp1, wp2; + uint16_t status; + + rfc->pre_rd_fill_slba = ~0ULL; + rfc->pre_rd_fill_nlb = 0; + rfc->read_slba = slba; + rfc->read_nlb = nlb; + rfc->post_rd_fill_slba = ~0ULL; + rfc->post_rd_fill_nlb = 0; + + status = nvme_zone_state_ok_to_read(zone); + if (status != NVME_SUCCESS) { + ; + } else if (likely(end <= bndry)) { + if (end > zone->w_ptr) { + wp1 = zone->w_ptr; + if (slba >= wp1) { + /* No i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } + } else if (!ns->params.cross_zone_read) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + /* + * Read across zone boundary, look at the next zone. + * Earlier bounds checks ensure that the current zone + * is not the last one. + */ + next_zone = zone + 1; + status = nvme_zone_state_ok_to_read(next_zone); + if (status != NVME_SUCCESS) { + ; + } else if (end > nvme_zone_rd_boundary(ns, next_zone)) { + /* + * As zone size is much larger than a typical maximum + * i/o size in real hardware, allow the i/o range + * to span no more than one pair of zones. + */ + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + wp1 = zone->w_ptr; + wp2 = next_zone->w_ptr; + if (wp2 == bndry) { + if (slba >= wp1) { + /* Again, no i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } else if (slba < wp1) { + if (end > wp2) { + if (wp1 == bndry) { + rfc->post_rd_fill_slba = wp2; + rfc->post_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + } else { + rfc->pre_rd_fill_slba = wp2; + rfc->pre_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + if (end > wp2) { + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = end - slba; + rfc->read_slba = bndry; + rfc->read_nlb = wp2 - bndry; + } else { + rfc->read_slba = bndry; + rfc->read_nlb = end - bndry; + rfc->post_rd_fill_slba = slba; + rfc->post_rd_fill_nlb = bndry - slba; + } + } + } + } + + return status; +} + +static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, + bool failed) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeZone *zone; + uint64_t slba, start_wp = req->cqe.result64; + uint32_t nlb; + + if (rw->opcode != NVME_CMD_WRITE && + rw->opcode != NVME_CMD_ZONE_APPEND && + rw->opcode != NVME_CMD_WRITE_ZEROES) { + return false; + } + + slba = le64_to_cpu(rw->slba); + nlb = le16_to_cpu(rw->nlb) + 1; + zone = nvme_get_zone_by_slba(ns, slba); + + if (!failed && zone->w_ptr < start_wp + nlb) { + /* + * A preceding queued write to the zone has failed, + * now this write is not at the WP, fail it too. + */ + failed = true; + } + + if (failed) { + if (zone->w_ptr > start_wp) { + zone->w_ptr = start_wp; + zone->d.wp = start_wp; + } + req->cqe.result64 = 0; + } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + break; + default: + assert(false); + } + zone->d.wp = zone->w_ptr; + } else { + zone->d.wp += nlb; + } + + return failed; +} + +static uint64_t nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, + uint32_t nlb) +{ + uint64_t result = zone->w_ptr; + uint8_t zs; + + zone->w_ptr += nlb; + + if (zone->w_ptr < nvme_zone_wr_boundary(zone)) { + zs = nvme_get_zone_state(zone); + switch (zs) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); + } + } + + return result; +} + static void nvme_rw_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -924,10 +1294,27 @@ static void nvme_rw_cb(void *opaque, int ret) trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); if (!ret) { - block_acct_done(stats, acct); + if (ns->params.zoned) { + if (nvme_finalize_zoned_write(ns, req, false)) { + ret = EIO; + block_acct_failed(stats, acct); + req->status = NVME_ZONE_INVALID_WRITE; + } else if (req->fill_len) { + nvme_fill_read_data(req, req->fill_ofs, req->fill_len, + nvme_ctrl(req)->params.fill_pattern); + req->fill_len = 0; + } + } + if (!ret) { + block_acct_done(stats, acct); + } } else { uint16_t status; + if (ns->params.zoned) { + nvme_finalize_zoned_write(ns, req, true); + } + block_acct_failed(stats, acct); switch (req->cmd.opcode) { @@ -969,8 +1356,10 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) NvmeNamespace *ns = req->ns; uint64_t slba = le64_to_cpu(rw->slba); uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + NvmeZone *zone; uint64_t offset = nvme_l2b(ns, slba); uint32_t count = nvme_l2b(ns, nlb); + BlockBackend *blk = ns->blkconf.blk; uint16_t status; trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); @@ -981,24 +1370,41 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return status; } - block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0, - BLOCK_ACCT_WRITE); - req->aiocb = blk_aio_pwrite_zeroes(req->ns->blkconf.blk, offset, count, + if (ns->params.zoned) { + zone = nvme_get_zone_by_slba(ns, slba); + + status = nvme_check_zone_write(n, ns, zone, slba, nlb, false); + if (status != NVME_SUCCESS) { + goto invalid; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } + + block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(blk, offset, count, BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); + return status | NVME_DNR; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t slba = le64_to_cpu(rw->slba); - uint64_t data_size = nvme_l2b(ns, nlb); - uint64_t data_offset = nvme_l2b(ns, slba); - enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ? - BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; + uint64_t data_offset, fill_ofs; + + NvmeZone *zone; + uint32_t fill_len; + NvmeReadFillCtx rfc; + bool is_write = rw->opcode == NVME_CMD_WRITE || append; + enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -1017,14 +1423,71 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) goto invalid; } + if (ns->params.zoned) { + zone = nvme_get_zone_by_slba(ns, slba); + + if (is_write) { + status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); + if (status != NVME_SUCCESS) { + goto invalid; + } + + if (append) { + slba = zone->w_ptr; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } else { + status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); + goto invalid; + } + } + } else if (append) { + trace_pci_nvme_err_invalid_opc(rw->opcode); + status = NVME_INVALID_OPCODE; + goto invalid; + } + + data_offset = nvme_l2b(ns, slba); + status = nvme_map_dptr(n, data_size, req); if (status) { goto invalid; } + if (ns->params.zoned) { + if (is_write) { + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); + } else { + if (rfc.pre_rd_fill_nlb) { + fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); + fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); + nvme_fill_read_data(req, fill_ofs, fill_len, + n->params.fill_pattern); + } + if (!rfc.read_nlb) { + /* No backend I/O necessary, only needed to fill the buffer */ + req->status = NVME_SUCCESS; + return NVME_SUCCESS; + } + if (rfc.post_rd_fill_nlb) { + req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); + req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); + } else { + req->fill_len = 0; + } + slba = rfc.read_slba; + data_size = nvme_l2b(ns, rfc.read_nlb); + } + } + + data_offset = nvme_l2b(ns, slba); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct); if (req->qsg.sg) { - if (acct == BLOCK_ACCT_WRITE) { + if (is_write) { req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE, nvme_rw_cb, req); } else { @@ -1032,7 +1495,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) BDRV_SECTOR_SIZE, nvme_rw_cb, req); } } else { - if (acct == BLOCK_ACCT_WRITE) { + if (is_write) { req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, nvme_rw_cb, req); } else { @@ -1043,10 +1506,383 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; invalid: - block_acct_invalid(blk_get_stats(ns->blkconf.blk), acct); + block_acct_invalid(blk_get_stats(blk), acct); + return status | NVME_DNR; +} + +static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, + uint64_t *slba, uint32_t *zone_idx) +{ + uint32_t dw10 = le32_to_cpu(c->cdw10); + uint32_t dw11 = le32_to_cpu(c->cdw11); + + if (!ns->params.zoned) { + trace_pci_nvme_err_invalid_opc(c->opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + + *slba = ((uint64_t)dw11) << 32 | dw10; + if (unlikely(*slba >= ns->id_ns.nsze)) { + trace_pci_nvme_err_invalid_lba_range(*slba, 0, ns->id_ns.nsze); + *slba = 0; + return NVME_LBA_RANGE | NVME_DNR; + } + + *zone_idx = nvme_zone_idx(ns, *slba); + assert(*zone_idx < ns->num_zones); + + return NVME_SUCCESS; +} + +static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); + /* fall through */ + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_open_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_CLOSED; +} + +static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + /* fall through */ + case NVME_ZONE_STATE_CLOSED: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_close_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN; +} + +static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + zone->w_ptr = nvme_zone_wr_boundary(zone); + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_finish_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN || + state == NVME_ZONE_STATE_CLOSED; +} + +static uint16_t nvme_reset_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_FULL: + zone->w_ptr = zone->d.zslba; + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY); + /* fall through */ + case NVME_ZONE_STATE_EMPTY: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_reset_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_IMPLICITLY_OPEN || + state == NVME_ZONE_STATE_EXPLICITLY_OPEN || + state == NVME_ZONE_STATE_CLOSED || + state == NVME_ZONE_STATE_FULL; +} + +static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_READ_ONLY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE); + /* fall through */ + case NVME_ZONE_STATE_OFFLINE: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static bool nvme_cond_offline_all(uint8_t state) +{ + return state == NVME_ZONE_STATE_READ_ONLY; +} + +typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, + uint8_t); +typedef bool (*need_to_proc_zone_t)(uint8_t); + +static uint16_t name_do_zone_op(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state, bool all, + op_handler_t op_hndlr, + need_to_proc_zone_t proc_zone) +{ + int i; + uint16_t status = 0; + + if (!all) { + status = op_hndlr(ns, zone, state); + } else { + for (i = 0; i < ns->num_zones; i++, zone++) { + state = nvme_get_zone_state(zone); + if (proc_zone(state)) { + status = op_hndlr(ns, zone, state); + if (status != NVME_SUCCESS) { + break; + } + } + } + } + return status; } +static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint64_t slba = 0; + uint32_t zone_idx = 0; + uint16_t status; + uint8_t action, state; + bool all; + NvmeZone *zone; + + action = dw13 & 0xff; + all = dw13 & 0x100; + + req->status = NVME_SUCCESS; + + if (!all) { + status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (status) { + return status; + } + } + + zone = &ns->zone_array[zone_idx]; + if (slba != zone->d.zslba) { + trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + state = nvme_get_zone_state(zone); + + switch (action) { + + case NVME_ZONE_ACTION_OPEN: + trace_pci_nvme_open_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_open_zone, nvme_cond_open_all); + break; + + case NVME_ZONE_ACTION_CLOSE: + trace_pci_nvme_close_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_close_zone, nvme_cond_close_all); + break; + + case NVME_ZONE_ACTION_FINISH: + trace_pci_nvme_finish_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_finish_zone, nvme_cond_finish_all); + break; + + case NVME_ZONE_ACTION_RESET: + trace_pci_nvme_reset_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_reset_zone, nvme_cond_reset_all); + break; + + case NVME_ZONE_ACTION_OFFLINE: + trace_pci_nvme_offline_zone(slba, zone_idx, all); + status = name_do_zone_op(ns, zone, state, all, + nvme_offline_zone, nvme_cond_offline_all); + break; + + case NVME_ZONE_ACTION_SET_ZD_EXT: + trace_pci_nvme_set_descriptor_extension(slba, zone_idx); + return NVME_INVALID_FIELD | NVME_DNR; + break; + + default: + trace_pci_nvme_err_invalid_mgmt_action(action); + status = NVME_INVALID_FIELD; + } + + if (status == NVME_ZONE_INVAL_TRANSITION) { + trace_pci_nvme_err_invalid_zone_state_transition(state, action, slba, + zone->d.za); + } + if (status) { + status |= NVME_DNR; + } + + return status; +} + +static bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl) +{ + int zs = nvme_get_zone_state(zl); + + switch (zafs) { + case NVME_ZONE_REPORT_ALL: + return true; + case NVME_ZONE_REPORT_EMPTY: + return zs == NVME_ZONE_STATE_EMPTY; + case NVME_ZONE_REPORT_IMPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN; + case NVME_ZONE_REPORT_EXPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN; + case NVME_ZONE_REPORT_CLOSED: + return zs == NVME_ZONE_STATE_CLOSED; + case NVME_ZONE_REPORT_FULL: + return zs == NVME_ZONE_STATE_FULL; + case NVME_ZONE_REPORT_READ_ONLY: + return zs == NVME_ZONE_STATE_READ_ONLY; + case NVME_ZONE_REPORT_OFFLINE: + return zs == NVME_ZONE_STATE_OFFLINE; + default: + return false; + } +} + +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + /* cdw12 is zero-based number of dwords to return. Convert to bytes */ + uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint32_t zone_idx, zra, zrasf, partial; + uint64_t max_zones, nr_zones = 0; + uint16_t ret; + uint64_t slba; + NvmeZoneDescr *z; + NvmeZone *zs; + NvmeZoneReportHeader *header; + void *buf, *buf_p; + size_t zone_entry_sz; + + req->status = NVME_SUCCESS; + + ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (ret) { + return ret; + } + + if (len < sizeof(NvmeZoneReportHeader)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + zra = dw13 & 0xff; + if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (zra == NVME_ZONE_REPORT_EXTENDED) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + zrasf = (dw13 >> 8) & 0xff; + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + partial = (dw13 >> 16) & 0x01; + + zone_entry_sz = sizeof(NvmeZoneDescr); + + max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; + buf = g_malloc0(len); + + header = (NvmeZoneReportHeader *)buf; + buf_p = buf + sizeof(NvmeZoneReportHeader); + + while (zone_idx < ns->num_zones && nr_zones < max_zones) { + zs = &ns->zone_array[zone_idx]; + + if (!nvme_zone_matches_filter(zrasf, zs)) { + zone_idx++; + continue; + } + + z = (NvmeZoneDescr *)buf_p; + buf_p += sizeof(NvmeZoneDescr); + nr_zones++; + + z->zt = zs->d.zt; + z->zs = zs->d.zs; + z->zcap = cpu_to_le64(zs->d.zcap); + z->zslba = cpu_to_le64(zs->d.zslba); + z->za = zs->d.za; + + if (nvme_wp_is_valid(zs)) { + z->wp = cpu_to_le64(zs->d.wp); + } else { + z->wp = cpu_to_le64(~0ULL); + } + + zone_idx++; + } + + if (!partial) { + for (; zone_idx < ns->num_zones; zone_idx++) { + zs = &ns->zone_array[zone_idx]; + if (nvme_zone_matches_filter(zrasf, zs)) { + nr_zones++; + } + } + } + header->nr_zones = cpu_to_le64(nr_zones); + + ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req); + + g_free(buf); + + return ret; +} + static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { uint32_t nsid = le32_to_cpu(req->cmd.nsid); @@ -1076,9 +1912,15 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: return nvme_write_zeroes(n, req); + case NVME_CMD_ZONE_APPEND: + return nvme_rw(n, req, true); case NVME_CMD_WRITE: case NVME_CMD_READ: - return nvme_rw(n, req); + return nvme_rw(n, req, false); + case NVME_CMD_ZONE_MGMT_SEND: + return nvme_zone_mgmt_send(n, req); + case NVME_CMD_ZONE_MGMT_RECV: + return nvme_zone_mgmt_recv(n, req); default: assert(false); } @@ -1320,10 +2162,11 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, +static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len, uint64_t off, NvmeRequest *req) { NvmeEffectsLog log = {}; + const uint32_t *src_iocs = NULL; uint32_t *dst_acs = log.acs, *dst_iocs = log.iocs; uint32_t trans_len; int i; @@ -1335,13 +2178,31 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len, return NVME_INVALID_FIELD | NVME_DNR; } + switch (NVME_CC_CSS(n->bar.cc)) { + case NVME_CC_CSS_NVM: + src_iocs = nvme_cse_iocs_nvm; + break; + case NVME_CC_CSS_CSI: + switch (csi) { + case NVME_CSI_NVM: + src_iocs = nvme_cse_iocs_nvm; + break; + case NVME_CSI_ZONED: + src_iocs = nvme_cse_iocs_zoned; + break; + } + /* fall through */ + case NVME_CC_CSS_ADMIN_ONLY: + break; + } + for (i = 0; i < 256; i++) { dst_acs[i] = nvme_cse_acs[i]; } - if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) { + if (src_iocs) { for (i = 0; i < 256; i++) { - dst_iocs[i] = nvme_cse_iocs_nvm[i]; + dst_iocs[i] = src_iocs[i]; } } @@ -1362,6 +2223,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) uint8_t lid = dw10 & 0xff; uint8_t lsp = (dw10 >> 8) & 0xf; uint8_t rae = (dw10 >> 15) & 0x1; + uint8_t csi = le32_to_cpu(cmd->cdw14) >> 24; uint32_t numdl, numdu; uint64_t off, lpol, lpou; size_t len; @@ -1395,7 +2257,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) case NVME_LOG_FW_SLOT_INFO: return nvme_fw_log_info(n, len, off, req); case NVME_LOG_CMD_EFFECTS: - return nvme_cmd_effects(n, len, off, req); + return nvme_cmd_effects(n, csi, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -1515,6 +2377,16 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req); } +static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns) +{ + switch (ns->csi) { + case NVME_CSI_NVM: + case NVME_CSI_ZONED: + return true; + } + return false; +} + static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_identify_ctrl(); @@ -1526,11 +2398,16 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) { NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + NvmeIdCtrlZoned id = {}; trace_pci_nvme_identify_ctrl_csi(c->csi); if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED) { + id.zasl = n->zasl; + return nvme_dma(n, (uint8_t *)&id, sizeof(id), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1558,8 +2435,12 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), - DMA_DIRECTION_FROM_DEVICE, req); + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), + DMA_DIRECTION_FROM_DEVICE, req); + } + + return NVME_INVALID_CMD_SET | NVME_DNR; } static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, @@ -1584,8 +2465,11 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - if (c->csi == NVME_CSI_NVM) { + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) { + return nvme_dma(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1647,7 +2531,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, trace_pci_nvme_identify_nslist_csi(min_nsid, c->csi); - if (c->csi != NVME_CSI_NVM) { + if (c->csi != NVME_CSI_NVM && c->csi != NVME_CSI_ZONED) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1656,7 +2540,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, if (!ns) { continue; } - if (ns->params.nsid < min_nsid) { + if (ns->params.nsid < min_nsid || c->csi != ns->csi) { continue; } if (only_active && !ns->params.attached) { @@ -1726,6 +2610,8 @@ static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_identify_cmd_set(); NVME_SET_CSI(*list, NVME_CSI_NVM); + NVME_SET_CSI(*list, NVME_CSI_ZONED); + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } @@ -1768,7 +2654,7 @@ static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req) { uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff; - req->cqe.result = 1; + req->cqe.result32 = 1; if (nvme_check_sqid(n, sqid)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1953,7 +2839,7 @@ defaults: } out: - req->cqe.result = cpu_to_le32(result); + req->cqe.result32 = cpu_to_le32(result); return NVME_SUCCESS; } @@ -2084,8 +2970,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) ((dw11 >> 16) & 0xFFFF) + 1, n->params.max_ioqpairs, n->params.max_ioqpairs); - req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) | - ((n->params.max_ioqpairs - 1) << 16)); + req->cqe.result32 = cpu_to_le32((n->params.max_ioqpairs - 1) | + ((n->params.max_ioqpairs - 1) << 16)); break; case NVME_ASYNCHRONOUS_EVENT_CONF: n->features.async_config = dw11; @@ -2240,6 +3126,15 @@ static void nvme_clear_ctrl(NvmeCtrl *n) nvme_ns_flush(ns); } + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_clear(ns); + } + n->bar.cc = 0; } @@ -2260,6 +3155,11 @@ static void nvme_select_ns_iocs(NvmeCtrl *n) ns->iocs = nvme_cse_iocs_nvm; } break; + case NVME_CSI_ZONED: + if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) { + ns->iocs = nvme_cse_iocs_zoned; + } + break; } } } @@ -2358,6 +3258,17 @@ static int nvme_start_ctrl(NvmeCtrl *n) nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0, NVME_AQA_ASQS(n->bar.aqa) + 1); + if (!n->params.zasl_bs) { + n->zasl = n->params.mdts; + } else { + if (n->params.zasl_bs < n->page_size) { + trace_pci_nvme_err_startfail_zasl_too_small(n->params.zasl_bs, + n->page_size); + return -1; + } + n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size); + } + nvme_set_timestamp(n, 0ULL); QTAILQ_INIT(&n->aer_queue); @@ -2782,6 +3693,13 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) host_memory_backend_set_mapped(n->pmrdev, true); } + + if (n->params.zasl_bs) { + if (!is_power_of_2(n->params.zasl_bs)) { + error_setg(errp, "zone append size limit has to be a power of 2"); + return; + } + } } static void nvme_init_state(NvmeCtrl *n) @@ -3047,9 +3965,21 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) static void nvme_exit(PCIDevice *pci_dev) { NvmeCtrl *n = NVME(pci_dev); + NvmeNamespace *ns; + int i; nvme_clear_ctrl(n); + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_cleanup(ns); + } g_free(n->namespaces); + g_free(n->cq); g_free(n->sq); g_free(n->aer_reqs); @@ -3077,6 +4007,9 @@ static Property nvme_props[] = { DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64), DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7), DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false), + DEFINE_PROP_UINT8("fill_pattern", NvmeCtrl, params.fill_pattern, 0), + DEFINE_PROP_SIZE32("zone_append_size_limit", NvmeCtrl, params.zasl_bs, + NVME_DEFAULT_MAX_ZA_SIZE), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index e080a2318a..c406cb1c65 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -6,6 +6,9 @@ #define NVME_MAX_NAMESPACES 256 +#define NVME_DEFAULT_ZONE_SIZE (128 * MiB) +#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) + typedef struct NvmeParams { char *serial; uint32_t num_queues; /* deprecated since 5.1 */ @@ -16,6 +19,8 @@ typedef struct NvmeParams { uint32_t aer_max_queued; uint8_t mdts; bool use_intel_id; + uint8_t fill_pattern; + uint32_t zasl_bs; } NvmeParams; typedef struct NvmeAsyncEvent { @@ -28,6 +33,8 @@ typedef struct NvmeRequest { struct NvmeNamespace *ns; BlockAIOCB *aiocb; uint16_t status; + uint64_t fill_ofs; + uint32_t fill_len; NvmeCqe cqe; NvmeCmd cmd; BlockAcctCookie acct; @@ -147,6 +154,8 @@ typedef struct NvmeCtrl { QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue; int aer_queued; + uint8_t zasl; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; diff --git a/hw/block/trace-events b/hw/block/trace-events index 65b964c894..af53e31fcb 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -90,6 +90,15 @@ pci_nvme_mmio_stopped(void) "cleared controller enable bit" pci_nvme_mmio_shutdown_set(void) "shutdown bit set" pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared" pci_nvme_cmd_supp_and_effects_log_read(void) "commands supported and effects log read" +pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" +pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" +pci_nvme_clear_ns_full(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Full state" # nvme traces for error conditions pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu" @@ -109,8 +118,18 @@ pci_nvme_err_invalid_prp(void) "invalid PRP" pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" +pci_nvme_err_unaligned_zone_cmd(uint8_t action, uint64_t slba, uint64_t zslba) "unaligned zone op 0x%"PRIx32", got slba=%"PRIu64", zslba=%"PRIu64"" +pci_nvme_err_invalid_zone_state_transition(uint8_t state, uint8_t action, uint64_t slba, uint8_t attrs) "0x%"PRIx32"->0x%"PRIx32", slba=%"PRIu64", attrs=0x%"PRIx32"" +pci_nvme_err_write_not_at_wp(uint64_t slba, uint64_t zone, uint64_t wp) "writing at slba=%"PRIu64", zone=%"PRIu64", but wp=%"PRIu64"" +pci_nvme_err_append_not_at_start(uint64_t slba, uint64_t zone) "appending at slba=%"PRIu64", but zone=%"PRIu64"" +pci_nvme_err_zone_write_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" +pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded" +pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" +pci_nvme_err_only_zoned_cmd_set_avail(void) "setting 001b CC.CSS, but only ZONED+NVM command set is enabled" pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" @@ -144,7 +163,9 @@ pci_nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_ pci_nvme_err_startfail_css(uint8_t css) "nvme_start_ctrl failed because invalid command set selected:%u" pci_nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero" pci_nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero" +pci_nvme_err_startfail_zasl_too_small(uint32_t zasl, uint32_t pagesz) "nvme_start_ctrl failed because zone append size limit %"PRIu32" is too small, needs to be >= %"PRIu32"" pci_nvme_err_startfail(void) "setting controller enable bit failed" +pci_nvme_err_invalid_mgmt_action(int action) "action=0x%"PRIx8"" # Traces for undefined behavior pci_nvme_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 27125c9d28..54bc93b6ab 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -489,6 +489,9 @@ enum NvmeIoCommands { NVME_CMD_COMPARE = 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM = 0x09, + NVME_CMD_ZONE_MGMT_SEND = 0x79, + NVME_CMD_ZONE_MGMT_RECV = 0x7a, + NVME_CMD_ZONE_APPEND = 0x7d, }; typedef struct QEMU_PACKED NvmeDeleteQ { @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult { } NvmeAerResult; typedef struct QEMU_PACKED NvmeCqe { - uint32_t result; - uint32_t rsvd; + union { + uint64_t result64; + uint32_t result32; + }; uint16_t sq_head; uint16_t sq_id; uint16_t cid; @@ -678,6 +683,7 @@ enum NvmeStatusCodes { NVME_SGL_DESCR_TYPE_INVALID = 0x0011, NVME_INVALID_USE_OF_CMB = 0x0012, NVME_CMD_SET_CMB_REJECTED = 0x002b, + NVME_INVALID_CMD_SET = 0x002c, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -702,6 +708,14 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO = 0x0182, + NVME_ZONE_BOUNDARY_ERROR = 0x01b8, + NVME_ZONE_FULL = 0x01b9, + NVME_ZONE_READ_ONLY = 0x01ba, + NVME_ZONE_OFFLINE = 0x01bb, + NVME_ZONE_INVALID_WRITE = 0x01bc, + NVME_ZONE_TOO_MANY_ACTIVE = 0x01bd, + NVME_ZONE_TOO_MANY_OPEN = 0x01be, + NVME_ZONE_INVAL_TRANSITION = 0x01bf, NVME_WRITE_FAULT = 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR = 0x0282, @@ -886,6 +900,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl { uint8_t vs[1024]; } NvmeIdCtrl; +typedef struct NvmeIdCtrlZoned { + uint8_t zasl; + uint8_t rsvd1[4095]; +} NvmeIdCtrlZoned; + enum NvmeIdCtrlOacs { NVME_OACS_SECURITY = 1 << 0, NVME_OACS_FORMAT = 1 << 1, @@ -1011,6 +1030,12 @@ typedef struct QEMU_PACKED NvmeLBAF { uint8_t rp; } NvmeLBAF; +typedef struct QEMU_PACKED NvmeLBAFE { + uint64_t zsze; + uint8_t zdes; + uint8_t rsvd9[7]; +} NvmeLBAFE; + #define NVME_NSID_BROADCAST 0xffffffff typedef struct QEMU_PACKED NvmeIdNs { @@ -1065,10 +1090,24 @@ enum NvmeNsIdentifierType { enum NvmeCsi { NVME_CSI_NVM = 0x00, + NVME_CSI_ZONED = 0x02, }; #define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) +typedef struct QEMU_PACKED NvmeIdNsZoned { + uint16_t zoc; + uint16_t ozcs; + uint32_t mar; + uint32_t mor; + uint32_t rrl; + uint32_t frl; + uint8_t rsvd20[2796]; + NvmeLBAFE lbafe[16]; + uint8_t rsvd3072[768]; + uint8_t vs[256]; +} NvmeIdNsZoned; + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat) ((dlfeat) & 0x08) @@ -1100,6 +1139,71 @@ enum NvmeIdNsDps { DPS_FIRST_EIGHT = 8, }; +enum NvmeZoneAttr { + NVME_ZA_FINISHED_BY_CTLR = 1 << 0, + NVME_ZA_FINISH_RECOMMENDED = 1 << 1, + NVME_ZA_RESET_RECOMMENDED = 1 << 2, + NVME_ZA_ZD_EXT_VALID = 1 << 7, +}; + +typedef struct QEMU_PACKED NvmeZoneReportHeader { + uint64_t nr_zones; + uint8_t rsvd[56]; +} NvmeZoneReportHeader; + +enum NvmeZoneReceiveAction { + NVME_ZONE_REPORT = 0, + NVME_ZONE_REPORT_EXTENDED = 1, +}; + +enum NvmeZoneReportType { + NVME_ZONE_REPORT_ALL = 0, + NVME_ZONE_REPORT_EMPTY = 1, + NVME_ZONE_REPORT_IMPLICITLY_OPEN = 2, + NVME_ZONE_REPORT_EXPLICITLY_OPEN = 3, + NVME_ZONE_REPORT_CLOSED = 4, + NVME_ZONE_REPORT_FULL = 5, + NVME_ZONE_REPORT_READ_ONLY = 6, + NVME_ZONE_REPORT_OFFLINE = 7, +}; + +enum NvmeZoneType { + NVME_ZONE_TYPE_RESERVED = 0x00, + NVME_ZONE_TYPE_SEQ_WRITE = 0x02, +}; + +enum NvmeZoneSendAction { + NVME_ZONE_ACTION_RSD = 0x00, + NVME_ZONE_ACTION_CLOSE = 0x01, + NVME_ZONE_ACTION_FINISH = 0x02, + NVME_ZONE_ACTION_OPEN = 0x03, + NVME_ZONE_ACTION_RESET = 0x04, + NVME_ZONE_ACTION_OFFLINE = 0x05, + NVME_ZONE_ACTION_SET_ZD_EXT = 0x10, +}; + +typedef struct QEMU_PACKED NvmeZoneDescr { + uint8_t zt; + uint8_t zs; + uint8_t za; + uint8_t rsvd3[5]; + uint64_t zcap; + uint64_t zslba; + uint64_t wp; + uint8_t rsvd32[32]; +} NvmeZoneDescr; + +enum NvmeZoneState { + NVME_ZONE_STATE_RESERVED = 0x00, + NVME_ZONE_STATE_EMPTY = 0x01, + NVME_ZONE_STATE_IMPLICITLY_OPEN = 0x02, + NVME_ZONE_STATE_EXPLICITLY_OPEN = 0x03, + NVME_ZONE_STATE_CLOSED = 0x04, + NVME_ZONE_STATE_READ_ONLY = 0x0D, + NVME_ZONE_STATE_FULL = 0x0E, + NVME_ZONE_STATE_OFFLINE = 0x0F, +}; + static inline void _nvme_check_size(void) { QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096); @@ -1119,9 +1223,14 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescr) != 64); } #endif From patchwork Tue Oct 13 21:42:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836123 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3664561C for ; Tue, 13 Oct 2020 21:47:05 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DF73420848 for ; Tue, 13 Oct 2020 21:47:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="n1rs4Oi5" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF73420848 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:36070 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS8N-00038N-7F for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:47:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47154) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS44-0005iK-V7; Tue, 13 Oct 2020 17:42:36 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45635) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS41-0001mO-Eu; Tue, 13 Oct 2020 17:42:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625353; x=1634161353; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=S/2ZgrBTIYiF5BBDkyjZH5Hc7re+QfT2pFAKdyxs1m0=; b=n1rs4Oi5Ilfub0Gtfl4aAWUzy7S1HUUNE/BZ1GDUrPAWSX9176W4y9sq bKLoNLylPBKJr+Pbs1X1Dz99SLaLTxwh8bmYzbamo9tbnNUgHcrPgsE7d MsFDs1unDQaPK4EAd3pGrn1h+66tzdm8oQhtCKrkSATTO+yFl70M70q4E IddRld9ugbXbt7WbK5FBtLk2xNDdjTVtefrRopKTC10LZHM5B87E4hoqy vylRxNFkRJLvCLzvW6AkqnnoEDXD/g4FbPSj38XzQEvGINkKO/+N1IigU hn0rvEgYZuJM3ipVzha/DWFySCDZGq3JJp/yw+w3JL7FqrMUNqcyhV+mP Q==; IronPort-SDR: 0y9ynmE9E0jyYvrrrDHpfoC8PIXKFqJ+TcVJ2STo7uXIZ894+OJERmJLip2NhzFdkNAFNGZTXH zaIX80u8pnk1mT2BwLyQNDXBeA9pyaZTpw1zYwNLv6OeHs+YA+p+Mn1s04AzRUHC50ln1zFcfL hxNzPKp70PzT0eJtpHOI3hJzC+dlPNMxk4bP/Akw8aZXSlwldHY7IN2aXyVy6kmi+vXZyQvXlw +HbQmfD+0LofAFsj7uRAmBq2i1Q/BaODRDZiGfBmEFoNuGfsMaO7hfoMgaVjVPHsr9OmIu2rFa 9Ro= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185932" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:32 +0800 IronPort-SDR: W9qZawXKRL11RlKLzm5l/AVlLJgstGv6wctcv39EK8kRhO+w/E6SAGBcewm/uwTWrLjJREDE41 lUHAtQOjeJtd3/Kp6bxFruiVrToDGj3Mj+a7XqzzlGc7KQHbznmjWHSF89/tbG1d+31CDFgt4u vcEET5YDlxh+VjoJj1OU25ZOVh7i1dQDzeBKCzW0uRtgiq11F+vC+cY67rdLoO4MomcCYAtLsi sqG+udJUn3b7ICoPfJTD4v1FdteG3uJ31pH3Hf9mdMNKf+JtPpAIQJt/pderEoqac6qq6Qgs8Z BJY5yQBA0e3p0EimNzz3mjEd Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:06 -0700 IronPort-SDR: vk5Bp1k2zML1yGaYZxzHmqHR4jyBR9cn5yfPY8TU4bJFn+EHlt1G7JdyBv+mEU01b4HmVdijlr 1CxToGawrWk0PefiRfrFr097ooSrlri7m0RVrGB43vqH6Ooz1jGeVRqPUSZyiEQOVEHrDo2cvX FnC1hmBqicLttgK4RGgHoCu5awLRgW1lkBCpyCNhuejYAy6MjJBW755KRxYQ09QzWEuQWczJ08 gLSNkV/4cfrpXyIJGiS79FH2uxRxUetSOfWk/JO6RiA197xXo759AebEL88bk1aN1qZTxe3JvV zmw= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:30 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 06/11] hw/block/nvme: Introduce max active and open zone limits Date: Wed, 14 Oct 2020 06:42:07 +0900 Message-Id: <20201013214212.2152-7-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Add two module properties, "max_active" and "max_open" to control the maximum number of zones that can be active or open. Once these variables are set to non-default values, these limits are checked during I/O and Too Many Active or Too Many Open command status is returned if they are exceeded. Signed-off-by: Hans Holmberg Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 28 ++++++++++++- hw/block/nvme-ns.h | 41 +++++++++++++++++++ hw/block/nvme.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 166 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index fedfad595c..8d9e11eef2 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -118,6 +118,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) ns->zone_size_log2 = 63 - clz64(ns->zone_size); } + /* Make sure that the values of all ZNS properties are sane */ + if (ns->params.max_open_zones > nz) { + error_setg(errp, + "max_open_zones value %u exceeds the number of zones %u", + ns->params.max_open_zones, nz); + return -1; + } + if (ns->params.max_active_zones > nz) { + error_setg(errp, + "max_active_zones value %u exceeds the number of zones %u", + ns->params.max_active_zones, nz); + return -1; + } + return 0; } @@ -172,8 +186,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned)); /* MAR/MOR are zeroes-based, 0xffffffff means no limit */ - id_ns_z->mar = 0xffffffff; - id_ns_z->mor = 0xffffffff; + id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1); + id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1); id_ns_z->zoc = 0; id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; @@ -199,6 +213,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) uint32_t set_state; int i; + ns->nr_active_zones = 0; + ns->nr_open_zones = 0; + zone = ns->zone_array; for (i = 0; i < ns->num_zones; i++, zone++) { switch (nvme_get_zone_state(zone)) { @@ -209,6 +226,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); break; case NVME_ZONE_STATE_CLOSED: + nvme_aor_inc_active(ns); /* fall through */ default: continue; @@ -216,6 +234,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) if (zone->d.wp == zone->d.zslba) { set_state = NVME_ZONE_STATE_EMPTY; + } else if (ns->params.max_active_zones == 0 || + ns->nr_active_zones < ns->params.max_active_zones) { + set_state = NVME_ZONE_STATE_CLOSED; } else { set_state = NVME_ZONE_STATE_CLOSED; } @@ -224,6 +245,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) case NVME_ZONE_STATE_CLOSED: trace_pci_nvme_clear_ns_close(nvme_get_zone_state(zone), zone->d.zslba); + nvme_aor_inc_active(ns); QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); break; case NVME_ZONE_STATE_EMPTY: @@ -326,6 +348,8 @@ static Property nvme_ns_props[] = { DEFINE_PROP_SIZE("zone_capacity", NvmeNamespace, params.zone_cap_bs, 0), DEFINE_PROP_BOOL("cross_zone_read", NvmeNamespace, params.cross_zone_read, false), + DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 0), + DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 170cbb8cdc..b0633d0def 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -34,6 +34,8 @@ typedef struct NvmeNamespaceParams { bool cross_zone_read; uint64_t zone_size_bs; uint64_t zone_cap_bs; + uint32_t max_active_zones; + uint32_t max_open_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -56,6 +58,8 @@ typedef struct NvmeNamespace { uint64_t zone_capacity; uint64_t zone_array_size; uint32_t zone_size_log2; + int32_t nr_open_zones; + int32_t nr_active_zones; NvmeNamespaceParams params; } NvmeNamespace; @@ -123,6 +127,43 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } +static inline void nvme_aor_inc_open(NvmeNamespace *ns) +{ + assert(ns->nr_open_zones >= 0); + if (ns->params.max_open_zones) { + ns->nr_open_zones++; + assert(ns->nr_open_zones <= ns->params.max_open_zones); + } +} + +static inline void nvme_aor_dec_open(NvmeNamespace *ns) +{ + if (ns->params.max_open_zones) { + assert(ns->nr_open_zones > 0); + ns->nr_open_zones--; + } + assert(ns->nr_open_zones >= 0); +} + +static inline void nvme_aor_inc_active(NvmeNamespace *ns) +{ + assert(ns->nr_active_zones >= 0); + if (ns->params.max_active_zones) { + ns->nr_active_zones++; + assert(ns->nr_active_zones <= ns->params.max_active_zones); + } +} + +static inline void nvme_aor_dec_active(NvmeNamespace *ns) +{ + if (ns->params.max_active_zones) { + assert(ns->nr_active_zones > 0); + ns->nr_active_zones--; + assert(ns->nr_active_zones >= ns->nr_open_zones); + } + assert(ns->nr_active_zones >= 0); +} + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_flush(NvmeNamespace *ns); diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 2e663713c7..088df2e813 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -199,6 +199,26 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, } } +/* + * Check if we can open a zone without exceeding open/active limits. + * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5). + */ +static int nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn) +{ + if (ns->params.max_active_zones != 0 && + ns->nr_active_zones + act > ns->params.max_active_zones) { + trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones); + return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR; + } + if (ns->params.max_open_zones != 0 && + ns->nr_open_zones + opn > ns->params.max_open_zones) { + trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones); + return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR; + } + + return NVME_SUCCESS; +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr low = n->ctrl_mem.addr; @@ -1207,6 +1227,41 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, return status; } +static void nvme_auto_transition_zone(NvmeNamespace *ns, bool implicit, + bool adding_active) +{ + NvmeZone *zone; + + if (implicit && ns->params.max_open_zones && + ns->nr_open_zones == ns->params.max_open_zones) { + zone = QTAILQ_FIRST(&ns->imp_open_zones); + if (zone) { + /* + * Automatically close this implicitly open zone. + */ + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + nvme_aor_dec_open(ns); + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + } + } +} + +static uint16_t nvme_auto_open_zone(NvmeNamespace *ns, NvmeZone *zone) +{ + uint16_t status = NVME_SUCCESS; + uint8_t zs = nvme_get_zone_state(zone); + + if (zs == NVME_ZONE_STATE_EMPTY) { + nvme_auto_transition_zone(ns, true, true); + status = nvme_aor_check(ns, 1, 1); + } else if (zs == NVME_ZONE_STATE_CLOSED) { + nvme_auto_transition_zone(ns, true, false); + status = nvme_aor_check(ns, 0, 1); + } + + return status; +} + static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, bool failed) { @@ -1243,7 +1298,11 @@ static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, switch (nvme_get_zone_state(zone)) { case NVME_ZONE_STATE_IMPLICITLY_OPEN: case NVME_ZONE_STATE_EXPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_EMPTY: nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); /* fall through */ @@ -1272,7 +1331,10 @@ static uint64_t nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, zs = nvme_get_zone_state(zone); switch (zs) { case NVME_ZONE_STATE_EMPTY: + nvme_aor_inc_active(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_inc_open(ns); nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); } } @@ -1378,6 +1440,11 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) goto invalid; } + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } @@ -1436,6 +1503,11 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) slba = zone->w_ptr; } + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } else { status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); @@ -1537,9 +1609,27 @@ static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, uint8_t state) { + uint16_t status; + switch (state) { case NVME_ZONE_STATE_EMPTY: + nvme_auto_transition_zone(ns, false, true); + status = nvme_aor_check(ns, 1, 0); + if (status != NVME_SUCCESS) { + return status; + } + nvme_aor_inc_active(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + status = nvme_aor_check(ns, 0, 1); + if (status != NVME_SUCCESS) { + if (state == NVME_ZONE_STATE_EMPTY) { + nvme_aor_dec_active(ns); + } + return status; + } + nvme_aor_inc_open(ns); + /* fall through */ case NVME_ZONE_STATE_IMPLICITLY_OPEN: nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); /* fall through */ @@ -1561,6 +1651,7 @@ static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); /* fall through */ case NVME_ZONE_STATE_CLOSED: @@ -1582,7 +1673,11 @@ static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_EMPTY: zone->w_ptr = nvme_zone_wr_boundary(zone); zone->d.wp = zone->w_ptr; @@ -1608,7 +1703,11 @@ static uint16_t nvme_reset_zone(NvmeNamespace *ns, NvmeZone *zone, switch (state) { case NVME_ZONE_STATE_EXPLICITLY_OPEN: case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_aor_dec_open(ns); + /* fall through */ case NVME_ZONE_STATE_CLOSED: + nvme_aor_dec_active(ns); + /* fall through */ case NVME_ZONE_STATE_FULL: zone->w_ptr = zone->d.zslba; zone->d.wp = zone->w_ptr; From patchwork Tue Oct 13 21:42:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836127 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 88B07139F for ; Tue, 13 Oct 2020 21:48:59 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 284B420872 for ; Tue, 13 Oct 2020 21:48:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="l0x7DldM" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 284B420872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:39726 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSSAD-0004io-UO for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:48:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47176) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS46-0005jh-AZ; Tue, 13 Oct 2020 17:42:39 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45631) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS43-0001lt-K6; Tue, 13 Oct 2020 17:42:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625356; x=1634161356; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qC0MGM8sAyPys4yE4bYRH9iCuqrzdx7FRQ1XqgRn37k=; b=l0x7DldMXC8dqe2QACzmYIqVpwE0SDnrlDHYy+RUQugOK9fK9saEmBWw aE/YTrMjVI/KMrOuv7LCTD/RpLPLk5pd6rvx0cHxNQvsifpC0UZfGktEt re/MQaRGDalH4aW4WLwFjWYpYuweGi+0Ph/5LwgXXABroGp70zbhKoti1 gcPMTf9SilXF9EYtW7SSUSg4u4c8ttEfgRuL753W6UBzMeUC1Xu13tCqG mpWdjGlteI74GNQE2mce3+JxLCa+g7hy/60Y4V4AN4GJwNfguOGlnmMSf DyZurWDxYTIwCxoQHzgJdRLBQAlwhtPWQhTnZQPRlTFULkmeTjVKk9vs1 w==; IronPort-SDR: xzOBvQjSRKkO0V1T/Mwz6/F0RmNV8b+60YIi/23MxitbOIclrK/YrY6619IFBQ6cOL5WDne4Ah MbP03AYJoEwVkbXAsRtLzWpUnd596YOsVYu0GDa37wEsBKfSOJaBljW6acvdo8W4jxmm+SEFgF eN3fwSBAZPMFFzb7A2GfF26fWOSUcjyjgPuTBF5w8lHKkZltvAX0yjhJDzdXK/QIvVZ7V8NR6m nGIJZ9nWWSke0uSCGgI+QeG1jzwAIZyPWda02KCxf8MiDjiirYlWdcyFEFsuNvvpBPGSrQZjc2 Elw= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185935" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:34 +0800 IronPort-SDR: 1/QFurA8Tx4E1O8OJ9L4Hgoo+v8fLQXM+8iWBkzU08C5SmYrcW7NIHsVrUlXiv5TpMtHMVqLs6 O5BTR1xugwyxflSCmyzV8ORm6XhwqGtKcbr798sniS5MWQJrb/tBwvmgOxJpBNMFlljpB73Rdp 0i2CdtD489r7OesDSGzT1j6BDxK16PiituCQbnJWUebOvDDmgbHz7DwUsxfcoYBQghguyc/P68 TDqQSzmPbpSlpbGwNNolWHR7QMoGZnmJzMV8/e54ujKPGNPtHzpqeq1q4VoiFvu9x0d7NCdl1f BBiixmBzsUm+PG26T29ZOwVV Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:08 -0700 IronPort-SDR: mJXIoF6Zz21D2X/MMq9NBVHRaPpOta6nWbejYn7lSL8zF0K5TKGlTlsWj7TClITayWlCOwh9LU qE14SoZnHygdjrjGFleJKF3jW6TMOibv+uFdL3kr/UmVLKcgQuOX2t5Hw+v4DNW1YILq/iFHg0 S0h3I+vgpNSYna/M1PKzBRb36GZxHQ4zbPPks6VnCW0nqn6zL5rpVf5iYQnSHmWGSn6vB5fU0s wlRPMU3dJajVpgORPJv3NeA2Lm64Y3CzmN8GHW/bFkf2wSObYJ0WOM1bDHWKh6PQXRLVky2oN7 UVg= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:32 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 07/11] hw/block/nvme: Support Zone Descriptor Extensions Date: Wed, 14 Oct 2020 06:42:08 +0900 Message-Id: <20201013214212.2152-8-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Zone Descriptor Extension is a label that can be assigned to a zone. It can be set to an Empty zone and it stays assigned until the zone is reset. This commit adds a new optional module property, "zone_descr_ext_size". Its value must be a multiple of 64 bytes. If this value is non-zero, it becomes possible to assign extensions of that size to any Empty zones. The default value for this property is 0, therefore setting extensions is disabled by default. Signed-off-by: Hans Holmberg Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jensen --- hw/block/nvme-ns.c | 14 ++++++++++-- hw/block/nvme-ns.h | 8 +++++++ hw/block/nvme.c | 51 +++++++++++++++++++++++++++++++++++++++++-- hw/block/trace-events | 2 ++ 4 files changed, 71 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 8d9e11eef2..255ded2b43 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -143,6 +143,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns) int i; ns->zone_array = g_malloc0(ns->zone_array_size); + if (ns->params.zd_extension_size) { + ns->zd_extensions = g_malloc0(ns->params.zd_extension_size * + ns->num_zones); + } QTAILQ_INIT(&ns->exp_open_zones); QTAILQ_INIT(&ns->imp_open_zones); @@ -192,7 +196,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); - id_ns_z->lbafe[lba_index].zdes = 0; + id_ns_z->lbafe[lba_index].zdes = + ns->params.zd_extension_size >> 6; /* Units of 64B */ ns->csi = NVME_CSI_ZONED; ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); @@ -232,7 +237,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns) continue; } - if (zone->d.wp == zone->d.zslba) { + if (zone->d.za & NVME_ZA_ZD_EXT_VALID) { + set_state = NVME_ZONE_STATE_CLOSED; + } else if (zone->d.wp == zone->d.zslba) { set_state = NVME_ZONE_STATE_EMPTY; } else if (ns->params.max_active_zones == 0 || ns->nr_active_zones < ns->params.max_active_zones) { @@ -320,6 +327,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns) if (ns->params.zoned) { g_free(ns->id_ns_zoned); g_free(ns->zone_array); + g_free(ns->zd_extensions); } } @@ -350,6 +358,8 @@ static Property nvme_ns_props[] = { params.cross_zone_read, false), DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 0), DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), + DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace, + params.zd_extension_size, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index b0633d0def..2d70a13701 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -36,6 +36,7 @@ typedef struct NvmeNamespaceParams { uint64_t zone_cap_bs; uint32_t max_active_zones; uint32_t max_open_zones; + uint32_t zd_extension_size; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -58,6 +59,7 @@ typedef struct NvmeNamespace { uint64_t zone_capacity; uint64_t zone_array_size; uint32_t zone_size_log2; + uint8_t *zd_extensions; int32_t nr_open_zones; int32_t nr_active_zones; @@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } +static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, + uint32_t zone_idx) +{ + return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size]; +} + static inline void nvme_aor_inc_open(NvmeNamespace *ns) { assert(ns->nr_open_zones >= 0); diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 088df2e813..18547722af 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1747,6 +1747,26 @@ static bool nvme_cond_offline_all(uint8_t state) return state == NVME_ZONE_STATE_READ_ONLY; } +static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + uint16_t status; + + if (state == NVME_ZONE_STATE_EMPTY) { + nvme_auto_transition_zone(ns, false, true); + status = nvme_aor_check(ns, 1, 0); + if (status != NVME_SUCCESS) { + return status; + } + nvme_aor_inc_active(ns); + zone->d.za |= NVME_ZA_ZD_EXT_VALID; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, uint8_t); typedef bool (*need_to_proc_zone_t)(uint8_t); @@ -1787,6 +1807,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) uint8_t action, state; bool all; NvmeZone *zone; + uint8_t *zd_ext; action = dw13 & 0xff; all = dw13 & 0x100; @@ -1841,7 +1862,22 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) case NVME_ZONE_ACTION_SET_ZD_EXT: trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - return NVME_INVALID_FIELD | NVME_DNR; + if (all || !ns->params.zd_extension_size) { + return NVME_INVALID_FIELD | NVME_DNR; + } + zd_ext = nvme_get_zd_extension(ns, zone_idx); + status = nvme_dma(n, zd_ext, ns->params.zd_extension_size, + DMA_DIRECTION_TO_DEVICE, req); + if (status) { + trace_pci_nvme_err_zd_extension_map_error(zone_idx); + return status; + } + + status = nvme_set_zd_ext(ns, zone, state); + if (status == NVME_SUCCESS) { + trace_pci_nvme_zd_extension_set(zone_idx); + return status; + } break; default: @@ -1919,7 +1955,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } - if (zra == NVME_ZONE_REPORT_EXTENDED) { + if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1931,6 +1967,9 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) partial = (dw13 >> 16) & 0x01; zone_entry_sz = sizeof(NvmeZoneDescr); + if (zra == NVME_ZONE_REPORT_EXTENDED) { + zone_entry_sz += ns->params.zd_extension_size; + } max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; buf = g_malloc0(len); @@ -1962,6 +2001,14 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) z->wp = cpu_to_le64(~0ULL); } + if (zra == NVME_ZONE_REPORT_EXTENDED) { + if (zs->d.za & NVME_ZA_ZD_EXT_VALID) { + memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), + ns->params.zd_extension_size); + } + buf_p += ns->params.zd_extension_size; + } + zone_idx++; } diff --git a/hw/block/trace-events b/hw/block/trace-events index af53e31fcb..962084e40c 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -96,6 +96,7 @@ pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, sl pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_zd_extension_set(uint32_t zone_idx) "set descriptor extension for zone_idx=%"PRIu32"" pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" pci_nvme_clear_ns_full(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Full state" @@ -127,6 +128,7 @@ pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slb pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded" pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded" +pci_nvme_err_zd_extension_map_error(uint32_t zone_idx) "can't map descriptor extension for zone_idx=%"PRIu32"" pci_nvme_err_invalid_effects_log_offset(uint64_t ofs) "commands supported and effects log offset must be 0, got %"PRIu64"" pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" pci_nvme_err_only_zoned_cmd_set_avail(void) "setting 001b CC.CSS, but only ZONED+NVM command set is enabled" From patchwork Tue Oct 13 21:42:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836131 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2E74139F for ; Tue, 13 Oct 2020 21:50:31 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8412C20872 for ; Tue, 13 Oct 2020 21:50:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="YXnQteqd" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8412C20872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:44350 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSSBi-0006lh-L4 for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:50:30 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47208) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS49-0005kY-U2; Tue, 13 Oct 2020 17:42:45 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45657) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS46-0001p9-9f; Tue, 13 Oct 2020 17:42:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625358; x=1634161358; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HYiq0JK4dRwQV6ig2IjA1t2BEXwoUc+F9PcNVyweUws=; b=YXnQteqdImDjgJSDAgXUowp48JMG3Go8D8egOSjS9Ss1W+EGGxV2KQ3g Y0sX+IUebZFVc5LXtoLuYnOox+7vPT+0BY+WjNy1MCkg0aUlx3xb78Auh L6nJ7ZJOlfgThSYo+8LPAhXMHIOCoEm02VYH3Fg0y5HKKgZsCpryceDY4 rU7BSCKyL1GLCD9xgEf5HMP+CGrX6HRyzM/MBdf3QAtQfoEJyxVZojVm7 FlI1yZ4R4gUhg+cxVbZwTqK8p6FygXOR5ViROwNzv0Wux+k18ifu0FahV AW7BTa7wnOAKKnoY8EEPbqn+JrKkRVCQcQyuS2JYrJzAe1XOiDUTRDqMp A==; IronPort-SDR: ZbzC3cobtEbj8wUdkABcuXSyIrTQeUjF0bcLOkx+XDKn1vJK19x/VjrfHgAUJqOwOx9L2OQWiN rFj7kKlBc0zZjpKdqwAubjP/5mrwHkhIHwoLuU9JVnRn3Dy2pAo27urEH11k/3vY6IkGXuN3LC GENWPTEkFDajUYybuRAgc+cjYyzVbL7UGoDKjT6f9yFlYZmX252L+b+QZ6YvKSl614WqLWfQ28 jKbnxh2CicRAO6I8xqVTL7ikhJaGRKB67nKKK/w6q0UWxhcAU8Vib63pzaSRVTcprDEPcEQE3l toQ= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185940" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:37 +0800 IronPort-SDR: uUm5oWQTKCbO13xx63WkqMsbjmulaubNm3lpwD2EABtfw4iX8lmbNXVfyvX/BYXgAXj3swnko8 IfJYLXhm5pzDkpR81cntYt2nDZM4tlpwL9ykWdV17jbIIH1WfHxIQRlg2dV7ThCXFSzHK0CNT0 58meP3g/zdz2Jnzo+XYlQWIOa2gjwHSkyK8n8fj9gtz86NoZ7y6oW7QFCWxjq+r4aCjtMRwFSC vZNaIvTupyoTj12mzfydKN8l5Gb8cVkSDhwOs5jhgQrSKd3TDW1GaxUh152jVdgDoLswoqBPlP 293I5lTs29nuAqaLYaJStEsl Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:10 -0700 IronPort-SDR: EvFUMEPrO/1mkt6j+ljpg8Z8ju8ltTzQTtbc0sVUPxNaIc0vLJaIXNSpRHVo2YCkPZmsKuppw6 Ty6UD7DzshCsu+v47/xUAs/J5lpfC70xa9MLqT3u2TXviDYj90LR7LoTnBuhVxpE2JSP4WKf3o jr+eVGRW+MS5lkWF13ul/RiKu1haFOqVSdLsSkhtw6Mb872MW1myMZE130c+PcIvoQQWtVM43x gTtTFHTT1WXpVBBCFvdOQUYf8ufhPs/mVBMQiUE80rTWbpgkB+xeBgiABuXDffD3KSIBwRTwfb oOA= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:35 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 08/11] hw/block/nvme: Add injection of Offline/Read-Only zones Date: Wed, 14 Oct 2020 06:42:09 +0900 Message-Id: <20201013214212.2152-9-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" ZNS specification defines two zone conditions for the zones that no longer can function properly, possibly because of flash wear or other internal fault. It is useful to be able to "inject" a small number of such zones for testing purposes. This commit defines two optional device properties, "offline_zones" and "rdonly_zones". Users can assign non-zero values to these variables to specify the number of zones to be initialized as Offline or Read-Only. The actual number of injected zones may be smaller than the requested amount - Read-Only and Offline counts are expected to be much smaller than the total number of zones on a drive. Signed-off-by: Dmitry Fomichev --- hw/block/nvme-ns.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme-ns.h | 2 ++ 2 files changed, 66 insertions(+) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 255ded2b43..d050f97909 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -21,6 +21,7 @@ #include "sysemu/sysemu.h" #include "sysemu/block-backend.h" #include "qapi/error.h" +#include "crypto/random.h" #include "hw/qdev-properties.h" #include "hw/qdev-core.h" @@ -132,6 +133,32 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) return -1; } + if (ns->params.zd_extension_size) { + if (ns->params.zd_extension_size & 0x3f) { + error_setg(errp, + "zone descriptor extension size must be a multiple of 64B"); + return -1; + } + if ((ns->params.zd_extension_size >> 6) > 0xff) { + error_setg(errp, "zone descriptor extension size is too large"); + return -1; + } + } + + if (ns->params.max_open_zones < nz) { + if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) { + error_setg(errp, "offline_zones value %u is too large", + ns->params.nr_offline_zones); + return -1; + } + if (ns->params.nr_rdonly_zones > + nz - ns->params.max_open_zones - ns->params.nr_offline_zones) { + error_setg(errp, "rdonly_zones value %u is too large", + ns->params.nr_rdonly_zones); + return -1; + } + } + return 0; } @@ -140,7 +167,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns) uint64_t start = 0, zone_size = ns->zone_size; uint64_t capacity = ns->num_zones * zone_size; NvmeZone *zone; + uint32_t rnd; int i; + uint16_t zs; ns->zone_array = g_malloc0(ns->zone_array_size); if (ns->params.zd_extension_size) { @@ -167,6 +196,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns) zone->w_ptr = start; start += zone_size; } + + /* If required, make some zones Offline or Read Only */ + + for (i = 0; i < ns->params.nr_offline_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE); + } else { + i--; + } + } + + for (i = 0; i < ns->params.nr_rdonly_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE && + zs != NVME_ZONE_STATE_READ_ONLY) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY); + } else { + i--; + } + } } static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, @@ -360,6 +420,10 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0), DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace, params.zd_extension_size, 0), + DEFINE_PROP_UINT32("offline_zones", NvmeNamespace, + params.nr_offline_zones, 0), + DEFINE_PROP_UINT32("rdonly_zones", NvmeNamespace, + params.nr_rdonly_zones, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 2d70a13701..d65d8b0930 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -37,6 +37,8 @@ typedef struct NvmeNamespaceParams { uint32_t max_active_zones; uint32_t max_open_zones; uint32_t zd_extension_size; + uint32_t nr_offline_zones; + uint32_t nr_rdonly_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { From patchwork Tue Oct 13 21:42:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836119 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AAD961C for ; Tue, 13 Oct 2020 21:44:31 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E347020872 for ; Tue, 13 Oct 2020 21:44:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="oq4gb43M" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E347020872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:56368 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS5t-00085b-Um for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:44:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47240) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS4C-0005l4-VT; Tue, 13 Oct 2020 17:42:47 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45631) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS47-0001lt-Vy; Tue, 13 Oct 2020 17:42:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625360; x=1634161360; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fQR5BxQm0BmUBslyj3C9Wad69LMH6turCoV2zcDgCMo=; b=oq4gb43MXzzTLckiQqL212AMQPffzBu8H8e+RwP4kU0cqL9Q9+6tkheB dQHuZc5vQq1c60fKmCuKR2UMjAvM0p6plYCYi3JIMdVvyU7C3kx29RTyt lV7bWBehFtFBFtDv0qX9P7QWjTCbNATM0L1T2rlwDKg2TzIhfmpaOauDK YGN4JWMOcn4nxn6EueznT/Nyuchullng0aMXTVCEnnPHky2b+gdXoF+zy nGDVTEW/q3r9lu9Mqk4bC+hQR8Hs1ctLw3OqYyNXnV+S8y7gWrWciltg0 YoxJiWbVWibKGvyQYbF/2wqvUFWFlv68QOTMOVsVZsgmVy+kJmA2FRDN8 w==; IronPort-SDR: aOHftTMkJbZYLWeQNNaz/xtfPsFqfKYGh/jcwg7CQ0mu7SH5H64VKkQRwTvNk35EAU0PdNE1EJ Jw+RkAXnWEfAPtp6AgKFM3VG403U/987PERw2PyByJ2TbyAyLM570/AGVHP8/lmSnbOm//gKTH qZwQLgf/7T8ez2F8My70xOOCtPfDX+E4dXPvZ1qgmwzNniyQrYXhKW1AzX3QWwg14FDVNANyRu IHqN6pL85kNuRskvCT+Kv3xgm0p41nfaW6iSae0oqG2TOyPL/mfXHsIB0G8ZtQAHQk4j4dnDx8 moc= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185941" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:39 +0800 IronPort-SDR: 53YJsNcf9wMimJoFMZkhefs0C6lJBy0KILikrZhrxvS1iwAnL1Nvs2peusjh3sEDgsxEqZASUi dI9SzkNQKmg9EWz1x0Skf3l/0HHiRoZ3FOIfE4oj+S1tqovh5SY8RJPgbEmcc4Zv4Zf7qyPQBB UtZrBRg2H71cmVvrRkaKCfzdnvSQ3CnDmUdOS+LmO/PytxsHGlaywkaFH8JuSNZznGGPTOm+fJ 9jXLB3RPeqA8cveOIlPxb/cXh/OEfal4XOJ1wZJhI0uw1QTgV+ibkJbbcf3YXwGwOvezBh8KkZ t62oFgp8fiCyNfBTVlLWhy+E Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:12 -0700 IronPort-SDR: zWC1z8tx5LBTMpcsVSDvTsN0M8Y0eCkMtsUd86wpo4OxXmizZqT0Ug1N/t3jR3HT/DfYbbxw1P lIwLbLbYu32pa2A2uyrIV6gLnRdFTxupBP31ugE6tQ9VTSfAN++hMTEW9HCb7wQJn4C+5j2csJ 3X7OMIPudJtsbj/X0kl2F53ONsa8fl8D7DbMQ+1qCyf/tiNJHPKEWxtYDvlbRMVuY1L18n14rp M7ttE+lLNaLM9ZG2fp/D+LjihkxQOIqa7d/pNCQc2XRF+izeT1hHGUgbxmSTpnrdfoT9eNVcH4 CgE= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:37 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 09/11] hw/block/nvme: Document zoned parameters in usage text Date: Wed, 14 Oct 2020 06:42:10 +0900 Message-Id: <20201013214212.2152-10-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Added brief descriptions of the new device properties that are now available to users to configure features of Zoned Namespace Command Set in the emulator. This patch is for documentation only, no functionality change. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 41 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 18547722af..41caf35430 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -9,7 +9,7 @@ */ /** - * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e + * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e * * https://nvmexpress.org/developers/nvme-specification/ */ @@ -23,7 +23,8 @@ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ * mdts= - * -device nvme-ns,drive=,bus=bus_name,nsid= + * -device nvme-ns,drive=,bus=,nsid=, \ + * zoned= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. @@ -49,6 +50,42 @@ * completion when there are no oustanding AERs. When the maximum number of * enqueued events are reached, subsequent events will be dropped. * + * Setting `zoned` to true selects Zoned Command Set at the namespace. + * In this case, the following options are available to configure zoned + * operation: + * zone_size= + * The number may be followed by K, M, G as in kilo-, mega- or giga. + * + * zone_capacity= + * The value 0 (default) forces zone capacity to be the same as zone + * size. The value of this property may not exceed zone size. + * + * zone_descr_ext_size= + * This value needs to be specified in 64B units. If it is zero, + * namespace(s) will not support zone descriptor extensions. + * + * max_active= + * + * max_open= + * + * zone_append_size_limit= + * The maximum I/O size that can be supported by Zone Append + * command. Since internally this this value is maintained as + * ZASL = log2( / ), some + * values assigned to this property may be rounded down and + * result in a lower maximum ZA data size being in effect. + * By setting this property to 0, user can make ZASL to be + * equial to MDTS. + * + * offline_zones= + * + * rdonly_zones= + * + * cross_zone_read= + * + * fill_pattern= + * The byte pattern to return for any portions of unwritten data + * during read. */ #include "qemu/osdep.h" From patchwork Tue Oct 13 21:42:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836125 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F2A0961C for ; Tue, 13 Oct 2020 21:47:51 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A2D5B20872 for ; Tue, 13 Oct 2020 21:47:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="fX35jyos" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A2D5B20872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:37164 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSS98-0003cX-MB for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:47:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47260) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS4G-0005lI-1X; Tue, 13 Oct 2020 17:42:49 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45657) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS4A-0001p9-QD; Tue, 13 Oct 2020 17:42:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625363; x=1634161363; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2tvzvHlyWyqcpNkDR+W2zhp5lJ8kigB3/iLnx2zDj+I=; b=fX35jyosm0lW8FyRYjydUl0EGdSXBt0sWYM+BsY0RPiFDyyzyE85NcKa fvGqGVgspeAdy2zhruDwn4vIPYlesYmQFi5wCdbrqaaI1WPPJUVzFvyP7 G703eidcg75GYT1C78kRCsL1k7+mL7ZyE1HBOCwJPTMACnOshwJGw86ZF FS/l7ASdyYbSV7xLSazW96zrBXK3HeVOKbcvVmDy3WFJoG8Gt3yF5Ng4B C0Tw2W3HnHOsjT5WLZguy3Ryf9fq1jeRmzT+Ev4tKdZllYn4X/1g3J1K5 Rl/pG1inXtoVMX0yiVLBNSZX54uLzWTWNmQyrcgYVImqdDFC50/b2l2W0 w==; IronPort-SDR: qqr3T9r9iv+3XsmeVL+sWmuNWzV/BaKDkBA0ujjuxDo4dHhckpQJf5LNutffXrR/whz484Y9az BKczfA/P5unKrhtVVrdnlqDLjT8y8KypTYNxrOcZraNA3iQRBJNq5V4gNPqVfOXMO+9AWpzwhF wi2Jq+mlCGn+ma7I6O2cCP7ZJeFc/G3Ha79lyNKxESBGwMGuYnnjNgqZNoOiz66k3/uUxHJbIh PBk3DOlWC2txL6pb0OW/AwIDad9Qv8sEy4sXBoF1P79wNXNk13b+Q2MgN5+c6KxJNRKNpbqX/f KV4= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185943" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:41 +0800 IronPort-SDR: /384elzfAH/qNtMdmvMGUFPujIK704DGhDZ60BgGHRZII2FYvvLGp/qUJdUYDwXqFNRxb7cX7d fg60Ev+GO+OQE2xrs23i9x8YttBXB//+gdePGx1vHB0NF0IzHJw8GQUNg8vPbYj+WtpUrRTff0 ZGBUG+p6WbaZ+bC3TMuN3jcOeHvQLu4B3hXWtbTHATcpzv5yDECBUQpHB1i6lgWq9cPvT8hRKk nelO44lI7yP20twD0HKgGzY+FvVnZd5IfH4uMRVHxWQpG2o5WlohRsLoOB6pQCUDf6KQ3s3QZx Jft2LF7MsGViUdPV61/1Sl8z Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:14 -0700 IronPort-SDR: J4sqloMtH1iWVsgzR/sel2qw8NhGucBFo762gTkqJDJ3TYkLNa+zcmEI5a219DOQxAQ+vA841a KU19e3OHWUnwD+2o6tlQIYFTGAIHUgQeZbCEWKtVJ8C4cQeHm8q5ecH471sUu/mgMjHrMY0hqo fGOwMGTnVWc+vwt7niGQb5JllG/CUUmfx6LuPEbTtgV0YWWr1xVIZTY05zgMUxkuq5AsqgmKWQ 3hWgpvpBZ1EjPTK4nXn66PqBsPe+lk7DzumwoeRnGdwXJYgSPvcLJYiXWFwOr4YF1FByGDfrZh 3KY= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:39 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 10/11] hw/block/nvme: Separate read and write handlers Date: Wed, 14 Oct 2020 06:42:11 +0900 Message-Id: <20201013214212.2152-11-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" With ZNS support in place, the majority of code in nvme_rw() has become read- or write-specific. Move these parts to two separate handlers, nvme_read() and nvme_write() to make the code more readable and to remove multiple is_write checks that so far existed in the i/o path. This is a refactoring patch, no change in functionality. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 191 +++++++++++++++++++++++++----------------- hw/block/trace-events | 3 +- 2 files changed, 114 insertions(+), 80 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 41caf35430..1186c16b50 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1162,10 +1162,10 @@ typedef struct NvmeReadFillCtx { uint32_t post_rd_fill_nlb; } NvmeReadFillCtx; -static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone, - uint64_t slba, uint32_t nlb, - NvmeReadFillCtx *rfc) +static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba, + uint32_t nlb, NvmeReadFillCtx *rfc) { + NvmeZone *zone = nvme_get_zone_by_slba(ns, slba); NvmeZone *next_zone; uint64_t bndry = nvme_zone_rd_boundary(ns, zone); uint64_t end = slba + nlb, wp1, wp2; @@ -1449,6 +1449,86 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + uint64_t slba = le64_to_cpu(rw->slba); + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; + uint32_t fill_len; + uint64_t data_size = nvme_l2b(ns, nlb); + uint64_t data_offset, fill_ofs; + NvmeReadFillCtx rfc; + BlockBackend *blk = ns->blkconf.blk; + uint16_t status; + + trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba); + + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + goto invalid; + } + + status = nvme_check_bounds(n, ns, slba, nlb); + if (status) { + trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); + goto invalid; + } + + if (ns->params.zoned) { + status = nvme_check_zone_read(ns, slba, nlb, &rfc); + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); + goto invalid; + } + } + + status = nvme_map_dptr(n, data_size, req); + if (status) { + goto invalid; + } + + if (ns->params.zoned) { + if (rfc.pre_rd_fill_nlb) { + fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); + fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); + nvme_fill_read_data(req, fill_ofs, fill_len, + n->params.fill_pattern); + } + if (!rfc.read_nlb) { + /* No backend I/O necessary, only needed to fill the buffer */ + req->status = NVME_SUCCESS; + return NVME_SUCCESS; + } + if (rfc.post_rd_fill_nlb) { + req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); + req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); + } else { + req->fill_len = 0; + } + slba = rfc.read_slba; + data_size = nvme_l2b(ns, rfc.read_nlb); + } + + data_offset = nvme_l2b(ns, slba); + + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_READ); + if (req->qsg.sg) { + req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } else { + req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); + } + return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); + return status | NVME_DNR; +} + static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; @@ -1495,25 +1575,20 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; - uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t slba = le64_to_cpu(rw->slba); + uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t data_size = nvme_l2b(ns, nlb); - uint64_t data_offset, fill_ofs; - + uint64_t data_offset; NvmeZone *zone; - uint32_t fill_len; - NvmeReadFillCtx rfc; - bool is_write = rw->opcode == NVME_CMD_WRITE || append; - enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; BlockBackend *blk = ns->blkconf.blk; uint16_t status; - trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode), - nvme_nsid(ns), nlb, data_size, slba); + trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), + nvme_nsid(ns), nlb, data_size, slba); status = nvme_check_mdts(n, data_size); if (status) { @@ -1530,29 +1605,21 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) if (ns->params.zoned) { zone = nvme_get_zone_by_slba(ns, slba); - if (is_write) { - status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); - if (status != NVME_SUCCESS) { - goto invalid; - } - - if (append) { - slba = zone->w_ptr; - } - - status = nvme_auto_open_zone(ns, zone); - if (status != NVME_SUCCESS) { - goto invalid; - } - - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } else { - status = nvme_check_zone_read(ns, zone, slba, nlb, &rfc); - if (status != NVME_SUCCESS) { - trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); - goto invalid; - } + status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); + if (status != NVME_SUCCESS) { + goto invalid; } + + status = nvme_auto_open_zone(ns, zone); + if (status != NVME_SUCCESS) { + goto invalid; + } + + if (append) { + slba = zone->w_ptr; + } + + req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); } else if (append) { trace_pci_nvme_err_invalid_opc(rw->opcode); status = NVME_INVALID_OPCODE; @@ -1566,56 +1633,21 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append) goto invalid; } - if (ns->params.zoned) { - if (is_write) { - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } else { - if (rfc.pre_rd_fill_nlb) { - fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); - fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); - nvme_fill_read_data(req, fill_ofs, fill_len, - n->params.fill_pattern); - } - if (!rfc.read_nlb) { - /* No backend I/O necessary, only needed to fill the buffer */ - req->status = NVME_SUCCESS; - return NVME_SUCCESS; - } - if (rfc.post_rd_fill_nlb) { - req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); - req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); - } else { - req->fill_len = 0; - } - slba = rfc.read_slba; - data_size = nvme_l2b(ns, rfc.read_nlb); - } - } - data_offset = nvme_l2b(ns, slba); - block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); if (req->qsg.sg) { - if (is_write) { - req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); - } else { - req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); - } + req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); } else { - if (is_write) { - req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); - } else { - req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); - } + req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); } return NVME_NO_COMPLETE; invalid: - block_acct_invalid(blk_get_stats(blk), acct); + block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); return status | NVME_DNR; } @@ -2096,10 +2128,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_WRITE_ZEROES: return nvme_write_zeroes(n, req); case NVME_CMD_ZONE_APPEND: - return nvme_rw(n, req, true); + return nvme_write(n, req, true); case NVME_CMD_WRITE: + return nvme_write(n, req, false); case NVME_CMD_READ: - return nvme_rw(n, req, false); + return nvme_read(n, req); case NVME_CMD_ZONE_MGMT_SEND: return nvme_zone_mgmt_send(n, req); case NVME_CMD_ZONE_MGMT_RECV: diff --git a/hw/block/trace-events b/hw/block/trace-events index 962084e40c..7ee90a50c3 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -40,7 +40,8 @@ pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, pci_nvme_map_sgl(uint16_t cid, uint8_t typ, uint64_t len) "cid %"PRIu16" type 0x%"PRIx8" len %"PRIu64"" pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'" pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'" -pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" +pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" +pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'" pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" From patchwork Tue Oct 13 21:42:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 11836133 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E126139F for ; Tue, 13 Oct 2020 21:52:49 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D4D6B20872 for ; Tue, 13 Oct 2020 21:52:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="EvyngeXf" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D4D6B20872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:49104 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kSSDv-0000Wj-Sv for patchwork-qemu-devel@patchwork.kernel.org; Tue, 13 Oct 2020 17:52:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47278) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS4I-0005ls-OR; Tue, 13 Oct 2020 17:42:54 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:45668) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kSS4D-0001pq-3Y; Tue, 13 Oct 2020 17:42:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1602625365; x=1634161365; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Jmy6lOf0uhPRj3UWh5BdGhIe9Lpks0HLFlW28E7Zylw=; b=EvyngeXf0gRCNsfAxRoRid59TLEilYc/jbGlN73vX9I855yAEZ13ZvlD ASokwTcZXi0sgPA3ISqhcUolvUW9QryJXhNag/Yt6crnrO9AxmIwsI+lw B1dn2Jmw+88frTCJtuefh+B5cM+011u3LoIU1qNqWvJo00yGWHiKkSs7L Kjjw9nj70cS889JKPDCOFqJmetzEOFb4m4mBNYP7JLsmnmfQOsdzGjd+c dSrWeuh9QWK8ByxmVkHH2hlGZiRzrWsoBrtFp47B8teCtlADy7re6Ih9I 09eB5SCORGl+JPFRovRqhcpAnKfa+DRJO2WE/l3Wwum2Nt8/Hchfgc1dy w==; IronPort-SDR: nepO8RAW8Qr71kyF8b5qliNdHOQ8x/uz5wJBGdf7apL7yW3fo3nNliIu9uip/IRhtgMocN4Nud cppwdRfjGDT4xc2mtAfymsHVWazdRy8GpVms38jufMA0i9N8osqTMa94Gp4LDoC29wi6JLCKtL TWmLg1hG0CakN60HIgypkAQfaHEXYMCazW10YSphuO0geSd+aulCAkh/jXmOhbCmPO+Vl3KRJJ P4Awwfu4R1b/Ml8EPZJBJ3VXw9cITdsavWZq/T5qqySHhWFbsNeQcADbCn8ZK43w2zpp4AOYQW a0M= X-IronPort-AV: E=Sophos;i="5.77,371,1596470400"; d="scan'208";a="154185945" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 14 Oct 2020 05:42:43 +0800 IronPort-SDR: Lv3u0XX1CYqpEUqHnDjOh2F47HOxS37GGkCiWiuv2T23ghiPqqSsoskwrkYY2s17AKMmnTW9ko utDA+ZNoc+1ToxfQm5eSTXqDE+fW7SKIVHLQv1mifh8TkiUYQ8LJ3DUFPvd5PTZhI+ebmAEBS1 CDL1c3l85VqlxCc0y7CdselMO4GkgLbwiAHq3DJMtwVDXJvEyjMkZabU8yGjjV7InYCFDjipkr GYuyaX02j2djPhPI2H7YPXmVWfbViujt8mIGu9/XSLmrQuQwhW9I6MFdBmpl0Ys3OORZNcKpaM D17Q+lpT1VhkkBfV6EZxAW4y Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 14:29:17 -0700 IronPort-SDR: sDx5smsGZ+Sl6YcIw8XjQrL+ylFlfRjgVZ+mqavS+T8kMkSE3DK5lX//ZOUKy15NEV0ZoEkea9 ICxD4cM7GM8ziY4b95M4xjmWbM557Pda7i3D7eZNdM+q4Hr/ZZGr+sRGMrGcND/JFWyqOpMl3L JA3z5nuyiCpqJ+VT4BYKXD98+d0HEXzEKXezC3UWBadAGDK1WmWB6ZgePrO0ZONUEsOAlce/kZ fZFR0WNGm6VlBSJXNboGp71SjmyOdtL+UzmInaQaznr2qcnfuM5SOMa1W9brdLG7++6PNMIG8I aww= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip02.wdc.com with ESMTP; 13 Oct 2020 14:42:41 -0700 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Maxim Levitsky , Fam Zheng Subject: [PATCH v6 11/11] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write() Date: Wed, 14 Oct 2020 06:42:12 +0900 Message-Id: <20201013214212.2152-12-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201013214212.2152-1-dmitry.fomichev@wdc.com> References: <20201013214212.2152-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.153.141; envelope-from=prvs=5487bf209=dmitry.fomichev@wdc.com; helo=esa3.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/13 17:42:19 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND. Signed-off-by: Dmitry Fomichev --- hw/block/nvme.c | 95 +++++++++++++------------------------------ hw/block/trace-events | 1 - 2 files changed, 28 insertions(+), 68 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 1186c16b50..1c164f1d80 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1529,53 +1529,7 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) -{ - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - NvmeNamespace *ns = req->ns; - uint64_t slba = le64_to_cpu(rw->slba); - uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; - NvmeZone *zone; - uint64_t offset = nvme_l2b(ns, slba); - uint32_t count = nvme_l2b(ns, nlb); - BlockBackend *blk = ns->blkconf.blk; - uint16_t status; - - trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); - - status = nvme_check_bounds(n, ns, slba, nlb); - if (status) { - trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); - return status; - } - - if (ns->params.zoned) { - zone = nvme_get_zone_by_slba(ns, slba); - - status = nvme_check_zone_write(n, ns, zone, slba, nlb, false); - if (status != NVME_SUCCESS) { - goto invalid; - } - - status = nvme_auto_open_zone(ns, zone); - if (status != NVME_SUCCESS) { - goto invalid; - } - - req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb); - } - - block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); - req->aiocb = blk_aio_pwrite_zeroes(blk, offset, count, - BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); - return NVME_NO_COMPLETE; - -invalid: - block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); - return status | NVME_DNR; -} - -static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append, bool wrz) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; @@ -1590,10 +1544,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), nvme_nsid(ns), nlb, data_size, slba); - status = nvme_check_mdts(n, data_size); - if (status) { - trace_pci_nvme_err_mdts(nvme_cid(req), data_size); - goto invalid; + if (!wrz) { + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + goto invalid; + } } status = nvme_check_bounds(n, ns, slba, nlb); @@ -1628,21 +1584,26 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append) data_offset = nvme_l2b(ns, slba); - status = nvme_map_dptr(n, data_size, req); - if (status) { - goto invalid; - } + if (!wrz) { + status = nvme_map_dptr(n, data_size, req); + if (status) { + goto invalid; + } - data_offset = nvme_l2b(ns, slba); - - block_acct_start(blk_get_stats(blk), &req->acct, data_size, - BLOCK_ACCT_WRITE); - if (req->qsg.sg) { - req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); + if (req->qsg.sg) { + req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } else { + req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); + } } else { - req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); + block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size, + BDRV_REQ_MAY_UNMAP, nvme_rw_cb, + req); } return NVME_NO_COMPLETE; @@ -2126,11 +2087,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_FLUSH: return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: - return nvme_write_zeroes(n, req); + return nvme_write(n, req, false, true); case NVME_CMD_ZONE_APPEND: - return nvme_write(n, req, true); + return nvme_write(n, req, true, false); case NVME_CMD_WRITE: - return nvme_write(n, req, false); + return nvme_write(n, req, false, false); case NVME_CMD_READ: return nvme_read(n, req); case NVME_CMD_ZONE_MGMT_SEND: diff --git a/hw/block/trace-events b/hw/block/trace-events index 7ee90a50c3..5a3cd4c5dc 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -43,7 +43,6 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opna pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'" -pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""