From patchwork Tue Jun 30 10:01:30 2020 X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633595 From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 01/10] hw/block/nvme: support I/O Command Sets Date: Tue, 30 Jun 2020 12:01:30 +0200 Message-Id: <20200630100139.1483002-2-its@irrelevant.dk> In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> From: Klaus Jensen Implement support for TP 4056 ("Namespace Types"). This adds the 'iocs' (I/O Command Set) device parameter to the nvme-ns device.
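For example, a namespace using the (default) NVM command set might be configured as follows. This is a hypothetical invocation: it assumes the usual nvme 'serial' and nvme-ns 'drive' properties and uses placeholder ids and file names:

    -drive id=nvm,file=nvm.img,format=raw,if=none
    -device nvme,serial=deadbeef
    -device nvme-ns,drive=nvm,iocs=0x0

Since 'iocs' defaults to 0x0 (the NVM command set), existing configurations are unaffected.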
Signed-off-by: Klaus Jensen --- block/nvme.c | 6 +- hw/block/nvme-ns.c | 24 +++-- hw/block/nvme-ns.h | 11 +- hw/block/nvme.c | 226 +++++++++++++++++++++++++++++++++--------- hw/block/nvme.h | 52 ++++++---- hw/block/trace-events | 6 +- include/block/nvme.h | 53 ++++++++-- 7 files changed, 285 insertions(+), 93 deletions(-) diff --git a/block/nvme.c b/block/nvme.c index 05485fdd1189..e7fe0c7accd1 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c) { uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF; if (status) { - trace_nvme_error(le32_to_cpu(c->result), + trace_nvme_error(le32_to_cpu(c->dw0), le16_to_cpu(c->sq_head), le16_to_cpu(c->sq_id), le16_to_cpu(c->cid), @@ -495,7 +495,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp) { BDRVNVMeState *s = bs->opaque; NvmeIdCtrl *idctrl; - NvmeIdNs *idns; + NvmeIdNsNvm *idns; NvmeLBAF *lbaf; uint8_t *resp; uint16_t oncs; @@ -512,7 +512,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp) goto out; } idctrl = (NvmeIdCtrl *)resp; - idns = (NvmeIdNs *)resp; + idns = (NvmeIdNsNvm *)resp; r = qemu_vfio_dma_map(s->vfio, resp, sizeof(NvmeIdCtrl), true, &iova); if (r) { error_setg(errp, "Cannot map buffer for DMA"); diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 7c825c38c69d..ae051784caaf 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -59,8 +59,16 @@ static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp) static void nvme_ns_init(NvmeNamespace *ns) { - NvmeIdNs *id_ns = &ns->id_ns; + NvmeIdNsNvm *id_ns; + int unmap = blk_get_flags(ns->blk) & BDRV_O_UNMAP; + + ns->id_ns[NVME_IOCS_NVM] = g_new0(NvmeIdNsNvm, 1); + id_ns = nvme_ns_id_nvm(ns); + + ns->iocs = ns->params.iocs; + + id_ns->dlfeat = unmap ? 0x9 : 0x0; id_ns->lbaf[0].ds = ns->params.lbads; id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); @@ -130,8 +138,7 @@ static int nvme_ns_init_blk_state(NvmeNamespace *ns, Error **errp) return 0; } -static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id, - Error **errp) +static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) { uint64_t perm, shared_perm; @@ -174,7 +181,8 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id, return 0; } -static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) +static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error + **errp) { if (!ns->blk) { error_setg(errp, "block backend not configured"); @@ -191,11 +199,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) { - if (nvme_ns_check_constraints(ns, errp)) { + if (nvme_ns_check_constraints(n, ns, errp)) { return -1; } - if (nvme_ns_init_blk(n, ns, &n->id_ctrl, errp)) { + if (nvme_ns_init_blk(n, ns, errp)) { return -1; } @@ -210,7 +218,8 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) * With a state file in place we can enable the Deallocated or * Unwritten Logical Block Error feature. 
*/ - ns->id_ns.nsfeat |= 0x4; + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); + id_ns->nsfeat |= 0x4; } if (nvme_register_namespace(n, ns, errp)) { @@ -239,6 +248,7 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UINT8("lbads", NvmeNamespace, params.lbads, BDRV_SECTOR_BITS), DEFINE_PROP_DRIVE("state", NvmeNamespace, blk_state), + DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index eb901acc912b..4124f20f1cef 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + uint8_t iocs; uint8_t lbads; } NvmeNamespaceParams; @@ -30,8 +31,9 @@ typedef struct NvmeNamespace { BlockBackend *blk_state; int32_t bootindex; int64_t size; + uint8_t iocs; - NvmeIdNs id_ns; + void *id_ns[256]; NvmeNamespaceParams params; unsigned long *utilization; @@ -50,9 +52,14 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns) return -1; } +static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns) +{ + return ns->id_ns[NVME_IOCS_NVM]; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns) { - NvmeIdNs *id_ns = &ns->id_ns; + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)]; } diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 25d79bcd0bc9..1662c11a4cf3 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -854,7 +854,7 @@ static void nvme_process_aers(void *opaque) req = n->aer_reqs[n->outstanding_aers]; - result = (NvmeAerResult *) &req->cqe.result; + result = (NvmeAerResult *) &req->cqe.dw0; result->event_type = event->result.event_type; result->event_info = event->result.event_info; result->log_page = event->result.log_page; @@ -916,7 +916,8 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len) static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns, uint64_t slba, uint32_t nlb) { - uint64_t nsze = le64_to_cpu(ns->id_ns.nsze); + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); + uint64_t nsze = le64_to_cpu(id_ns->nsze); if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) { return NVME_LBA_RANGE | NVME_DNR; @@ -951,8 +952,9 @@ static uint16_t nvme_check_rw(NvmeCtrl *n, NvmeRequest *req) status = nvme_check_bounds(n, ns, req->slba, req->nlb); if (status) { + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb, - ns->id_ns.nsze); + id_ns->nsze); return status; } @@ -1154,8 +1156,9 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) status = nvme_check_bounds(n, ns, req->slba, req->nlb); if (status) { + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb, - ns->id_ns.nsze); + id_ns->nsze); return status; } @@ -1481,14 +1484,19 @@ static uint16_t nvme_effects_log(NvmeCtrl *n, uint32_t buf_len, uint64_t off, NvmeRequest *req) { uint32_t trans_len; + uint8_t csi = le32_to_cpu(req->cmd.cdw14) >> 24; - if (off > sizeof(nvme_effects)) { + if (!(n->iocscs[n->features.iocsci] & (1 << csi))) { return NVME_INVALID_FIELD | NVME_DNR; } - trans_len = MIN(sizeof(nvme_effects) - off, buf_len); + if (off > sizeof(NvmeEffectsLog)) { + return NVME_INVALID_FIELD | NVME_DNR; + } - return nvme_dma(n, (uint8_t *)&nvme_effects + off, trans_len, + trans_len = MIN(sizeof(NvmeEffectsLog) - off, buf_len); + + return nvme_dma(n, (uint8_t *)&nvme_effects[csi] + off, trans_len, DMA_DIRECTION_FROM_DEVICE, req); } @@ -1648,69 +1656,129 
@@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req) return NVME_SUCCESS; } -static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ctrl(NvmeCtrl *n, uint8_t cns, uint8_t csi, + NvmeRequest *req) { + NvmeIdCtrl empty = { 0 }; + NvmeIdCtrl *id_ctrl = &empty; + trace_pci_nvme_identify_ctrl(); - return nvme_dma(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), + switch (cns) { + case NVME_ID_CNS_CTRL: + id_ctrl = &n->id_ctrl; + + break; + + case NVME_ID_CNS_CTRL_IOCS: + if (!(n->iocscs[n->features.iocsci] & (1 << csi))) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (n->id_ctrl_iocss[csi]) { + id_ctrl = n->id_ctrl_iocss[csi]; + } + + break; + + default: + assert(cns); + } + + return nvme_dma(n, (uint8_t *)id_ctrl, sizeof(*id_ctrl), DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, uint8_t cns, uint8_t csi, + NvmeRequest *req) { + NvmeIdNsNvm empty = { 0 }; + void *id_ns = &empty; + uint32_t nsid = le32_to_cpu(req->cmd.nsid); NvmeNamespace *ns; - NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - NvmeIdNs *id_ns, inactive = { 0 }; - uint32_t nsid = le32_to_cpu(c->nsid); - trace_pci_nvme_identify_ns(nsid); + trace_pci_nvme_identify_ns(nsid, csi); if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) { return NVME_INVALID_NSID | NVME_DNR; } ns = nvme_ns(n, nsid); - if (unlikely(!ns)) { - id_ns = &inactive; - } else { - id_ns = &ns->id_ns; + if (ns) { + switch (cns) { + case NVME_ID_CNS_NS: + id_ns = ns->id_ns[NVME_IOCS_NVM]; + if (!id_ns) { + return NVME_INVALID_IOCS | NVME_DNR; + } + + break; + + case NVME_ID_CNS_NS_IOCS: + if (csi == NVME_IOCS_NVM) { + break; + } + + id_ns = ns->id_ns[csi]; + if (!id_ns) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + break; + + default: + assert(cns); + } } - return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs), + return nvme_dma(n, (uint8_t *)id_ns, NVME_IDENTIFY_DATA_SIZE, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist(NvmeCtrl *n, uint8_t cns, uint8_t csi, + NvmeRequest *req) { - NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - static const int data_len = NVME_IDENTIFY_DATA_SIZE; - uint32_t min_nsid = le32_to_cpu(c->nsid); + static const int len = NVME_IDENTIFY_DATA_SIZE; + uint32_t min_nsid = le32_to_cpu(req->cmd.nsid); uint32_t *list; uint16_t ret; int j = 0; - trace_pci_nvme_identify_nslist(min_nsid); + trace_pci_nvme_identify_nslist(min_nsid, csi); - list = g_malloc0(data_len); + if (min_nsid == 0xfffffffe || min_nsid == 0xffffffff) { + return NVME_INVALID_NSID | NVME_DNR; + } + + if (cns == NVME_ID_CNS_NS_ACTIVE_LIST_IOCS && !csi) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + list = g_malloc0(len); for (int i = 1; i <= n->num_namespaces; i++) { - if (i <= min_nsid || !nvme_ns(n, i)) { + NvmeNamespace *ns = nvme_ns(n, i); + if (i <= min_nsid || !ns) { continue; } + + if (cns == NVME_ID_CNS_NS_ACTIVE_LIST_IOCS && csi && csi != ns->iocs) { + continue; + } + list[j++] = cpu_to_le32(i); - if (j == data_len / sizeof(uint32_t)) { + if (j == len / sizeof(uint32_t)) { break; } } - ret = nvme_dma(n, (uint8_t *)list, data_len, DMA_DIRECTION_FROM_DEVICE, - req); + ret = nvme_dma(n, (uint8_t *)list, len, DMA_DIRECTION_FROM_DEVICE, req); g_free(list); return ret; } static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) { - NvmeIdentify *c = (NvmeIdentify *)&req->cmd; - uint32_t nsid =
le32_to_cpu(c->nsid); + NvmeNamespace *ns; + uint32_t nsid = le32_to_cpu(req->cmd.nsid); uint8_t list[NVME_IDENTIFY_DATA_SIZE]; struct data { @@ -1718,6 +1786,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) NvmeIdNsDescr hdr; uint8_t v[16]; } uuid; + + struct { + NvmeIdNsDescr hdr; + uint8_t v; + } iocs; }; struct data *ns_descrs = (struct data *)list; @@ -1728,7 +1801,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } - if (unlikely(!nvme_ns(n, nsid))) { + ns = nvme_ns(n, nsid); + if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1744,25 +1818,45 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; stl_be_p(&ns_descrs->uuid.v, nsid); + ns_descrs->iocs.hdr.nidt = NVME_NIDT_CSI; + ns_descrs->iocs.hdr.nidl = NVME_NIDT_CSI_LEN; + stb_p(&ns_descrs->iocs.v, ns->iocs); + return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_identify_iocs(NvmeCtrl *n, uint16_t cntid, + NvmeRequest *req) +{ + return nvme_dma(n, (uint8_t *) n->iocscs, sizeof(n->iocscs), + DMA_DIRECTION_FROM_DEVICE, req); +} + static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) { - NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + NvmeIdentify *id = (NvmeIdentify *) &req->cmd; - switch (le32_to_cpu(c->cns)) { + trace_pci_nvme_identify(nvme_cid(req), le32_to_cpu(req->cmd.nsid), + le16_to_cpu(id->cntid), id->cns, id->csi, + le16_to_cpu(id->nvmsetid)); + + switch (le32_to_cpu(id->cns)) { case NVME_ID_CNS_NS: - return nvme_identify_ns(n, req); + case NVME_ID_CNS_NS_IOCS: + return nvme_identify_ns(n, id->cns, id->csi, req); case NVME_ID_CNS_CTRL: - return nvme_identify_ctrl(n, req); + case NVME_ID_CNS_CTRL_IOCS: + return nvme_identify_ctrl(n, id->cns, id->csi, req); case NVME_ID_CNS_NS_ACTIVE_LIST: - return nvme_identify_nslist(n, req); + case NVME_ID_CNS_NS_ACTIVE_LIST_IOCS: + return nvme_identify_nslist(n, id->cns, id->csi, req); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); + case NVME_ID_CNS_IOCS: + return nvme_identify_iocs(n, id->cntid, req); default: - trace_pci_nvme_err_invalid_identify_cns(le32_to_cpu(c->cns)); + trace_pci_nvme_err_invalid_identify_cns(id->cns); return NVME_INVALID_FIELD | NVME_DNR; } } @@ -1771,7 +1865,7 @@ static uint16_t nvme_abort(NvmeCtrl *n, NvmeRequest *req) { uint16_t sqid = le32_to_cpu(req->cmd.cdw10) & 0xffff; - req->cqe.result = 1; + req->cqe.dw0 = 1; if (nvme_check_sqid(n, sqid)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1954,13 +2048,17 @@ defaults: result = cpu_to_le32(result); break; + case NVME_COMMAND_SET_PROFILE: + result = cpu_to_le32(n->features.iocsci & 0x1ff); + break; default: result = cpu_to_le32(nvme_feature_default[fid]); break; } out: - req->cqe.result = result; + req->cqe.dw0 = result; + return NVME_SUCCESS; } @@ -1983,6 +2081,7 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns = NULL; + NvmeIdNsNvm *id_ns; NvmeCmd *cmd = &req->cmd; uint32_t dw10 = le32_to_cpu(cmd->cdw10); @@ -2059,7 +2158,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) continue; } - if (NVME_ID_NS_NSFEAT_DULBE(ns->id_ns.nsfeat)) { + id_ns = nvme_ns_id_nvm(ns); + if (NVME_ID_NS_NSFEAT_DULBE(id_ns->nsfeat)) { ns->features.err_rec = dw11; } } @@ -2075,6 +2175,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, 
NvmeRequest *req) for (int i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); + if (!ns) { continue; } @@ -2105,14 +2206,34 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) ((dw11 >> 16) & 0xFFFF) + 1, n->params.max_ioqpairs, n->params.max_ioqpairs); - req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) | - ((n->params.max_ioqpairs - 1) << 16)); + req->cqe.dw0 = cpu_to_le32((n->params.max_ioqpairs - 1) | + ((n->params.max_ioqpairs - 1) << 16)); break; case NVME_ASYNCHRONOUS_EVENT_CONF: n->features.async_config = dw11; break; case NVME_TIMESTAMP: return nvme_set_feature_timestamp(n, req); + case NVME_COMMAND_SET_PROFILE: + if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ALL) { + uint16_t iocsci = dw11 & 0x1ff; + uint64_t iocsc = n->iocscs[iocsci]; + + for (int i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + if (!(iocsc & (1 << ns->iocs))) { + return NVME_IOCS_COMB_REJECTED | NVME_DNR; + } + } + + n->features.iocsci = iocsci; + } + + break; default: return NVME_FEAT_NOT_CHANGABLE | NVME_DNR; } @@ -2265,6 +2386,8 @@ static int nvme_start_ctrl(NvmeCtrl *n) uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12; uint32_t page_size = 1 << page_bits; + NvmeIdCtrl *id_ctrl = &n->id_ctrl; + if (unlikely(n->cq[0])) { trace_pci_nvme_err_startfail_cq(); return -1; @@ -2304,28 +2427,28 @@ static int nvme_start_ctrl(NvmeCtrl *n) return -1; } if (unlikely(NVME_CC_IOCQES(n->bar.cc) < - NVME_CTRL_CQES_MIN(n->id_ctrl.cqes))) { + NVME_CTRL_CQES_MIN(id_ctrl->cqes))) { trace_pci_nvme_err_startfail_cqent_too_small( NVME_CC_IOCQES(n->bar.cc), NVME_CTRL_CQES_MIN(n->bar.cap)); return -1; } if (unlikely(NVME_CC_IOCQES(n->bar.cc) > - NVME_CTRL_CQES_MAX(n->id_ctrl.cqes))) { + NVME_CTRL_CQES_MAX(id_ctrl->cqes))) { trace_pci_nvme_err_startfail_cqent_too_large( NVME_CC_IOCQES(n->bar.cc), NVME_CTRL_CQES_MAX(n->bar.cap)); return -1; } if (unlikely(NVME_CC_IOSQES(n->bar.cc) < - NVME_CTRL_SQES_MIN(n->id_ctrl.sqes))) { + NVME_CTRL_SQES_MIN(id_ctrl->sqes))) { trace_pci_nvme_err_startfail_sqent_too_small( NVME_CC_IOSQES(n->bar.cc), NVME_CTRL_SQES_MIN(n->bar.cap)); return -1; } if (unlikely(NVME_CC_IOSQES(n->bar.cc) > - NVME_CTRL_SQES_MAX(n->id_ctrl.sqes))) { + NVME_CTRL_SQES_MAX(id_ctrl->sqes))) { trace_pci_nvme_err_startfail_sqent_too_large( NVME_CC_IOSQES(n->bar.cc), NVME_CTRL_SQES_MAX(n->bar.cap)); @@ -2774,6 +2897,8 @@ static void nvme_init_state(NvmeCtrl *n) n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING; n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); + n->iocscs[0] = 1 << NVME_IOCS_NVM; + n->features.iocsci = 0; } int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) @@ -2977,7 +3102,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) NVME_CAP_SET_MQES(n->bar.cap, 0x7ff); NVME_CAP_SET_CQR(n->bar.cap, 1); NVME_CAP_SET_TO(n->bar.cap, 0xf); - NVME_CAP_SET_CSS(n->bar.cap, 1); + NVME_CAP_SET_CSS(n->bar.cap, (NVME_CAP_CSS_NVM | NVME_CAP_CSS_CSI)); NVME_CAP_SET_MPSMAX(n->bar.cap, 4); n->bar.vs = NVME_SPEC_VER; @@ -3037,6 +3162,11 @@ static void nvme_exit(PCIDevice *pci_dev) if (n->pmrdev) { host_memory_backend_set_mapped(n->pmrdev, false); } + + for (int i = 0; i < 256; i++) { + g_free(n->id_ctrl_iocss[i]); + } + msix_uninit_exclusive_bar(pci_dev); } diff --git a/hw/block/nvme.h b/hw/block/nvme.h index e62bcd12a7a8..69be47963f5d 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -18,28 +18,33 @@ typedef struct NvmeParams { bool use_intel_id; } NvmeParams; 
-static const NvmeEffectsLog nvme_effects = { - .acs = { - [NVME_ADM_CMD_DELETE_SQ] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_CREATE_SQ] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_DELETE_CQ] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_CREATE_CQ] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_IDENTIFY] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_ABORT] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_SET_FEATURES] = NVME_EFFECTS_CSUPP | NVME_EFFECTS_CCC | - NVME_EFFECTS_NIC | NVME_EFFECTS_NCC, - [NVME_ADM_CMD_GET_FEATURES] = NVME_EFFECTS_CSUPP, - [NVME_ADM_CMD_FORMAT_NVM] = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC | - NVME_EFFECTS_NCC | NVME_EFFECTS_NIC | NVME_EFFECTS_CSE_MULTI, - [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_EFFECTS_CSUPP, - }, +static const NvmeEffectsLog nvme_effects[] = { + [NVME_IOCS_NVM] = { + .acs = { + [NVME_ADM_CMD_DELETE_SQ] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_CREATE_SQ] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_GET_LOG_PAGE] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_DELETE_CQ] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_CREATE_CQ] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_IDENTIFY] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_ABORT] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_SET_FEATURES] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_CCC | NVME_EFFECTS_NIC | NVME_EFFECTS_NCC, + [NVME_ADM_CMD_GET_FEATURES] = NVME_EFFECTS_CSUPP, + [NVME_ADM_CMD_FORMAT_NVM] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_LBCC | NVME_EFFECTS_NCC | NVME_EFFECTS_NIC | + NVME_EFFECTS_CSE_MULTI, + [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_EFFECTS_CSUPP, + }, - .iocs = { - [NVME_CMD_FLUSH] = NVME_EFFECTS_CSUPP, - [NVME_CMD_WRITE] = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC, - [NVME_CMD_READ] = NVME_EFFECTS_CSUPP, - [NVME_CMD_WRITE_ZEROES] = NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC, + .iocs = { + [NVME_CMD_FLUSH] = NVME_EFFECTS_CSUPP, + [NVME_CMD_WRITE] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_LBCC, + [NVME_CMD_READ] = NVME_EFFECTS_CSUPP, + [NVME_CMD_WRITE_ZEROES] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_LBCC, + }, }, }; @@ -193,6 +198,7 @@ typedef struct NvmeFeatureVal { }; uint32_t async_config; uint32_t vwc; + uint32_t iocsci; } NvmeFeatureVal; static const uint32_t nvme_feature_cap[0x100] = { @@ -202,6 +208,7 @@ static const uint32_t nvme_feature_cap[0x100] = { [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE, [NVME_ASYNCHRONOUS_EVENT_CONF] = NVME_FEAT_CAP_CHANGE, [NVME_TIMESTAMP] = NVME_FEAT_CAP_CHANGE, + [NVME_COMMAND_SET_PROFILE] = NVME_FEAT_CAP_CHANGE, }; static const uint32_t nvme_feature_default[0x100] = { @@ -220,6 +227,7 @@ static const bool nvme_feature_support[0x100] = { [NVME_WRITE_ATOMICITY] = true, [NVME_ASYNCHRONOUS_EVENT_CONF] = true, [NVME_TIMESTAMP] = true, + [NVME_COMMAND_SET_PROFILE] = true, }; typedef struct NvmeCtrl { @@ -247,6 +255,7 @@ typedef struct NvmeCtrl { uint64_t timestamp_set_qemu_clock_ms; /* QEMU clock time */ uint64_t starttime_ms; uint16_t temperature; + uint64_t iocscs[512]; HostMemoryBackend *pmrdev; @@ -262,6 +271,7 @@ typedef struct NvmeCtrl { NvmeSQueue admin_sq; NvmeCQueue admin_cq; NvmeIdCtrl id_ctrl; + void *id_ctrl_iocss[256]; NvmeFeatureVal features; } NvmeCtrl; diff --git a/hw/block/trace-events b/hw/block/trace-events index ed21609f1a4f..4cf0236631d2 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -51,10 +51,12 @@ pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", 
qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16"" pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16"" +pci_nvme_identify(uint16_t cid, uint32_t nsid, uint16_t cntid, uint8_t cns, uint8_t csi, uint16_t nvmsetid) "cid %"PRIu16" nsid %"PRIu32" cntid 0x%"PRIx16" cns 0x%"PRIx8" csi 0x%"PRIx8" nvmsetid %"PRIu16"" pci_nvme_identify_ctrl(void) "identify controller" -pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32"" -pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_ns(uint32_t ns, uint8_t csi) "nsid %"PRIu32" csi 0x%"PRIx8"" +pci_nvme_identify_nslist(uint32_t ns, uint8_t csi) "nsid %"PRIu32" csi 0x%"PRIx8"" pci_nvme_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32"" +pci_nvme_identify_io_cmd_set(uint16_t cid) "cid %"PRIu16"" pci_nvme_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" len %"PRIu32" off %"PRIu64"" pci_nvme_getfeat(uint16_t cid, uint8_t fid, uint8_t sel, uint32_t cdw11) "cid %"PRIu16" fid 0x%"PRIx8" sel 0x%"PRIx8" cdw11 0x%"PRIx32"" pci_nvme_setfeat(uint16_t cid, uint8_t fid, uint8_t save, uint32_t cdw11) "cid %"PRIu16" fid 0x%"PRIx8" save 0x%"PRIx8" cdw11 0x%"PRIx32"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 040e4ef36ddc..637be0ddd2fc 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -93,6 +93,11 @@ enum NvmeCapMask { #define NVME_CAP_SET_CMBS(cap, val) (cap |= (uint64_t)(val & CAP_CMBS_MASK)\ << CAP_CMBS_SHIFT) +enum NvmeCapCss { + NVME_CAP_CSS_NVM = 1 << 0, + NVME_CAP_CSS_CSI = 1 << 6, +}; + enum NvmeCcShift { CC_EN_SHIFT = 0, CC_CSS_SHIFT = 4, @@ -121,6 +126,11 @@ enum NvmeCcMask { #define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK) #define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK) +enum NvmeCcCss { + NVME_CC_CSS_NVM = 0x0, + NVME_CC_CSS_ALL = 0x6, +}; + enum NvmeCstsShift { CSTS_RDY_SHIFT = 0, CSTS_CFS_SHIFT = 1, @@ -454,6 +464,10 @@ enum NvmeCmbmscMask { #define NVME_CMBSTS_CBAI(cmbsts) (cmsts & 0x1) +enum NvmeCommandSet { + NVME_IOCS_NVM = 0x0, +}; + enum NvmeSglDescriptorType { NVME_SGL_DESCR_TYPE_DATA_BLOCK = 0x0, NVME_SGL_DESCR_TYPE_BIT_BUCKET = 0x1, @@ -604,7 +618,8 @@ typedef struct NvmeIdentify { uint8_t rsvd3; uint16_t cntid; uint16_t nvmsetid; - uint16_t rsvd4; + uint8_t rsvd4; + uint8_t csi; uint32_t rsvd11[4]; } NvmeIdentify; @@ -697,8 +712,15 @@ typedef struct NvmeAerResult { } NvmeAerResult; typedef struct NvmeCqe { - uint32_t result; - uint32_t rsvd; + union { + struct { + uint32_t dw0; + uint32_t dw1; + }; + + uint64_t qw0; + }; + uint16_t sq_head; uint16_t sq_id; uint16_t cid; @@ -746,6 +768,10 @@ enum NvmeStatusCodes { NVME_FEAT_NOT_CHANGABLE = 0x010e, NVME_FEAT_NOT_NS_SPEC = 0x010f, NVME_FW_REQ_SUSYSTEM_RESET = 0x0110, + NVME_IOCS_NOT_SUPPORTED = 0x0127, + NVME_IOCS_NOT_ENABLED = 0x0128, + NVME_IOCS_COMB_REJECTED = 0x0129, + NVME_INVALID_IOCS = 0x0126, NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO = 0x0182, @@ -890,10 +916,14 @@ typedef struct NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 enum { - NVME_ID_CNS_NS = 0x0, - NVME_ID_CNS_CTRL = 0x1, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x2, - NVME_ID_CNS_NS_DESCR_LIST = 0x3, + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_NS_IOCS = 0x05, + NVME_ID_CNS_CTRL_IOCS = 0x06, + NVME_ID_CNS_NS_ACTIVE_LIST_IOCS = 0x07, + 
NVME_ID_CNS_IOCS = 0x1c, }; typedef struct NvmeIdCtrl { @@ -1058,6 +1088,7 @@ enum NvmeFeatureIds { NVME_WRITE_ATOMICITY = 0xa, NVME_ASYNCHRONOUS_EVENT_CONF = 0xb, NVME_TIMESTAMP = 0xe, + NVME_COMMAND_SET_PROFILE = 0x19, NVME_SOFTWARE_PROGRESS_MARKER = 0x80 }; @@ -1105,7 +1136,7 @@ typedef struct NvmeLBAF { #define NVME_NSID_BROADCAST 0xffffffff -typedef struct NvmeIdNs { +typedef struct NvmeIdNsNvm { uint64_t nsze; uint64_t ncap; uint64_t nuse; @@ -1143,7 +1174,7 @@ typedef struct NvmeIdNs { NvmeLBAF lbaf[16]; uint8_t rsvd192[192]; uint8_t vs[3712]; -} NvmeIdNs; +} NvmeIdNsNvm; typedef struct NvmeIdNsDescr { uint8_t nidt; @@ -1154,11 +1185,13 @@ typedef struct NvmeIdNsDescr { #define NVME_NIDT_EUI64_LEN 8 #define NVME_NIDT_NGUID_LEN 16 #define NVME_NIDT_UUID_LEN 16 +#define NVME_NIDT_CSI_LEN 1 enum { NVME_NIDT_EUI64 = 0x1, NVME_NIDT_NGUID = 0x2, NVME_NIDT_UUID = 0x3, + NVME_NIDT_CSI = 0x4, }; /*Deallocate Logical Block Features*/ @@ -1211,7 +1244,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEnduranceGroupLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); - QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeNvmSetAttributes) != 128); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNvmSetList) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096); From patchwork Tue Jun 30 10:01:31 2020 X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633613 From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 02/10] hw/block/nvme: add zns specific fields and types Date: Tue, 30 Jun 2020 12:01:31 +0200 Message-Id: <20200630100139.1483002-3-its@irrelevant.dk> In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk>
Add new fields, types and data structures for TP 4053 ("Zoned Namespaces"). Signed-off-by: Klaus Jensen --- include/block/nvme.h | 186 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 180 insertions(+), 6 deletions(-) diff --git a/include/block/nvme.h b/include/block/nvme.h index 637be0ddd2fc..ddf948132272 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -465,7 +465,8 @@ enum NvmeCmbmscMask { #define NVME_CMBSTS_CBAI(cmbsts) (cmsts & 0x1) enum NvmeCommandSet { - NVME_IOCS_NVM = 0x0, + NVME_IOCS_NVM = 0x0, + NVME_IOCS_ZONED = 0x2, }; enum NvmeSglDescriptorType { @@ -552,6 +553,11 @@ enum NvmeIoCommands { NVME_CMD_COMPARE = 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM = 0x09, + + /* Zoned Command Set */ + NVME_CMD_ZONE_MGMT_SEND = 0x79, + NVME_CMD_ZONE_MGMT_RECV = 0x7a, + NVME_CMD_ZONE_APPEND = 0x7d, }; typedef struct NvmeDeleteQ { @@ -664,6 +670,82 @@ enum { NVME_RW_PRINFO_PRCHK_REF = 1 << 10, }; +typedef struct NvmeZoneAppendCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint32_t rsvd8[2]; + uint64_t mptr; + NvmeCmdDptr dptr; + uint64_t zslba; + uint16_t nlb; + uint8_t rsvd50; + uint8_t control; + uint32_t ilbrt; + uint16_t lbat; + uint16_t lbatm; +} NvmeZoneAppendCmd; + +typedef struct NvmeZoneManagementSendCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint32_t rsvd8[4]; + NvmeCmdDptr dptr; + uint64_t slba; + uint32_t rsvd48; + uint8_t zsa; + uint8_t zsflags; + uint16_t rsvd54; + uint32_t rsvd56[2]; +} NvmeZoneManagementSendCmd; + +#define NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(zsflags) ((zsflags) & 0x1) + +typedef enum NvmeZoneManagementSendAction { + NVME_CMD_ZONE_MGMT_SEND_CLOSE = 0x1, + NVME_CMD_ZONE_MGMT_SEND_FINISH = 0x2, + NVME_CMD_ZONE_MGMT_SEND_OPEN = 0x3, + NVME_CMD_ZONE_MGMT_SEND_RESET = 0x4, + NVME_CMD_ZONE_MGMT_SEND_OFFLINE = 0x5, + NVME_CMD_ZONE_MGMT_SEND_SET_ZDE = 0x10, +} NvmeZoneManagementSendAction; + +typedef struct NvmeZoneManagementRecvCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint8_t rsvd8[16]; + NvmeCmdDptr dptr; + uint64_t slba; + uint32_t numdw; + uint8_t zra; + uint8_t zrasp; + uint8_t zrasf; + uint8_t rsvd55[9]; +} NvmeZoneManagementRecvCmd; + +typedef enum NvmeZoneManagementRecvAction { + NVME_CMD_ZONE_MGMT_RECV_REPORT_ZONES = 0x0, + NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES = 0x1, +} NvmeZoneManagementRecvAction; + +typedef enum NvmeZoneManagementRecvActionSpecificField { + NVME_CMD_ZONE_MGMT_RECV_LIST_ALL = 0x0, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE = 0x1, +
NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO = 0x2, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO = 0x3, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC = 0x4, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF = 0x5, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO = 0x6, + NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO = 0x7, +} NvmeZoneManagementRecvActionSpecificField; + +#define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1 + typedef struct NvmeDsmCmd { uint8_t opcode; uint8_t flags; @@ -702,13 +784,15 @@ enum NvmeAsyncEventRequest { NVME_AER_INFO_SMART_RELIABILITY = 0, NVME_AER_INFO_SMART_TEMP_THRESH = 1, NVME_AER_INFO_SMART_SPARE_THRESH = 2, + NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED = 0xef, }; typedef struct NvmeAerResult { - uint8_t event_type; - uint8_t event_info; - uint8_t log_page; - uint8_t resv; + uint8_t event_type; + uint8_t event_info; + uint8_t log_page; + uint8_t resv; + uint32_t nsid; } NvmeAerResult; typedef struct NvmeCqe { @@ -775,6 +859,14 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO = 0x0182, + NVME_ZONE_BOUNDARY_ERROR = 0x01b8, + NVME_ZONE_IS_FULL = 0x01b9, + NVME_ZONE_IS_READ_ONLY = 0x01ba, + NVME_ZONE_IS_OFFLINE = 0x01bb, + NVME_ZONE_INVALID_WRITE = 0x01bc, + NVME_TOO_MANY_ACTIVE_ZONES = 0x01bd, + NVME_TOO_MANY_OPEN_ZONES = 0x01be, + NVME_INVALID_ZONE_STATE_TRANSITION = 0x01bf, NVME_WRITE_FAULT = 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR = 0x0282, @@ -868,6 +960,46 @@ enum { NVME_EFFECTS_UUID_SEL = 1 << 19, }; +typedef enum NvmeZoneType { + NVME_ZT_SEQ = 0x2, +} NvmeZoneType; + +typedef enum NvmeZoneState { + NVME_ZS_ZSE = 0x1, + NVME_ZS_ZSIO = 0x2, + NVME_ZS_ZSEO = 0x3, + NVME_ZS_ZSC = 0x4, + NVME_ZS_ZSRO = 0xd, + NVME_ZS_ZSF = 0xe, + NVME_ZS_ZSO = 0xf, +} NvmeZoneState; + +typedef struct NvmeZoneDescriptor { + uint8_t zt; + uint8_t zs; + uint8_t za; + uint8_t rsvd3[5]; + uint64_t zcap; + uint64_t zslba; + uint64_t wp; + uint8_t rsvd32[32]; +} NvmeZoneDescriptor; + +#define NVME_ZS(zs) (((zs) >> 4) & 0xf) +#define NVME_ZS_SET(zs, state) ((zs) = ((state) << 4)) + +#define NVME_ZA_ZFC(za) ((za) & (1 << 0)) +#define NVME_ZA_FZR(za) ((za) & (1 << 1)) +#define NVME_ZA_RZR(za) ((za) & (1 << 2)) +#define NVME_ZA_ZDEV(za) ((za) & (1 << 7)) + +#define NVME_ZA_SET_ZFC(za, val) ((za) |= (((val) & 1) << 0)) +#define NVME_ZA_SET_FZR(za, val) ((za) |= (((val) & 1) << 1)) +#define NVME_ZA_SET_RZR(za, val) ((za) |= (((val) & 1) << 2)) +#define NVME_ZA_SET_ZDEV(za, val) ((za) |= (((val) & 1) << 7)) + +#define NVME_ZA_CLEAR(za) ((za) = 0x0) + enum NvmeSmartWarn { NVME_SMART_SPARE = 1 << 0, NVME_SMART_TEMPERATURE = 1 << 1, @@ -899,6 +1031,7 @@ enum NvmeLogIdentifier { NVME_LOG_SMART_INFO = 0x02, NVME_LOG_FW_SLOT_INFO = 0x03, NVME_LOG_EFFECTS = 0x05, + NVME_LOG_CHANGED_ZONE_LIST = 0xbf, }; typedef struct NvmePSD { @@ -1008,6 +1141,10 @@ typedef struct NvmeIdCtrl { uint8_t vs[1024]; } NvmeIdCtrl; +enum NvmeIdCtrlOaes { + NVME_OAES_ZDCN = 1 << 27, +}; + enum NvmeIdCtrlOacs { NVME_OACS_SECURITY = 1 << 0, NVME_OACS_FORMAT = 1 << 1, @@ -1048,6 +1185,11 @@ enum NvmeIdCtrlLpa { #define NVME_CTRL_SGLS_MPTR_SGL (0x1 << 19) #define NVME_CTRL_SGLS_ADDR_OFFSET (0x1 << 20) +typedef struct NvmeIdCtrlZns { + uint8_t zasl; + uint8_t rsvd1[4095]; +} NvmeIdCtrlZns; + #define NVME_ARB_AB(arb) (arb & 0x7) #define NVME_ARB_AB_NOLIMIT 0x7 #define NVME_ARB_LPW(arb) ((arb >> 8) & 0xff) @@ -1071,6 +1213,7 @@ enum NvmeIdCtrlLpa { #define NVME_AEC_SMART(aec) (aec & 0xff) #define NVME_AEC_NS_ATTR(aec) ((aec >> 8) & 0x1) #define NVME_AEC_FW_ACTIVATION(aec) ((aec >> 9) & 0x1) +#define NVME_AEC_ZDCN(aec) ((aec 
>> 27) & 0x1) #define NVME_ERR_REC_TLER(err_rec) (err_rec & 0xffff) #define NVME_ERR_REC_DULBE(err_rec) (err_rec & 0x10000) @@ -1226,9 +1369,33 @@ enum NvmeIdNsDps { DPS_FIRST_EIGHT = 8, }; +typedef struct NvmeLBAFE { + uint64_t zsze; + uint8_t zdes; + uint8_t rsvd9[7]; +} NvmeLBAFE; + +typedef struct NvmeIdNsZns { + uint16_t zoc; + uint16_t ozcs; + uint32_t mar; + uint32_t mor; + uint32_t rrl; + uint32_t frl; + uint8_t rsvd20[2796]; + NvmeLBAFE lbafe[16]; + uint8_t rsvd3072[768]; + uint8_t vs[256]; +} NvmeIdNsZns; + +#define NVME_ID_NS_ZNS_ZOC_VZC (1 << 0) +#define NVME_ID_NS_ZNS_ZOC_ZAE (1 << 1) + +#define NVME_ID_NS_ZNS_OZCS_RAZB (1 << 0) + static inline void _nvme_check_size(void) { - QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 8); QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64); @@ -1237,17 +1404,24 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeCreateSq) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneAppendCmd) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementSendCmd) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEnduranceGroupLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZns) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZns) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeNvmSetAttributes) != 128); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNvmSetList) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescriptor) != 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16); } #endif From patchwork Tue Jun 30 10:01:32 2020 X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633599
From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 03/10] hw/block/nvme: add basic read/write for zoned namespaces Date: Tue, 30 Jun 2020 12:01:32 +0200 Message-Id: <20200630100139.1483002-4-its@irrelevant.dk> In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> This adds basic read and write support for zoned namespaces. A zoned namespace is created by setting the iocs namespace parameter to 0x2, supplying a zero-sized blockdev for persistent zone info state (the zns.zoneinfo parameter), and setting the zns.zcap parameter to specify the individual zone capacity in logical blocks. The namespace device computes the zone size as the next power of two of the zone capacity and fits as many zones as possible on the underlying namespace blockdev. If the zone info blockdev pointed to by zns.zoneinfo is non-zero in size, it is assumed to contain existing zone state.
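As a sketch, such a namespace might be configured as follows. This is a hypothetical invocation (placeholder ids, file names and zone capacity; the usual nvme 'serial' and nvme-ns 'drive' properties are assumed):

    -drive id=zns,file=zns.img,format=raw,if=none
    -drive id=zoneinfo,file=zoneinfo.img,format=raw,if=none
    -device nvme,serial=deadbeef
    -device nvme-ns,drive=zns,iocs=0x2,zns.zoneinfo=zoneinfo,zns.zcap=4096

Here zns.zcap is given in logical blocks; since 4096 is already a power of two, the zone size equals the zone capacity, and a zero-sized zoneinfo.img makes the device initialize fresh zone state.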
Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.c | 227 +++++++++++++++++++++++++- hw/block/nvme-ns.h | 103 ++++++++++++ hw/block/nvme.c | 361 +++++++++++++++++++++++++++++++++++++++--- hw/block/nvme.h | 1 + hw/block/trace-events | 10 ++ 5 files changed, 677 insertions(+), 25 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index ae051784caaf..9a08b2ba0fb2 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -28,6 +28,26 @@ #include "nvme.h" #include "nvme-ns.h" +const char *nvme_zs_str(NvmeZone *zone) +{ + return nvme_zs_to_str(nvme_zs(zone)); +} + +const char *nvme_zs_to_str(NvmeZoneState zs) +{ + switch (zs) { + case NVME_ZS_ZSE: return "ZSE"; + case NVME_ZS_ZSIO: return "ZSIO"; + case NVME_ZS_ZSEO: return "ZSEO"; + case NVME_ZS_ZSC: return "ZSC"; + case NVME_ZS_ZSRO: return "ZSRO"; + case NVME_ZS_ZSF: return "ZSF"; + case NVME_ZS_ZSO: return "ZSO"; + } + + return NULL; +} + static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp) { Error *local_err = NULL; @@ -57,6 +77,171 @@ static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp) return 0; } +static int nvme_ns_init_blk_zoneinfo(NvmeNamespace *ns, size_t len, + Error **errp) +{ + NvmeZone *zone; + NvmeZoneDescriptor *zd; + uint64_t zslba; + int ret; + + BlockBackend *blk = ns->zns.info.blk; + + Error *local_err = NULL; + + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zslba = i * nvme_ns_zsze(ns); + zone = nvme_ns_get_zone(ns, zslba); + zd = &zone->zd; + + zd->zt = NVME_ZT_SEQ; + nvme_zs_set(zone, NVME_ZS_ZSE); + zd->zcap = ns->params.zns.zcap; + zone->wp_staging = zslba; + zd->wp = zd->zslba = cpu_to_le64(zslba); + } + + ret = nvme_ns_blk_resize(blk, len, &local_err); + if (ret) { + error_propagate_prepend(errp, local_err, + "could not resize zoneinfo blockdev: "); + return ret; + } + + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zd = &ns->zns.info.zones[i].zd; + + ret = blk_pwrite(blk, i * sizeof(NvmeZoneDescriptor), zd, + sizeof(NvmeZoneDescriptor), 0); + if (ret < 0) { + error_setg_errno(errp, -ret, "blk_pwrite: "); + return ret; + } + } + + return 0; +} + +static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp) +{ + NvmeZone *zone; + NvmeZoneDescriptor *zd; + BlockBackend *blk = ns->zns.info.blk; + uint64_t perm, shared_perm; + int64_t len, zoneinfo_len; + + Error *local_err = NULL; + int ret; + + perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE; + shared_perm = BLK_PERM_ALL; + + ret = blk_set_perm(blk, perm, shared_perm, &local_err); + if (ret) { + error_propagate_prepend(errp, local_err, "blk_set_perm: "); + return ret; + } + + zoneinfo_len = ROUND_UP(ns->zns.info.num_zones * + sizeof(NvmeZoneDescriptor), BDRV_SECTOR_SIZE); + + len = blk_getlength(blk); + if (len < 0) { + error_setg_errno(errp, -len, "blk_getlength: "); + return len; + } + + if (len) { + if (len != zoneinfo_len) { + error_setg(errp, "zoneinfo size mismatch " + "(expected %"PRIu64" bytes; was %"PRIu64" bytes)", + zoneinfo_len, len); + error_append_hint(errp, "Did you change the zone size or " + "zone descriptor size?\n"); + return -1; + } + + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zd = &zone->zd; + + ret = blk_pread(blk, i * sizeof(NvmeZoneDescriptor), zd, + sizeof(NvmeZoneDescriptor)); + if (ret < 0) { + error_setg_errno(errp, -ret, "blk_pread: "); + return ret; + } else if (ret != sizeof(NvmeZoneDescriptor)) { + error_setg(errp, "blk_pread: short read"); + return -1; + } + + zone->wp_staging = nvme_wp(zone); + + switch 
(nvme_zs(zone)) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSF: + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + continue; + + case NVME_ZS_ZSC: + if (nvme_wp(zone) == nvme_zslba(zone)) { + nvme_zs_set(zone, NVME_ZS_ZSE); + continue; + } + + /* fallthrough */ + + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + nvme_zs_set(zone, NVME_ZS_ZSF); + NVME_ZA_SET_ZFC(zd->za, 0x1); + } + } + + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zd = &ns->zns.info.zones[i].zd; + + ret = blk_pwrite(blk, i * sizeof(NvmeZoneDescriptor), zd, + sizeof(NvmeZoneDescriptor), 0); + if (ret < 0) { + error_setg_errno(errp, -ret, "blk_pwrite: "); + return ret; + } + } + + return 0; + } + + if (nvme_ns_init_blk_zoneinfo(ns, zoneinfo_len, &local_err)) { + error_propagate_prepend(errp, local_err, + "could not initialize zoneinfo blockdev: "); + return -1; + } + + return 0; +} + +static void nvme_ns_init_zoned(NvmeNamespace *ns) +{ + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); + NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns); + + id_ns_zns->zoc = cpu_to_le16(ns->params.zns.zoc); + id_ns_zns->ozcs = cpu_to_le16(ns->params.zns.ozcs); + + for (int i = 0; i <= id_ns->nlbaf; i++) { + id_ns_zns->lbafe[i].zsze = cpu_to_le64(pow2ceil(ns->params.zns.zcap)); + } + + ns->zns.info.num_zones = nvme_ns_nlbas(ns) / nvme_ns_zsze(ns); + ns->zns.info.zones = g_malloc0_n(ns->zns.info.num_zones, sizeof(NvmeZone)); + + id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap; + + id_ns_zns->mar = 0xffffffff; + id_ns_zns->mor = 0xffffffff; +} + static void nvme_ns_init(NvmeNamespace *ns) { NvmeIdNsNvm *id_ns; @@ -69,12 +254,20 @@ static void nvme_ns_init(NvmeNamespace *ns) ns->iocs = ns->params.iocs; - id_ns->dlfeat = unmap ? 0x9 : 0x0; + if (!nvme_ns_zoned(ns)) { + id_ns->dlfeat = unmap ? 0x9 : 0x0; + } id_ns->lbaf[0].ds = ns->params.lbads; id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); + id_ns->ncap = id_ns->nsze; + + if (ns->iocs == NVME_IOCS_ZONED) { + ns->id_ns[NVME_IOCS_ZONED] = g_new0(NvmeIdNsZns, 1); + nvme_ns_init_zoned(ns); + } /* no thin provisioning */ - id_ns->ncap = id_ns->nsze; id_ns->nuse = id_ns->ncap; } @@ -194,6 +387,28 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error return -1; } + switch (ns->params.iocs) { + case NVME_IOCS_NVM: + break; + + case NVME_IOCS_ZONED: + if (!ns->zns.info.blk) { + error_setg(errp, "zone info block backend not configured"); + return -1; + } + + if (!ns->params.zns.zcap) { + error_setg(errp, "zero zone capacity"); + return -1; + } + + break; + + default: + error_setg(errp, "unsupported I/O command set"); + return -1; + } + return 0; } @@ -222,6 +437,12 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) id_ns->nsfeat |= 0x4; } + if (nvme_ns_zoned(ns)) { + if (nvme_ns_setup_blk_zoneinfo(ns, errp)) { + return -1; + } + } + if (nvme_register_namespace(n, ns, errp)) { return -1; } @@ -249,6 +470,10 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT8("lbads", NvmeNamespace, params.lbads, BDRV_SECTOR_BITS), DEFINE_PROP_DRIVE("state", NvmeNamespace, blk_state), DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0), + DEFINE_PROP_DRIVE("zns.zoneinfo", NvmeNamespace, zns.info.blk), + DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0), + DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0), + DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 4124f20f1cef..7dcf0f02a07f 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -23,8 +23,20 @@ typedef
struct NvmeNamespaceParams { uint32_t nsid; uint8_t iocs; uint8_t lbads; + + struct { + uint64_t zcap; + uint16_t zoc; + uint16_t ozcs; + } zns; } NvmeNamespaceParams; +typedef struct NvmeZone { + NvmeZoneDescriptor zd; + + uint64_t wp_staging; +} NvmeZone; + typedef struct NvmeNamespace { DeviceState parent_obj; BlockBackend *blk; @@ -41,8 +53,22 @@ typedef struct NvmeNamespace { struct { uint32_t err_rec; } features; + + struct { + struct { + BlockBackend *blk; + + uint64_t num_zones; + NvmeZone *zones; + } info; + } zns; } NvmeNamespace; +static inline bool nvme_ns_zoned(NvmeNamespace *ns) +{ + return ns->iocs == NVME_IOCS_ZONED; +} + static inline uint32_t nvme_nsid(NvmeNamespace *ns) { if (ns) { @@ -57,17 +83,39 @@ static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns) return ns->id_ns[NVME_IOCS_NVM]; } +static inline NvmeIdNsZns *nvme_ns_id_zoned(NvmeNamespace *ns) +{ + return ns->id_ns[NVME_IOCS_ZONED]; +} + static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns) { NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)]; } +static inline NvmeLBAFE *nvme_ns_lbafe(NvmeNamespace *ns) +{ + NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); + NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns); + return &id_ns_zns->lbafe[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)]; +} + static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns) { return nvme_ns_lbaf(ns)->ds; } +static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns) +{ + return nvme_ns_lbafe(ns)->zsze; +} + +static inline uint64_t nvme_ns_zsze_bytes(NvmeNamespace *ns) +{ + return nvme_ns_zsze(ns) << nvme_ns_lbads(ns); +} + /* calculate the number of LBAs that the namespace can accomodate */ static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns) { @@ -79,8 +127,63 @@ static inline size_t nvme_ns_blk_state_len(NvmeNamespace *ns) return ROUND_UP(DIV_ROUND_UP(nvme_ns_nlbas(ns), 8), BDRV_SECTOR_SIZE); } +static inline uint64_t nvme_ns_zone_idx(NvmeNamespace *ns, uint64_t lba) +{ + return lba / nvme_ns_zsze(ns); +} + +static inline NvmeZone *nvme_ns_get_zone(NvmeNamespace *ns, uint64_t lba) +{ + uint64_t idx = nvme_ns_zone_idx(ns, lba); + if (unlikely(idx >= ns->zns.info.num_zones)) { + return NULL; + } + + return &ns->zns.info.zones[idx]; +} + +static inline NvmeZoneState nvme_zs(NvmeZone *zone) +{ + return (zone->zd.zs >> 4) & 0xf; +} + +static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs) +{ + zone->zd.zs = zs << 4; +} + +static inline bool nvme_ns_zone_wp_valid(NvmeZone *zone) +{ + switch (nvme_zs(zone)) { + case NVME_ZS_ZSF: + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + return false; + default: + return true; + } +} + +static inline uint64_t nvme_zslba(NvmeZone *zone) +{ + return le64_to_cpu(zone->zd.zslba); +} + +static inline uint64_t nvme_zcap(NvmeZone *zone) +{ + return le64_to_cpu(zone->zd.zcap); +} + +static inline uint64_t nvme_wp(NvmeZone *zone) +{ + return le64_to_cpu(zone->zd.wp); +} + typedef struct NvmeCtrl NvmeCtrl; +const char *nvme_zs_str(NvmeZone *zone); +const char *nvme_zs_to_str(NvmeZoneState zs); + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); #endif /* NVME_NS_H */ diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 1662c11a4cf3..4ec3b3029388 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -902,6 +902,115 @@ static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type) } } +static uint16_t nvme_check_zone_readable(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeZoneState zs = nvme_zs(zone); + uint64_t zslba = nvme_zslba(zone); + + if (zs ==
NVME_ZS_ZSO) { + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba, + NVME_ZS_ZSO); + return NVME_ZONE_IS_OFFLINE | NVME_DNR; + } + + return NVME_SUCCESS; +} + +static uint16_t nvme_check_zone_read(NvmeCtrl *n, uint64_t slba, uint32_t nlb, + NvmeRequest *req, NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns); + uint64_t zslba = nvme_zslba(zone); + uint64_t zsze = nvme_ns_zsze(ns); + uint16_t status; + + status = nvme_check_zone_readable(n, req, zone); + if (status) { + return status; + } + + if ((slba + nlb) > (zslba + zsze)) { + if (!(id_ns_zns->ozcs & NVME_ID_NS_ZNS_OZCS_RAZB)) { + trace_pci_nvme_err_zone_boundary(nvme_cid(req), slba, nlb, zsze); + return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR; + } + } + + return NVME_SUCCESS; +} + +static uint16_t nvme_check_zone_writeable(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeZoneState zs = nvme_zs(zone); + uint64_t zslba = nvme_zslba(zone); + + if (zs == NVME_ZS_ZSO) { + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba, + NVME_ZS_ZSO); + return NVME_ZONE_IS_OFFLINE | NVME_DNR; + } + + switch (zs) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSC: + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + return NVME_SUCCESS; + case NVME_ZS_ZSF: + trace_pci_nvme_err_zone_is_full(nvme_cid(req), req->slba); + return NVME_ZONE_IS_FULL | NVME_DNR; + case NVME_ZS_ZSRO: + trace_pci_nvme_err_zone_is_read_only(nvme_cid(req), req->slba); + return NVME_ZONE_IS_READ_ONLY | NVME_DNR; + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), zslba, zs); + return NVME_INTERNAL_DEV_ERROR | NVME_DNR; +} + +static uint16_t nvme_check_zone_write(NvmeCtrl *n, uint64_t slba, uint32_t nlb, + NvmeRequest *req, NvmeZone *zone) +{ + uint64_t zslba, wp, zcap; + uint16_t status; + + zslba = nvme_zslba(zone); + wp = zone->wp_staging; + zcap = nvme_zcap(zone); + + status = nvme_check_zone_writeable(n, req, zone); + if (status) { + return status; + } + + if ((wp - zslba) + nlb > zcap) { + trace_pci_nvme_err_zone_boundary(nvme_cid(req), slba, nlb, zcap); + return NVME_ZONE_BOUNDARY_ERROR | NVME_DNR; + } + + if (slba != wp) { + trace_pci_nvme_err_zone_invalid_write(nvme_cid(req), slba, wp); + return NVME_ZONE_INVALID_WRITE | NVME_DNR; + } + + return NVME_SUCCESS; +} + +static inline uint16_t nvme_check_rwz_zone(NvmeCtrl *n, uint64_t slba, + uint32_t nlb, NvmeRequest *req, NvmeZone *zone) +{ + if (nvme_req_is_write(req)) { + return nvme_check_zone_write(n, slba, nlb, req, zone); + } + + return nvme_check_zone_read(n, slba, nlb, req, zone); +} + static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len) { uint8_t mdts = n->params.mdts; @@ -995,6 +1104,44 @@ static void nvme_ns_update_util(NvmeNamespace *ns, uint64_t slba, nvme_req_add_aio(req, aio); } +static void nvme_update_zone_info(NvmeNamespace *ns, NvmeRequest *req, + NvmeZone *zone) +{ + uint64_t zslba = -1; + + QEMUIOVector *iov = g_new0(QEMUIOVector, 1); + NvmeAIO *aio = g_new0(NvmeAIO, 1); + + *aio = (NvmeAIO) { + .opc = NVME_AIO_OPC_WRITE, + .blk = ns->zns.info.blk, + .payload = iov, + .req = req, + .flags = NVME_AIO_INTERNAL, + }; + + qemu_iovec_init(iov, 1); + + if (zone) { + zslba = nvme_zslba(zone); + trace_pci_nvme_update_zone_info(nvme_cid(req), ns->params.nsid, zslba); + + aio->offset = nvme_ns_zone_idx(ns, zslba) * sizeof(NvmeZoneDescriptor); + qemu_iovec_add(iov, &zone->zd, sizeof(NvmeZoneDescriptor)); + } else { + trace_pci_nvme_update_zone_info(nvme_cid(req), ns->params.nsid, zslba); + + for (int i = 0; i < 
ns->zns.info.num_zones; i++) { + qemu_iovec_add(iov, &ns->zns.info.zones[i].zd, + sizeof(NvmeZoneDescriptor)); + } + } + + aio->len = iov->size; + + nvme_req_add_aio(req, aio); +} + static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret) { NvmeRequest *req = aio->req; @@ -1009,6 +1156,44 @@ static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret) } } +static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb, + NvmeRequest *req) +{ + NvmeZoneDescriptor *zd = &zone->zd; + uint64_t wp = nvme_wp(zone); + uint64_t zslba = nvme_zslba(zone); + + trace_pci_nvme_zone_advance_wp(nvme_cid(req), zslba, nlb, wp, wp + nlb); + + wp += nlb; + if (wp == zslba + nvme_zcap(zone)) { + nvme_zs_set(zone, NVME_ZS_ZSF); + } + + zd->wp = cpu_to_le64(wp); +} + +static void nvme_aio_zone_write_cb(NvmeAIO *aio, void *opaque, int ret) +{ + NvmeZone *zone = opaque; + NvmeRequest *req = aio->req; + NvmeNamespace *ns = req->ns; + uint32_t nlb = req->nlb; + uint64_t zslba = nvme_zslba(zone); + uint64_t wp = nvme_wp(zone); + + trace_pci_nvme_aio_zone_write_cb(nvme_cid(req), zslba, nlb, wp); + + if (ret) { + return; + } + + nvme_aio_write_cb(aio, opaque, ret); + nvme_zone_advance_wp(zone, nlb, req); + + nvme_update_zone_info(ns, req, zone); +} + static void nvme_rw_cb(NvmeRequest *req, void *opaque) { NvmeNamespace *ns = req->ns; @@ -1045,6 +1230,7 @@ static void nvme_aio_cb(void *opaque, int ret) block_acct_failed(stats, acct); if (req) { + NvmeNamespace *ns = req->ns; uint16_t status; switch (aio->opc) { @@ -1075,6 +1261,16 @@ static void nvme_aio_cb(void *opaque, int ret) if (!req->status || (status & 0xfff) == NVME_INTERNAL_DEV_ERROR) { req->status = status; } + + /* transition the zone to offline state */ + if (nvme_ns_zoned(ns)) { + NvmeZone *zone = nvme_ns_get_zone(ns, req->slba); + + nvme_zs_set(zone, NVME_ZS_ZSO); + NVME_ZA_CLEAR(zone->zd.za); + + nvme_update_zone_info(ns, req, zone); + } } } @@ -1098,7 +1294,8 @@ static void nvme_aio_cb(void *opaque, int ret) } static void nvme_aio_rw(NvmeNamespace *ns, NvmeAIOOp opc, - NvmeAIOCompletionFunc *cb, NvmeRequest *req) + NvmeAIOCompletionFunc *cb, void *cb_arg, + NvmeRequest *req) { NvmeAIO *aio = g_new(NvmeAIO, 1); @@ -1108,6 +1305,7 @@ static void nvme_aio_rw(NvmeNamespace *ns, NvmeAIOOp opc, .offset = req->slba << nvme_ns_lbads(ns), .req = req, .cb = cb, + .cb_arg = cb_arg, }; if (req->qsg.sg) { @@ -1138,33 +1336,59 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } -static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req) { - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - NvmeNamespace *ns = req->ns; NvmeAIO *aio; + NvmeAIOCompletionFunc *cb = nvme_aio_write_cb; + void *cb_arg = NULL; + + NvmeNamespace *ns = req->ns; int64_t offset; size_t count; uint16_t status; - req->slba = le64_to_cpu(rw->slba); - req->nlb = le16_to_cpu(rw->nlb) + 1; - trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), req->slba, req->nlb); status = nvme_check_bounds(n, ns, req->slba, req->nlb); if (status) { NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns); - trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb, - id_ns->nsze); - return status; + trace_pci_nvme_err_invalid_lba_range(req->slba, req->nlb, id_ns->nsze); + + goto invalid; } offset = req->slba << nvme_ns_lbads(ns); count = req->nlb << nvme_ns_lbads(ns); + if (nvme_ns_zoned(ns)) { + NvmeZone *zone = nvme_ns_get_zone(ns, req->slba); + if (!zone) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), 
req->slba); + status = NVME_INVALID_FIELD | NVME_DNR; + goto invalid; + } + + status = nvme_check_zone_write(n, req->slba, req->nlb, req, zone); + if (status) { + goto invalid; + } + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSC: + nvme_zs_set(zone, NVME_ZS_ZSIO); + default: + break; + } + + cb = nvme_aio_zone_write_cb; + cb_arg = zone; + + zone->wp_staging += req->nlb; + } + aio = g_new0(NvmeAIO, 1); *aio = (NvmeAIO) { @@ -1173,25 +1397,33 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) .offset = offset, .len = count, .req = req, - .cb = nvme_aio_write_cb, + .cb = cb, + .cb_arg = cb_arg, }; nvme_req_add_aio(req, aio); + nvme_req_set_cb(req, nvme_rw_cb, NULL); + return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE); + return status; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req) { - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeAIOCompletionFunc *cb = NULL; + void *cb_arg = NULL; + NvmeNamespace *ns = req->ns; - uint32_t len; - int status; + size_t len; + uint16_t status; enum BlockAcctType acct = BLOCK_ACCT_READ; NvmeAIOOp opc = NVME_AIO_OPC_READ; - NvmeAIOCompletionFunc *cb = NULL; if (nvme_req_is_write(req)) { acct = BLOCK_ACCT_WRITE; @@ -1199,8 +1431,6 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) cb = nvme_aio_write_cb; } - req->nlb = le16_to_cpu(rw->nlb) + 1; - req->slba = le64_to_cpu(rw->slba); len = req->nlb << nvme_ns_lbads(ns); trace_pci_nvme_rw(nvme_cid(req), nvme_req_is_write(req) ? "write" : "read", @@ -1216,7 +1446,38 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) goto invalid; } - nvme_aio_rw(ns, opc, cb, req); + if (nvme_ns_zoned(ns)) { + NvmeZone *zone = nvme_ns_get_zone(ns, req->slba); + if (!zone) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), req->slba); + status = NVME_INVALID_FIELD | NVME_DNR; + goto invalid; + } + + status = nvme_check_rwz_zone(n, req->slba, req->nlb, req, zone); + if (status) { + goto invalid; + } + + if (nvme_req_is_write(req)) { + switch (nvme_zs(zone)) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSC: + nvme_zs_set(zone, NVME_ZS_ZSIO); + default: + break; + } + + cb = nvme_aio_zone_write_cb; + cb_arg = zone; + + zone->wp_staging += req->nlb; + } + } else if (nvme_req_is_write(req)) { + cb = nvme_aio_write_cb; + } + + nvme_aio_rw(ns, opc, cb, cb_arg, req); nvme_req_set_cb(req, nvme_rw_cb, NULL); return NVME_NO_COMPLETE; @@ -1226,6 +1487,47 @@ invalid: return status; } +static uint16_t nvme_rwz(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd; + NvmeNamespace *ns = req->ns; + NvmeZone *zone; + + req->nlb = le16_to_cpu(rw->nlb) + 1; + req->slba = le64_to_cpu(rw->slba); + + if (nvme_ns_zoned(ns) && nvme_req_is_write(req)) { + zone = nvme_ns_get_zone(ns, req->slba); + if (!zone) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), req->slba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (zone->wp_staging != nvme_wp(zone)) { + NVME_GUEST_ERR(pci_nvme_zone_pending_writes, + "cid %"PRIu16"; zone (zslba 0x%"PRIx64") has " + "pending writes " + "(wp 0x%"PRIx64" wp_staging 0x%"PRIx64"; " + "additional writes should not be submitted", + nvme_cid(req), nvme_zslba(zone), nvme_wp(zone), + zone->wp_staging); + + if (n->params.defensive) { + return NVME_ZONE_INVALID_WRITE; + } + } + } + + switch (req->cmd.opcode) { + case NVME_CMD_WRITE_ZEROES: + return nvme_do_write_zeroes(n, req); + default: + break; + } + + return nvme_do_rw(n, req); +} + static uint16_t 
nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { uint32_t nsid = le32_to_cpu(req->cmd.nsid); @@ -1245,11 +1547,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) switch (req->cmd.opcode) { case NVME_CMD_FLUSH: return nvme_flush(n, req); - case NVME_CMD_WRITE_ZEROES: - return nvme_write_zeroes(n, req); - case NVME_CMD_WRITE: case NVME_CMD_READ: - return nvme_rw(n, req); + case NVME_CMD_WRITE: + case NVME_CMD_WRITE_ZEROES: + return nvme_rwz(n, req); default: trace_pci_nvme_err_invalid_opc(req->cmd.opcode); return NVME_INVALID_OPCODE | NVME_DNR; @@ -2342,6 +2643,10 @@ static void nvme_clear_ctrl(NvmeCtrl *n) if (ns->blk_state) { blk_drain(ns->blk_state); } + + if (nvme_ns_zoned(ns)) { + blk_drain(ns->zns.info.blk); + } } for (i = 0; i < n->params.max_ioqpairs + 1; i++) { @@ -2376,6 +2681,10 @@ static void nvme_clear_ctrl(NvmeCtrl *n) if (ns->blk_state) { blk_flush(ns->blk_state); } + + if (nvme_ns_zoned(ns)) { + blk_flush(ns->zns.info.blk); + } } n->bar.cc = 0; @@ -2897,7 +3206,7 @@ static void nvme_init_state(NvmeCtrl *n) n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING; n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); - n->iocscs[0] = 1 << NVME_IOCS_NVM; + n->iocscs[0] = (1 << NVME_IOCS_NVM) | (1 << NVME_IOCS_ZONED); n->features.iocsci = 0; } @@ -3047,6 +3356,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) NvmeIdCtrl *id = &n->id_ctrl; uint8_t *pci_conf = pci_dev->config; + n->id_ctrl_iocss[NVME_IOCS_NVM] = g_new0(NvmeIdCtrl, 1); + n->id_ctrl_iocss[NVME_IOCS_ZONED] = g_new0(NvmeIdCtrl, 1); + id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' '); @@ -3183,6 +3495,7 @@ static Property nvme_props[] = { DEFINE_PROP_UINT8("aerl", NvmeCtrl, params.aerl, 3), DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64), DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7), + DEFINE_PROP_BOOL("defensive", NvmeCtrl, params.defensive, false), DEFINE_PROP_BOOL("x-use-intel-id", NvmeCtrl, params.use_intel_id, false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 69be47963f5d..1ec1af8d6291 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -7,6 +7,7 @@ #define NVME_MAX_NAMESPACES 256 typedef struct NvmeParams { + bool defensive; char *serial; uint32_t num_queues; /* deprecated since 5.1 */ uint32_t max_ioqpairs; diff --git a/hw/block/trace-events b/hw/block/trace-events index 4cf0236631d2..9e0b848186c8 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -42,6 +42,8 @@ pci_nvme_req_add_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offs pci_nvme_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, const char *opc, void *req) "cid %"PRIu16" aio %p blk \"%s\" offset %"PRIu64" opc \"%s\" req %p" pci_nvme_aio_discard_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" pci_nvme_aio_write_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" +pci_nvme_aio_zone_write_cb(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp 0x%"PRIx64"" +pci_nvme_zone_advance_wp(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp_old, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp_old 
0x%"PRIx64" wp 0x%"PRIx64"" pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8"" pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8"" pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" @@ -80,6 +82,8 @@ pci_nvme_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 0x%"PRI pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" new_head %"PRIu16"" pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_tail %"PRIu16"" pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" +pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64"" +pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64"" @@ -99,6 +103,10 @@ pci_nvme_err_aio(uint16_t cid, void *aio, const char *blkname, uint64_t offset, pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8"" pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64"" pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64"" +pci_nvme_err_zone_is_full(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64"" +pci_nvme_err_zone_is_read_only(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64"" +pci_nvme_err_zone_invalid_write(uint16_t cid, uint64_t slba, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" wp 0x%"PRIx64"" +pci_nvme_err_zone_boundary(uint16_t cid, uint64_t slba, uint32_t nlb, uint64_t zcap) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" zcap 0x%"PRIx64"" pci_nvme_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8"" pci_nvme_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8"" pci_nvme_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16"" @@ -127,6 +135,8 @@ pci_nvme_err_invalid_identify_cns(uint16_t cns) "identify, invalid cns=0x%"PRIx1 pci_nvme_err_invalid_getfeat(int dw10) "invalid get features, dw10=0x%"PRIx32"" pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx32"" pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16"" +pci_nvme_err_invalid_zone(uint16_t cid, uint64_t lba) "cid %"PRIu16" lba 0x%"PRIx64"" +pci_nvme_err_invalid_zone_condition(uint16_t cid, uint64_t zslba, uint8_t condition) "cid %"PRIu16" zslba 0x%"PRIx64" condition 0x%"PRIx8"" pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues" pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues" pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null" From patchwork Tue Jun 30 10:01:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 
7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633597 From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 04/10] hw/block/nvme: add the zone management receive command Date: Tue, 30 Jun 2020 12:01:33 +0200 Message-Id: <20200630100139.1483002-5-its@irrelevant.dk> In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> Add the Zone Management Receive command.
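For reference, the report buffer this command returns starts with a 64-byte header whose first quadword holds the number of zone descriptors, followed by packed 64-byte zone descriptors; with the extended report action each descriptor is additionally followed by the zone descriptor extension. A minimal host-side sketch of walking such a buffer, assuming a little-endian host and simplified stand-in struct layouts (not the definitions from include/block/nvme.h):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* stand-ins for the wire layouts; both are 64 bytes */
    struct zone_report_header { uint64_t nr_zones; uint8_t rsvd[56]; };
    struct zone_descriptor {
        uint8_t  zt, zs, za, rsvd3[5];
        uint64_t zcap, zslba, wp; /* little-endian on the wire */
        uint8_t  rsvd32[32];
    };

    /* walk a report buffer of 'len' bytes; zdes_bytes is non-zero only
     * for the extended report action */
    static void walk_report(const uint8_t *buf, size_t len, size_t zdes_bytes)
    {
        struct zone_report_header hdr;
        size_t zes = sizeof(struct zone_descriptor) + zdes_bytes;
        const uint8_t *p = buf + sizeof(hdr);

        memcpy(&hdr, buf, sizeof(hdr));

        for (uint64_t i = 0; i < hdr.nr_zones && p + zes <= buf + len;
             i++, p += zes) {
            struct zone_descriptor zd;
            memcpy(&zd, p, sizeof(zd));
            printf("zone %"PRIu64": zs 0x%x zslba 0x%"PRIx64" wp 0x%"PRIx64"\n",
                   i, zd.zs >> 4, zd.zslba, zd.wp);
        }
    }

As in the device code, the zone state lives in the upper nibble of the ZS byte; a portable host would byte-swap the 64-bit fields on big-endian machines.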
Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.c | 33 +++++++++-- hw/block/nvme-ns.h | 9 ++- hw/block/nvme.c | 130 ++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme.h | 6 ++ hw/block/trace-events | 1 + include/block/nvme.h | 5 ++ 6 files changed, 179 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 9a08b2ba0fb2..68996c2f0e72 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -99,6 +99,10 @@ static int nvme_ns_init_blk_zoneinfo(NvmeNamespace *ns, size_t len, zd->zcap = ns->params.zns.zcap; zone->wp_staging = zslba; zd->wp = zd->zslba = cpu_to_le64(zslba); + + if (ns->params.zns.zdes) { + zone->zde = g_malloc0(nvme_ns_zdes_bytes(ns)); + } } ret = nvme_ns_blk_resize(blk, len, &local_err); @@ -128,7 +132,7 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp) NvmeZoneDescriptor *zd; BlockBackend *blk = ns->zns.info.blk; uint64_t perm, shared_perm; - int64_t len, zoneinfo_len; + int64_t len, zoneinfo_len, zone_len; Error *local_err = NULL; int ret; @@ -142,8 +146,9 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp) return ret; } - zoneinfo_len = ROUND_UP(ns->zns.info.num_zones * - sizeof(NvmeZoneDescriptor), BDRV_SECTOR_SIZE); + zone_len = sizeof(NvmeZoneDescriptor) + nvme_ns_zdes_bytes(ns); + zoneinfo_len = ROUND_UP(ns->zns.info.num_zones * zone_len, + BDRV_SECTOR_SIZE); len = blk_getlength(blk); if (len < 0) { @@ -177,6 +182,23 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp) zone->wp_staging = nvme_wp(zone); + if (ns->params.zns.zdes) { + uint16_t zde_bytes = nvme_ns_zdes_bytes(ns); + int64_t offset = ns->zns.info.num_zones * + sizeof(NvmeZoneDescriptor); + ns->zns.info.zones[i].zde = g_malloc(zde_bytes); + + ret = blk_pread(blk, offset + i * zde_bytes, + ns->zns.info.zones[i].zde, zde_bytes); + if (ret < 0) { + error_setg_errno(errp, -ret, "blk_pread: "); + return ret; + } else if (ret != zde_bytes) { + error_setg(errp, "blk_pread: short read"); + return -1; + } + } + switch (nvme_zs(zone)) { case NVME_ZS_ZSE: case NVME_ZS_ZSF: @@ -185,7 +207,8 @@ static int nvme_ns_setup_blk_zoneinfo(NvmeNamespace *ns, Error **errp) continue; case NVME_ZS_ZSC: - if (nvme_wp(zone) == nvme_zslba(zone)) { + if (nvme_wp(zone) == nvme_zslba(zone) && + !NVME_ZA_ZDEV(zd->za)) { nvme_zs_set(zone, NVME_ZS_ZSE); continue; } @@ -231,6 +254,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) for (int i = 0; i <= id_ns->nlbaf; i++) { id_ns_zns->lbafe[i].zsze = cpu_to_le64(pow2ceil(ns->params.zns.zcap)); + id_ns_zns->lbafe[i].zdes = ns->params.zns.zdes; } ns->zns.info.num_zones = nvme_ns_nlbas(ns) / nvme_ns_zsze(ns); @@ -472,6 +496,7 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT8("iocs", NvmeNamespace, params.iocs, 0x0), DEFINE_PROP_DRIVE("zns.zoneinfo", NvmeNamespace, zns.info.blk), DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0), + DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0), DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0), DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0), DEFINE_PROP_END_OF_LIST(), diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 7dcf0f02a07f..5940fb73e72b 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -26,13 +26,15 @@ typedef struct NvmeNamespaceParams { struct { uint64_t zcap; + uint8_t zdes; uint16_t zoc; uint16_t ozcs; } zns; } NvmeNamespaceParams; typedef struct NvmeZone { - NvmeZoneDescriptor zd; + NvmeZoneDescriptor zd; + uint8_t *zde; uint64_t wp_staging; } 
NvmeZone; @@ -152,6 +154,11 @@ static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs) zone->zd.zs = zs << 4; } +static inline size_t nvme_ns_zdes_bytes(NvmeNamespace *ns) +{ + return ns->params.zns.zdes << 6; +} + static inline bool nvme_ns_zone_wp_valid(NvmeZone *zone) { switch (nvme_zs(zone)) { diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 4ec3b3029388..7e943dece352 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1528,6 +1528,134 @@ static uint16_t nvme_rwz(NvmeCtrl *n, NvmeRequest *req) return nvme_do_rw(n, req); } +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeZoneManagementRecvCmd *recv; + NvmeZoneManagementRecvAction zra; + NvmeZoneManagementRecvActionSpecificField zrasp; + NvmeNamespace *ns = req->ns; + NvmeZone *zone; + + uint8_t *buf, *bufp, zs_list; + uint64_t slba, num_zones = 0, zidx = 0, zidx_begin; + uint16_t zes, status; + size_t len; + + recv = (NvmeZoneManagementRecvCmd *) &req->cmd; + + zra = recv->zra; + zrasp = recv->zrasp; + slba = le64_to_cpu(recv->slba); + len = (le32_to_cpu(recv->numdw) + 1) << 2; + + if (!nvme_ns_zoned(ns)) { + return NVME_INVALID_OPCODE | NVME_DNR; + } + + trace_pci_nvme_zone_mgmt_recv(nvme_cid(req), nvme_nsid(ns), slba, len, + zra, zrasp, recv->zrasf); + + if (!len) { + return NVME_SUCCESS; + } + + switch (zrasp) { + case NVME_CMD_ZONE_MGMT_RECV_LIST_ALL: + zs_list = 0; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE: + zs_list = NVME_ZS_ZSE; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO: + zs_list = NVME_ZS_ZSIO; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO: + zs_list = NVME_ZS_ZSEO; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC: + zs_list = NVME_ZS_ZSC; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF: + zs_list = NVME_ZS_ZSF; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO: + zs_list = NVME_ZS_ZSRO; + break; + + case NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO: + zs_list = NVME_ZS_ZSO; + break; + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + + status = nvme_check_mdts(n, len); + if (status) { + return status; + } + + if (!nvme_ns_get_zone(ns, slba)) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), slba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + zidx_begin = zidx = nvme_ns_zone_idx(ns, slba); + zes = sizeof(NvmeZoneDescriptor); + if (zra == NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES) { + zes += nvme_ns_zdes_bytes(ns); + } + + buf = bufp = g_malloc0(len); + bufp += sizeof(NvmeZoneReportHeader); + + while ((bufp + zes) - buf <= len && zidx < ns->zns.info.num_zones) { + zone = &ns->zns.info.zones[zidx++]; + + if (zs_list && zs_list != nvme_zs(zone)) { + continue; + } + + num_zones++; + + memcpy(bufp, &zone->zd, sizeof(NvmeZoneDescriptor)); + + if (zra == NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES) { + memcpy(bufp + sizeof(NvmeZoneDescriptor), zone->zde, + nvme_ns_zdes_bytes(ns)); + } + + bufp += zes; + } + + if (!(recv->zrasf & NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL)) { + if (!zs_list) { + num_zones = ns->zns.info.num_zones - zidx_begin; + } else { + num_zones = 0; + for (int i = zidx_begin; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + + if (zs_list == nvme_zs(zone)) { + num_zones++; + } + } + } + } + + stq_le_p(buf, num_zones); + + status = nvme_dma(n, buf, len, DMA_DIRECTION_FROM_DEVICE, req); + g_free(buf); + + return status; +} + static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { uint32_t nsid = le32_to_cpu(req->cmd.nsid); @@ -1551,6 +1679,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case 
NVME_CMD_WRITE: case NVME_CMD_WRITE_ZEROES: return nvme_rwz(n, req); + case NVME_CMD_ZONE_MGMT_RECV: + return nvme_zone_mgmt_recv(n, req); default: trace_pci_nvme_err_invalid_opc(req->cmd.opcode); return NVME_INVALID_OPCODE | NVME_DNR; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 1ec1af8d6291..92aebb6a6416 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -47,6 +47,12 @@ static const NvmeEffectsLog nvme_effects[] = { NVME_EFFECTS_LBCC, }, }, + + [NVME_IOCS_ZONED] = { + .iocs = { + [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP, + } + }, }; typedef struct NvmeAsyncEvent { diff --git a/hw/block/trace-events b/hw/block/trace-events index 9e0b848186c8..9d2a7c2766b6 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -49,6 +49,7 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" s pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" +pci_nvme_zone_mgmt_recv(uint16_t cid, uint32_t nsid, uint64_t slba, uint64_t len, uint8_t zra, uint8_t zrasp, uint8_t zrasf) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" len %"PRIu64" zra 0x%"PRIx8" zrasp 0x%"PRIx8" zrasf 0x%"PRIx8"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16"" diff --git a/include/block/nvme.h b/include/block/nvme.h index ddf948132272..68dac2582b06 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -746,6 +746,11 @@ typedef enum NvmeZoneManagementRecvActionSpecificField { #define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1 +typedef struct NvmeZoneReportHeader { + uint64_t num_zones; + uint8_t rsvd[56]; +} NvmeZoneReportHeader; + typedef struct NvmeDsmCmd { uint8_t opcode; uint8_t flags; From patchwork Tue Jun 30 10:01:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633591 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F06B739 for ; Tue, 30 Jun 2020 10:03:10 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D3AB220675 for ; Tue, 30 Jun 2020 10:03:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D3AB220675 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=irrelevant.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:56100 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jqD6b-0000wX-3Q for 
patchwork-qemu-devel@patchwork.kernel.org; Tue, 30 Jun 2020 06:03:09 -0400 From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 05/10] hw/block/nvme: add the zone management send command Date: Tue, 30 Jun 2020 12:01:34 +0200 Message-Id: <20200630100139.1483002-6-its@irrelevant.dk> In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> Add the Zone Management Send command.
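Each action is a state transition on the addressed zone (or, with the select-all flag, on every zone in a permitting state): close takes an opened zone to closed, finish takes empty, opened or closed zones to full, open takes empty, closed or implicitly opened zones to explicitly opened, reset discards the zone data and rewinds the write pointer to the zone start before marking the zone empty (a read-only zone resets to offline instead), offline takes a read-only zone to offline, and set zone descriptor extension attaches host data to an empty zone. Anything else fails with Invalid Zone State Transition. A condensed sketch of these rules, for orientation only (the authoritative versions are the switch statements in the handlers below; action and state names are symbolic):

    #include <stdbool.h>

    /* zone states as encoded in the upper nibble of the ZS field */
    enum zs { ZSE = 0x1, ZSIO = 0x2, ZSEO = 0x3, ZSC = 0x4,
              ZSRO = 0xd, ZSF = 0xe, ZSO = 0xf };

    enum zsa { ZSA_CLOSE, ZSA_FINISH, ZSA_OPEN, ZSA_RESET, ZSA_OFFLINE };

    /* return true and set *to if 'action' is legal for a zone in state
     * 'from'; a transition to the current state is a no-op */
    static bool zsa_transition(enum zsa action, enum zs from, enum zs *to)
    {
        switch (action) {
        case ZSA_CLOSE:
            if (from == ZSIO || from == ZSEO || from == ZSC) {
                *to = ZSC;
                return true;
            }
            return false;
        case ZSA_FINISH:
            if (from == ZSE || from == ZSIO || from == ZSEO ||
                from == ZSC || from == ZSF) {
                *to = ZSF;
                return true;
            }
            return false;
        case ZSA_OPEN:
            if (from == ZSE || from == ZSC || from == ZSIO ||
                from == ZSEO) {
                *to = ZSEO;
                return true;
            }
            return false;
        case ZSA_RESET: /* also discards data and rewinds the write pointer */
            if (from == ZSIO || from == ZSEO || from == ZSC ||
                from == ZSF || from == ZSE) {
                *to = ZSE;
                return true;
            }
            if (from == ZSRO) {
                *to = ZSO;
                return true;
            }
            return false;
        case ZSA_OFFLINE:
            if (from == ZSRO || from == ZSO) {
                *to = ZSO;
                return true;
            }
            return false;
        }
        return false;
    }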
Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 461 ++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme.h | 4 + hw/block/trace-events | 12 ++ 3 files changed, 477 insertions(+) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 7e943dece352..a4527ad9840e 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -748,6 +748,11 @@ static void nvme_submit_aio(NvmeAIO *aio) } break; + + case NVME_AIO_OPC_DISCARD: + aio->aiocb = blk_aio_pdiscard(blk, aio->offset, aio->len, nvme_aio_cb, + aio); + break; } } @@ -1142,6 +1147,46 @@ static void nvme_update_zone_info(NvmeNamespace *ns, NvmeRequest *req, nvme_req_add_aio(req, aio); } +static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req, + NvmeZone *zone) +{ + uint64_t zslba = -1; + QEMUIOVector *iov = g_new0(QEMUIOVector, 1); + NvmeAIO *aio = g_new0(NvmeAIO, 1); + + *aio = (NvmeAIO) { + .opc = NVME_AIO_OPC_WRITE, + .blk = ns->zns.info.blk, + .payload = iov, + .offset = ns->zns.info.num_zones * sizeof(NvmeZoneDescriptor), + .req = req, + .flags = NVME_AIO_INTERNAL, + }; + + qemu_iovec_init(iov, 1); + + if (zone) { + zslba = nvme_zslba(zone); + trace_pci_nvme_update_zone_descr(nvme_cid(req), ns->params.nsid, + zslba); + + aio->offset += nvme_ns_zone_idx(ns, zslba) * nvme_ns_zdes_bytes(ns); + qemu_iovec_add(iov, zone->zde, nvme_ns_zdes_bytes(ns)); + } else { + trace_pci_nvme_update_zone_descr(nvme_cid(req), ns->params.nsid, + zslba); + + for (int i = 0; i < ns->zns.info.num_zones; i++) { + qemu_iovec_add(iov, ns->zns.info.zones[i].zde, + nvme_ns_zdes_bytes(ns)); + } + } + + aio->len = iov->size; + + nvme_req_add_aio(req, aio); +} + static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret) { NvmeRequest *req = aio->req; @@ -1206,6 +1251,49 @@ static void nvme_rw_cb(NvmeRequest *req, void *opaque) nvme_enqueue_req_completion(cq, req); } +static void nvme_zone_mgmt_send_reset_cb(NvmeRequest *req, void *opaque) +{ + NvmeSQueue *sq = req->sq; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + NvmeNamespace *ns = req->ns; + + trace_pci_nvme_zone_mgmt_send_reset_cb(nvme_cid(req), nvme_nsid(ns)); + + g_free(opaque); + + nvme_enqueue_req_completion(cq, req); +} + +static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret) +{ + NvmeRequest *req = aio->req; + NvmeZone *zone = opaque; + NvmeNamespace *ns = req->ns; + + uint64_t zslba = nvme_zslba(zone); + uint64_t zcap = nvme_zcap(zone); + + if (ret) { + return; + } + + trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba); + + nvme_zs_set(zone, NVME_ZS_ZSE); + NVME_ZA_CLEAR(zone->zd.za); + + zone->zd.wp = zone->zd.zslba; + zone->wp_staging = zslba; + + nvme_update_zone_info(ns, req, zone); + + if (ns->blk_state) { + bitmap_clear(ns->utilization, zslba, zcap); + nvme_ns_update_util(ns, zslba, zcap, req); + } +} + static void nvme_aio_cb(void *opaque, int ret) { NvmeAIO *aio = opaque; @@ -1336,6 +1424,377 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + + trace_pci_nvme_zone_mgmt_send_close(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + nvme_zs_set(zone, NVME_ZS_ZSC); + + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; + + case NVME_ZS_ZSC: + return NVME_SUCCESS; + + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), 
nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; +} + +static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + + trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + case NVME_ZS_ZSC: + case NVME_ZS_ZSE: + nvme_zs_set(zone, NVME_ZS_ZSF); + + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; + + case NVME_ZS_ZSF: + return NVME_SUCCESS; + + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; +} + +static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + + trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSC: + case NVME_ZS_ZSIO: + nvme_zs_set(zone, NVME_ZS_ZSEO); + + nvme_update_zone_info(ns, req, zone); + return NVME_NO_COMPLETE; + + case NVME_ZS_ZSEO: + return NVME_SUCCESS; + + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; +} + +static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeAIO *aio; + NvmeNamespace *ns = req->ns; + uint64_t zslba = nvme_zslba(zone); + uint64_t zcap = nvme_zcap(zone); + uint8_t lbads = nvme_ns_lbads(ns); + + trace_pci_nvme_zone_mgmt_send_reset(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + case NVME_ZS_ZSC: + case NVME_ZS_ZSF: + aio = g_new0(NvmeAIO, 1); + + *aio = (NvmeAIO) { + .opc = NVME_AIO_OPC_DISCARD, + .blk = ns->blk, + .offset = zslba << lbads, + .len = zcap << lbads, + .req = req, + .cb = nvme_aio_zone_reset_cb, + .cb_arg = zone, + }; + + nvme_req_add_aio(req, aio); + nvme_req_set_cb(req, nvme_zone_mgmt_send_reset_cb, NULL); + + return NVME_NO_COMPLETE; + + case NVME_ZS_ZSE: + return NVME_SUCCESS; + + case NVME_ZS_ZSRO: + nvme_zs_set(zone, NVME_ZS_ZSO); + + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; + + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; +} + +static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + + trace_pci_nvme_zone_mgmt_send_offline(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSRO: + nvme_zs_set(zone, NVME_ZS_ZSO); + + nvme_update_zone_info(ns, req, zone); + return NVME_NO_COMPLETE; + + case NVME_ZS_ZSO: + return NVME_SUCCESS; + + default: + break; + } + + trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; +} + +static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeNamespace *ns = req->ns; + uint16_t status; + + trace_pci_nvme_zone_mgmt_send_set_zde(nvme_cid(req), nvme_nsid(ns), + nvme_zslba(zone), nvme_zs_str(zone)); + + if (nvme_zs(zone) != NVME_ZS_ZSE) { + 
trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), + nvme_zslba(zone), + nvme_zs(zone)); + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + nvme_zs_set(zone, NVME_ZS_ZSEO); + + status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns), + DMA_DIRECTION_TO_DEVICE, req); + if (status) { + return status; + } + + NVME_ZA_SET_ZDEV(zone->zd.za, 0x1); + nvme_update_zone_descr(ns, req, zone); + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; +} + +static uint16_t nvme_zone_mgmt_send_all(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeZoneManagementSendCmd *send = (NvmeZoneManagementSendCmd *) &req->cmd; + NvmeNamespace *ns = req->ns; + NvmeZone *zone; + NvmeZoneState zs; + + uint16_t status = NVME_SUCCESS; + + trace_pci_nvme_zone_mgmt_send_all(nvme_cid(req), nvme_nsid(ns), send->zsa); + + switch (send->zsa) { + case NVME_CMD_ZONE_MGMT_SEND_SET_ZDE: + return NVME_INVALID_FIELD | NVME_DNR; + + case NVME_CMD_ZONE_MGMT_SEND_CLOSE: + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zs = nvme_zs(zone); + + switch (zs) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + status = nvme_zone_mgmt_send_close(n, req, zone); + if (status && status != NVME_NO_COMPLETE) { + goto err_out; + } + + default: + continue; + } + } + + break; + + case NVME_CMD_ZONE_MGMT_SEND_FINISH: + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zs = nvme_zs(zone); + + switch (zs) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + case NVME_ZS_ZSC: + status = nvme_zone_mgmt_send_finish(n, req, zone); + if (status && status != NVME_NO_COMPLETE) { + goto err_out; + } + + default: + continue; + } + } + + break; + + case NVME_CMD_ZONE_MGMT_SEND_OPEN: + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zs = nvme_zs(zone); + + if (zs == NVME_ZS_ZSC) { + status = nvme_zone_mgmt_send_open(n, req, zone); + if (status && status != NVME_NO_COMPLETE) { + goto err_out; + } + } + } + + break; + + case NVME_CMD_ZONE_MGMT_SEND_RESET: + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zs = nvme_zs(zone); + + switch (zs) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + case NVME_ZS_ZSC: + case NVME_ZS_ZSF: + status = nvme_zone_mgmt_send_reset(n, req, zone); + if (status && status != NVME_NO_COMPLETE) { + goto err_out; + } + + default: + continue; + } + } + + break; + + case NVME_CMD_ZONE_MGMT_SEND_OFFLINE: + for (int i = 0; i < ns->zns.info.num_zones; i++) { + zone = &ns->zns.info.zones[i]; + zs = nvme_zs(zone); + + if (zs == NVME_ZS_ZSRO) { + status = nvme_zone_mgmt_send_offline(n, req, zone); + if (status && status != NVME_NO_COMPLETE) { + goto err_out; + } + } + } + + break; + } + + return status; + +err_out: + req->status = status; + + if (!QTAILQ_EMPTY(&req->aio_tailq)) { + return NVME_NO_COMPLETE; + } + + return status; +} + +static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeZoneManagementSendCmd *send = (NvmeZoneManagementSendCmd *) &req->cmd; + NvmeZoneManagementSendAction zsa = send->zsa; + NvmeNamespace *ns = req->ns; + NvmeZone *zone; + uint64_t zslba = le64_to_cpu(send->slba); + + if (!nvme_ns_zoned(ns)) { + return NVME_INVALID_OPCODE | NVME_DNR; + } + + trace_pci_nvme_zone_mgmt_send(nvme_cid(req), ns->params.nsid, zslba, zsa, + send->zsflags); + + if (NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(send->zsflags)) { + return nvme_zone_mgmt_send_all(n, req); + } + + if (zslba & (nvme_ns_zsze(ns) - 1)) { + trace_pci_nvme_err_invalid_zslba(nvme_cid(req), zslba); + return 
NVME_INVALID_FIELD | NVME_DNR; + } + + zone = nvme_ns_get_zone(ns, zslba); + if (!zone) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + switch (zsa) { + case NVME_CMD_ZONE_MGMT_SEND_CLOSE: + return nvme_zone_mgmt_send_close(n, req, zone); + case NVME_CMD_ZONE_MGMT_SEND_FINISH: + return nvme_zone_mgmt_send_finish(n, req, zone); + case NVME_CMD_ZONE_MGMT_SEND_OPEN: + return nvme_zone_mgmt_send_open(n, req, zone); + case NVME_CMD_ZONE_MGMT_SEND_RESET: + return nvme_zone_mgmt_send_reset(n, req, zone); + case NVME_CMD_ZONE_MGMT_SEND_OFFLINE: + return nvme_zone_mgmt_send_offline(n, req, zone); + case NVME_CMD_ZONE_MGMT_SEND_SET_ZDE: + return nvme_zone_mgmt_send_set_zde(n, req, zone); + } + + return NVME_INVALID_FIELD | NVME_DNR; +} + static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req) { NvmeAIO *aio; @@ -1679,6 +2138,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_WRITE: case NVME_CMD_WRITE_ZEROES: return nvme_rwz(n, req); + case NVME_CMD_ZONE_MGMT_SEND: + return nvme_zone_mgmt_send(n, req); case NVME_CMD_ZONE_MGMT_RECV: return nvme_zone_mgmt_recv(n, req); default: diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 92aebb6a6416..757277d339bf 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -51,6 +51,8 @@ static const NvmeEffectsLog nvme_effects[] = { [NVME_IOCS_ZONED] = { .iocs = { [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP, + [NVME_CMD_ZONE_MGMT_SEND] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_LBCC, } }, }; @@ -127,6 +129,7 @@ typedef enum NvmeAIOOp { NVME_AIO_OPC_READ = 0x2, NVME_AIO_OPC_WRITE = 0x3, NVME_AIO_OPC_WRITE_ZEROES = 0x4, + NVME_AIO_OPC_DISCARD = 0x5, } NvmeAIOOp; typedef enum NvmeAIOFlags { @@ -164,6 +167,7 @@ static inline const char *nvme_aio_opc_str(NvmeAIO *aio) case NVME_AIO_OPC_READ: return "NVME_AIO_OP_READ"; case NVME_AIO_OPC_WRITE: return "NVME_AIO_OP_WRITE"; case NVME_AIO_OPC_WRITE_ZEROES: return "NVME_AIO_OP_WRITE_ZEROES"; + case NVME_AIO_OPC_DISCARD: return "NVME_AIO_OP_DISCARD"; default: return "NVME_AIO_OP_UNKNOWN"; } } diff --git a/hw/block/trace-events b/hw/block/trace-events index 9d2a7c2766b6..1da48d1c29d0 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -43,12 +43,22 @@ pci_nvme_aio_cb(uint16_t cid, void *aio, const char *blkname, uint64_t offset, c pci_nvme_aio_discard_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" pci_nvme_aio_write_cb(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" pci_nvme_aio_zone_write_cb(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp 0x%"PRIx64"" +pci_nvme_aio_zone_reset_cb(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" pci_nvme_zone_advance_wp(uint16_t cid, uint64_t lba, uint32_t nlb, uint64_t wp_old, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" wp_old 0x%"PRIx64" wp 0x%"PRIx64"" pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8"" pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8"" pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid 
%"PRIu16" nsid %"PRIu32"" pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" +pci_nvme_zone_mgmt_send(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t zsa, uint8_t zsflags) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zsa 0x%"PRIx8" zsflags 0x%"PRIx8"" +pci_nvme_zone_mgmt_send_all(uint16_t cid, uint32_t nsid, uint8_t za) "cid %"PRIu16" nsid %"PRIu32" za 0x%"PRIx8"" +pci_nvme_zone_mgmt_send_close(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" +pci_nvme_zone_mgmt_send_finish(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" +pci_nvme_zone_mgmt_send_open(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" +pci_nvme_zone_mgmt_send_reset(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" +pci_nvme_zone_mgmt_send_reset_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" +pci_nvme_zone_mgmt_send_offline(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" +pci_nvme_zone_mgmt_send_set_zde(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" pci_nvme_zone_mgmt_recv(uint16_t cid, uint32_t nsid, uint64_t slba, uint64_t len, uint8_t zra, uint8_t zrasp, uint8_t zrasf) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" len %"PRIu64" zra 0x%"PRIx8" zrasp 0x%"PRIx8" zrasf 0x%"PRIx8"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" @@ -85,6 +95,7 @@ pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" new_ pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64"" pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" +pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64"" @@ -138,6 +149,7 @@ pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set features, dw10=0x%"PRIx pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 0x%"PRIx16"" pci_nvme_err_invalid_zone(uint16_t cid, uint64_t lba) "cid %"PRIu16" lba 0x%"PRIx64"" pci_nvme_err_invalid_zone_condition(uint16_t cid, uint64_t zslba, uint8_t condition) "cid %"PRIu16" zslba 0x%"PRIx64" condition 0x%"PRIx8"" +pci_nvme_err_invalid_zslba(uint16_t cid, uint64_t zslba) "cid 
%"PRIu16" zslba 0x%"PRIx64"" pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are non-admin completion queues" pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are non-admin submission queues" pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin submission queue address is null" From patchwork Tue Jun 30 10:01:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 91592739 for ; Tue, 30 Jun 2020 10:07:25 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 31FF72073E for ; Tue, 30 Jun 2020 10:07:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31FF72073E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=irrelevant.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:45872 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jqDAi-0008Qm-6M for patchwork-qemu-devel@patchwork.kernel.org; Tue, 30 Jun 2020 06:07:24 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58018) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5h-0007pV-Gy; Tue, 30 Jun 2020 06:02:13 -0400 Received: from charlie.dont.surf ([128.199.63.193]:47608) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5T-0004K0-1n; Tue, 30 Jun 2020 06:02:13 -0400 Received: from apples.local (80-167-98-190-cable.dk.customer.tdc.net [80.167.98.190]) by charlie.dont.surf (Postfix) with ESMTPSA id BA508BF7F2; Tue, 30 Jun 2020 10:01:55 +0000 (UTC) From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 06/10] hw/block/nvme: add the zone append command Date: Tue, 30 Jun 2020 12:01:35 +0200 Message-Id: <20200630100139.1483002-7-its@irrelevant.dk> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> MIME-Version: 1.0 Received-SPF: pass client-ip=128.199.63.193; envelope-from=its@irrelevant.dk; helo=charlie.dont.surf X-detected-operating-system: by eggs.gnu.org: First seen = 2020/06/30 04:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Niklas Cassel , Damien Le Moal , Dmitry Fomichev , Klaus Jensen , qemu-devel@nongnu.org, Max Reitz , Klaus Jensen , Keith Busch , Javier Gonzalez , Maxim Levitsky , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Add the Zone Append command. 
Signed-off-by: Klaus Jensen --- hw/block/nvme.c | 106 ++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme.h | 3 ++ hw/block/trace-events | 2 + 3 files changed, 111 insertions(+) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index a4527ad9840e..6b394d374c8e 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1294,6 +1294,12 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret) } } +static void nvme_zone_append_cb(NvmeRequest *req, void *opaque) +{ + trace_pci_nvme_zone_append_cb(nvme_cid(req), le64_to_cpu(req->cqe.qw0)); + nvme_rw_cb(req, opaque); +} + static void nvme_aio_cb(void *opaque, int ret) { NvmeAIO *aio = opaque; @@ -1424,6 +1430,104 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, + NvmeZone *zone) +{ + NvmeAIO *aio; + NvmeNamespace *ns = req->ns; + + uint64_t zslba = nvme_zslba(zone); + uint64_t wp = zone->wp_staging; + + size_t len; + uint16_t status; + + req->cqe.qw0 = cpu_to_le64(wp); + req->slba = wp; + + len = req->nlb << nvme_ns_lbads(ns); + + trace_pci_nvme_zone_append(nvme_cid(req), zslba, wp, req->nlb); + + status = nvme_check_rw(n, req); + if (status) { + goto invalid; + } + + status = nvme_check_zone_write(n, req->slba, req->nlb, req, zone); + if (status) { + goto invalid; + } + + switch (nvme_zs(zone)) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSC: + nvme_zs_set(zone, NVME_ZS_ZSIO); + default: + break; + } + + status = nvme_map(n, len, req); + if (status) { + goto invalid; + } + + aio = g_new0(NvmeAIO, 1); + *aio = (NvmeAIO) { + .opc = NVME_AIO_OPC_WRITE, + .blk = ns->blk, + .offset = req->slba << nvme_ns_lbads(ns), + .req = req, + .cb = nvme_aio_zone_write_cb, + .cb_arg = zone, + }; + + if (req->qsg.sg) { + aio->len = req->qsg.size; + aio->flags |= NVME_AIO_DMA; + } else { + aio->len = req->iov.size; + } + + nvme_req_add_aio(req, aio); + nvme_req_set_cb(req, nvme_zone_append_cb, zone); + + zone->wp_staging += req->nlb; + + return NVME_NO_COMPLETE; + +invalid: + block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE); + return status; +} + +static uint16_t nvme_zone_append(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeZone *zone; + NvmeZoneAppendCmd *zappend = (NvmeZoneAppendCmd *) &req->cmd; + NvmeNamespace *ns = req->ns; + uint64_t zslba = le64_to_cpu(zappend->zslba); + + if (!nvme_ns_zoned(ns)) { + return NVME_INVALID_OPCODE | NVME_DNR; + } + + if (zslba & (nvme_ns_zsze(ns) - 1)) { + trace_pci_nvme_err_invalid_zslba(nvme_cid(req), zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + req->nlb = le16_to_cpu(zappend->nlb) + 1; + + zone = nvme_ns_get_zone(ns, zslba); + if (!zone) { + trace_pci_nvme_err_invalid_zone(nvme_cid(req), zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + return nvme_do_zone_append(n, req, zone); +} + static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req, NvmeZone *zone) { @@ -2142,6 +2246,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return nvme_zone_mgmt_send(n, req); case NVME_CMD_ZONE_MGMT_RECV: return nvme_zone_mgmt_recv(n, req); + case NVME_CMD_ZONE_APPEND: + return nvme_zone_append(n, req); default: trace_pci_nvme_err_invalid_opc(req->cmd.opcode); return NVME_INVALID_OPCODE | NVME_DNR; diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 757277d339bf..6b4eb0098450 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -53,6 +53,8 @@ static const NvmeEffectsLog nvme_effects[] = { [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP, [NVME_CMD_ZONE_MGMT_SEND] = 
NVME_EFFECTS_CSUPP | NVME_EFFECTS_LBCC, + [NVME_CMD_ZONE_APPEND] = NVME_EFFECTS_CSUPP | + NVME_EFFECTS_LBCC, } }, }; @@ -177,6 +179,7 @@ static inline bool nvme_req_is_write(NvmeRequest *req) switch (req->cmd.opcode) { case NVME_CMD_WRITE: case NVME_CMD_WRITE_ZEROES: + case NVME_CMD_ZONE_APPEND: return true; default: return false; } } diff --git a/hw/block/trace-events b/hw/block/trace-events index 1da48d1c29d0..0dfc6e22008e 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -50,6 +50,8 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" s pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" %s nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" +pci_nvme_zone_append(uint16_t cid, uint64_t zslba, uint64_t wp, uint16_t nlb) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" nlb %"PRIu16"" +pci_nvme_zone_append_cb(uint16_t cid, uint64_t slba) "cid %"PRIu16" slba 0x%"PRIx64"" pci_nvme_zone_mgmt_send(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t zsa, uint8_t zsflags) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zsa 0x%"PRIx8" zsflags 0x%"PRIx8"" pci_nvme_zone_mgmt_send_all(uint16_t cid, uint32_t nsid, uint8_t za) "cid %"PRIu16" nsid %"PRIu32" za 0x%"PRIx8"" pci_nvme_zone_mgmt_send_close(uint16_t cid, uint32_t nsid, uint64_t zslba, const char *zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" From patchwork Tue Jun 30 10:01:36 2020 X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633601 From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 07/10] hw/block/nvme: track and 
enforce zone resources Date: Tue, 30 Jun 2020 12:01:36 +0200 Message-Id: <20200630100139.1483002-8-its@irrelevant.dk> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> MIME-Version: 1.0 Received-SPF: pass client-ip=128.199.63.193; envelope-from=its@irrelevant.dk; helo=charlie.dont.surf X-detected-operating-system: by eggs.gnu.org: First seen = 2020/06/30 04:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Niklas Cassel , Damien Le Moal , Dmitry Fomichev , Klaus Jensen , qemu-devel@nongnu.org, Max Reitz , Klaus Jensen , Keith Busch , Javier Gonzalez , Maxim Levitsky , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Move all zone transition rules to a single state machine that also manages zone resources. Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.c | 17 ++- hw/block/nvme-ns.h | 7 ++ hw/block/nvme.c | 304 ++++++++++++++++++++++++++++++++------------- 3 files changed, 242 insertions(+), 86 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 68996c2f0e72..5a55a0191f55 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -262,8 +262,13 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap; - id_ns_zns->mar = 0xffffffff; - id_ns_zns->mor = 0xffffffff; + id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar); + id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor); + + ns->zns.resources.active = ns->params.zns.mar != 0xffffffff ? + ns->params.zns.mar + 1 : ns->zns.info.num_zones; + ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ? 
+ ns->params.zns.mor + 1 : ns->zns.info.num_zones; } static void nvme_ns_init(NvmeNamespace *ns) @@ -426,6 +431,12 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns, Error return -1; } + if (ns->params.zns.mor > ns->params.zns.mar) { + error_setg(errp, "maximum open resources (MOR) must be less " + "than or equal to maximum active resources (MAR)"); + return -1; + } + break; default: @@ -499,6 +510,8 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0), DEFINE_PROP_UINT16("zns.zoc", NvmeNamespace, params.zns.zoc, 0), DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0), + DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0xffffffff), + DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0xffffffff), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 5940fb73e72b..5660934d6199 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -29,6 +29,8 @@ typedef struct NvmeNamespaceParams { uint8_t zdes; uint16_t zoc; uint16_t ozcs; + uint32_t mar; + uint32_t mor; } zns; } NvmeNamespaceParams; @@ -63,6 +65,11 @@ typedef struct NvmeNamespace { uint64_t num_zones; NvmeZone *zones; } info; + + struct { + uint32_t open; + uint32_t active; + } resources; } zns; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6b394d374c8e..d5d521954cfc 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1187,6 +1187,155 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req, nvme_req_add_aio(req, aio); } +/* + * nvme_zrm_transition validates zone state transitions under the constraint of + * the Number of Active and Open Resources (NAR and NOR) limits as reported by + * the Identify Namespace Data Structure. + * + * The function does NOT change the Zone Attribute field; this must be done by + * the caller. 
+ */ +static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, + NvmeZoneState to) +{ + NvmeZoneState from = nvme_zs(zone); + + /* fast path */ + if (from == to) { + return NVME_SUCCESS; + } + + switch (from) { + case NVME_ZS_ZSE: + switch (to) { + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + case NVME_ZS_ZSF: + nvme_zs_set(zone, to); + return NVME_SUCCESS; + + case NVME_ZS_ZSC: + if (!ns->zns.resources.active) { + return NVME_TOO_MANY_ACTIVE_ZONES; + } + + ns->zns.resources.active--; + + nvme_zs_set(zone, to); + + return NVME_SUCCESS; + + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + if (!ns->zns.resources.active) { + return NVME_TOO_MANY_ACTIVE_ZONES; + } + + if (!ns->zns.resources.open) { + return NVME_TOO_MANY_OPEN_ZONES; + } + + ns->zns.resources.active--; + ns->zns.resources.open--; + + nvme_zs_set(zone, to); + + return NVME_SUCCESS; + + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + case NVME_ZS_ZSEO: + switch (to) { + case NVME_ZS_ZSIO: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + default: + break; + } + + /* fallthrough */ + + case NVME_ZS_ZSIO: + switch (to) { + case NVME_ZS_ZSEO: + nvme_zs_set(zone, to); + return NVME_SUCCESS; + + case NVME_ZS_ZSE: + case NVME_ZS_ZSF: + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + ns->zns.resources.active++; + + /* fallthrough */ + + case NVME_ZS_ZSC: + ns->zns.resources.open++; + + nvme_zs_set(zone, to); + + return NVME_SUCCESS; + + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + case NVME_ZS_ZSC: + switch (to) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSF: + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + ns->zns.resources.active++; + nvme_zs_set(zone, to); + + return NVME_SUCCESS; + + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: + if (!ns->zns.resources.open) { + return NVME_TOO_MANY_OPEN_ZONES; + } + + ns->zns.resources.open--; + + nvme_zs_set(zone, to); + + return NVME_SUCCESS; + + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + case NVME_ZS_ZSRO: + switch (to) { + case NVME_ZS_ZSO: + nvme_zs_set(zone, to); + return NVME_SUCCESS; + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + case NVME_ZS_ZSF: + switch (to) { + case NVME_ZS_ZSE: + case NVME_ZS_ZSRO: + case NVME_ZS_ZSO: + nvme_zs_set(zone, to); + return NVME_SUCCESS; + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } + + case NVME_ZS_ZSO: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } +} + static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret) { NvmeRequest *req = aio->req; @@ -1212,7 +1361,8 @@ static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb, wp += nlb; if (wp == zslba + nvme_zcap(zone)) { - nvme_zs_set(zone, NVME_ZS_ZSF); + /* if we cannot transition to ZFS something is horribly wrong */ + assert(nvme_zrm_transition(req->ns, zone, NVME_ZS_ZSF) == NVME_SUCCESS); } zd->wp = cpu_to_le64(wp); @@ -1280,7 +1430,8 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret) trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba); - nvme_zs_set(zone, NVME_ZS_ZSE); + /* if we cannot transition to ZSE something is horribly wrong */ + assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSE) == NVME_SUCCESS); NVME_ZA_CLEAR(zone->zd.za); zone->zd.wp = zone->zd.zslba; @@ -1360,7 +1511,7 @@ static void nvme_aio_cb(void *opaque, int ret) if (nvme_ns_zoned(ns)) { NvmeZone *zone = nvme_ns_get_zone(ns, req->slba); - nvme_zs_set(zone, NVME_ZS_ZSO); + 
assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO)); NVME_ZA_CLEAR(zone->zd.za); nvme_update_zone_info(ns, req, zone); @@ -1431,10 +1582,11 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) } static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, - NvmeZone *zone) + NvmeZone *zone) { NvmeAIO *aio; NvmeNamespace *ns = req->ns; + NvmeZoneState zs_orig = nvme_zs(zone); uint64_t zslba = nvme_zslba(zone); uint64_t wp = zone->wp_staging; @@ -1459,17 +1611,20 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, goto invalid; } - switch (nvme_zs(zone)) { - case NVME_ZS_ZSE: - case NVME_ZS_ZSC: - nvme_zs_set(zone, NVME_ZS_ZSIO); - default: + switch (zs_orig) { + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: break; + default: + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO); + if (status) { + goto invalid; + } } status = nvme_map(n, len, req); if (status) { - goto invalid; + goto zrm_revert; } aio = g_new0(NvmeAIO, 1); @@ -1496,6 +1651,10 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, return NVME_NO_COMPLETE; +zrm_revert: + /* if we cannot revert the transition something is horribly wrong */ + assert(nvme_zrm_transition(ns, zone, zs_orig) == NVME_SUCCESS); + invalid: block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE); return status; @@ -1532,91 +1691,66 @@ static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req, NvmeZone *zone) { NvmeNamespace *ns = req->ns; + NvmeZoneState zs = nvme_zs(zone); + uint16_t status; trace_pci_nvme_zone_mgmt_send_close(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - - switch (nvme_zs(zone)) { - case NVME_ZS_ZSIO: - case NVME_ZS_ZSEO: - nvme_zs_set(zone, NVME_ZS_ZSC); - - nvme_update_zone_info(ns, req, zone); - - return NVME_NO_COMPLETE; - - case NVME_ZS_ZSC: - return NVME_SUCCESS; - - default: - break; + /* + * The state machine in nvme_zrm_transition allows zones to transition from + * ZSE to ZSC. That transition is only valid if done as part of a Set Zone + * Descriptor Extension operation, so do an early check here.
+ */ + if (zs == NVME_ZS_ZSE) { + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } - trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), - nvme_zs(zone)); - return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC); + if (status) { + return status; + } + + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; } static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req, NvmeZone *zone) { NvmeNamespace *ns = req->ns; + uint16_t status; trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - - switch (nvme_zs(zone)) { - case NVME_ZS_ZSIO: - case NVME_ZS_ZSEO: - case NVME_ZS_ZSC: - case NVME_ZS_ZSE: - nvme_zs_set(zone, NVME_ZS_ZSF); - - nvme_update_zone_info(ns, req, zone); - - return NVME_NO_COMPLETE; - - case NVME_ZS_ZSF: - return NVME_SUCCESS; - - default: - break; + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSF); + if (status) { + return status; } - trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), - nvme_zs(zone)); - return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + nvme_update_zone_info(ns, req, zone); + return NVME_NO_COMPLETE; } static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req, NvmeZone *zone) { NvmeNamespace *ns = req->ns; + uint16_t status; trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - switch (nvme_zs(zone)) { - case NVME_ZS_ZSE: - case NVME_ZS_ZSC: - case NVME_ZS_ZSIO: - nvme_zs_set(zone, NVME_ZS_ZSEO); - - nvme_update_zone_info(ns, req, zone); - return NVME_NO_COMPLETE; - - case NVME_ZS_ZSEO: - return NVME_SUCCESS; - - default: - break; + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSEO); + if (status) { + return status; } - trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), - nvme_zs(zone)); - return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + nvme_update_zone_info(ns, req, zone); + + return NVME_NO_COMPLETE; } static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, @@ -1624,6 +1758,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, { NvmeAIO *aio; NvmeNamespace *ns = req->ns; + NvmeZoneState zs = nvme_zs(zone); uint64_t zslba = nvme_zslba(zone); uint64_t zcap = nvme_zcap(zone); uint8_t lbads = nvme_ns_lbads(ns); @@ -1631,7 +1766,10 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, trace_pci_nvme_zone_mgmt_send_reset(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - switch (nvme_zs(zone)) { + switch (zs) { + case NVME_ZS_ZSE: + return NVME_SUCCESS; + case NVME_ZS_ZSIO: case NVME_ZS_ZSEO: case NVME_ZS_ZSC: @@ -1653,18 +1791,13 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, return NVME_NO_COMPLETE; - case NVME_ZS_ZSE: - return NVME_SUCCESS; - case NVME_ZS_ZSRO: - nvme_zs_set(zone, NVME_ZS_ZSO); - + assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSO) == NVME_SUCCESS); nvme_update_zone_info(ns, req, zone); - return NVME_NO_COMPLETE; - default: - break; + case NVME_ZS_ZSO: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } trace_pci_nvme_err_invalid_zone_condition(nvme_cid(req), nvme_zslba(zone), @@ -1682,14 +1815,10 @@ static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req, switch (nvme_zs(zone)) { case NVME_ZS_ZSRO: - nvme_zs_set(zone, NVME_ZS_ZSO); - + assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO)); nvme_update_zone_info(ns, req, 
zone); return NVME_NO_COMPLETE; - case NVME_ZS_ZSO: - return NVME_SUCCESS; - default: break; } @@ -1715,11 +1844,15 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req, return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } - nvme_zs_set(zone, NVME_ZS_ZSEO); + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC); + if (status) { + return status; + } status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns), DMA_DIRECTION_TO_DEVICE, req); if (status) { + assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSE)); return status; } @@ -2024,11 +2157,14 @@ static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req) if (nvme_req_is_write(req)) { switch (nvme_zs(zone)) { - case NVME_ZS_ZSE: - case NVME_ZS_ZSC: - nvme_zs_set(zone, NVME_ZS_ZSIO); - default: + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: break; + default: + status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO); + if (status) { + return status; + } } cb = nvme_aio_zone_write_cb; From patchwork Tue Jun 30 10:01:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A0D3F6C1 for ; Tue, 30 Jun 2020 10:07:09 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7217A2073E for ; Tue, 30 Jun 2020 10:07:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7217A2073E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=irrelevant.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:45180 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jqDAS-00088x-LP for patchwork-qemu-devel@patchwork.kernel.org; Tue, 30 Jun 2020 06:07:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58000) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5f-0007oD-GQ; Tue, 30 Jun 2020 06:02:13 -0400 Received: from charlie.dont.surf ([128.199.63.193]:47644) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5U-0004LW-Sy; Tue, 30 Jun 2020 06:02:09 -0400 Received: from apples.local (80-167-98-190-cable.dk.customer.tdc.net [80.167.98.190]) by charlie.dont.surf (Postfix) with ESMTPSA id 110A3BF676; Tue, 30 Jun 2020 10:01:56 +0000 (UTC) From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 08/10] hw/block/nvme: allow open to close transitions by controller Date: Tue, 30 Jun 2020 12:01:37 +0200 Message-Id: <20200630100139.1483002-9-its@irrelevant.dk> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> MIME-Version: 1.0 Received-SPF: pass client-ip=128.199.63.193; envelope-from=its@irrelevant.dk; helo=charlie.dont.surf X-detected-operating-system: by eggs.gnu.org: First seen = 2020/06/30 04:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 
autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Niklas Cassel , Damien Le Moal , Dmitry Fomichev , Klaus Jensen , qemu-devel@nongnu.org, Max Reitz , Klaus Jensen , Keith Busch , Javier Gonzalez , Maxim Levitsky , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Allow the controller to release open resources by transitioning implicitly and explicitly opened zones to closed. This is done using a naive "least recently opened" strategy. Some workloads may behave very badly with this, but for the purpose of testing how software deals with this it is acceptable for now. Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.c | 3 + hw/block/nvme-ns.h | 5 ++ hw/block/nvme.c | 176 +++++++++++++++++++++++++++++++----------- hw/block/nvme.h | 5 ++ hw/block/trace-events | 5 ++ 5 files changed, 147 insertions(+), 47 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 5a55a0191f55..3b9fa91c7af8 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -269,6 +269,9 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) ns->params.zns.mar + 1 : ns->zns.info.num_zones; ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ? ns->params.zns.mor + 1 : ns->zns.info.num_zones; + + QTAILQ_INIT(&ns->zns.resources.lru_open); + QTAILQ_INIT(&ns->zns.resources.lru_active); } static void nvme_ns_init(NvmeNamespace *ns) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 5660934d6199..6d3a6dc07cd8 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -39,6 +39,8 @@ typedef struct NvmeZone { uint8_t *zde; uint64_t wp_staging; + + QTAILQ_ENTRY(NvmeZone) lru_entry; } NvmeZone; typedef struct NvmeNamespace { @@ -69,6 +71,9 @@ typedef struct NvmeNamespace { struct { uint32_t open; uint32_t active; + + QTAILQ_HEAD(, NvmeZone) lru_open; + QTAILQ_HEAD(, NvmeZone) lru_active; } resources; } zns; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index d5d521954cfc..f7b4618bc805 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1187,6 +1187,41 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req, nvme_req_add_aio(req, aio); } +static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, + NvmeZone *zone, NvmeZoneState to, + NvmeRequest *req); + +static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns, + NvmeRequest *req) +{ + NvmeZone *candidate; + NvmeZoneState zs; + + trace_pci_nvme_zone_zrm_release_open(nvme_cid(req), ns->params.nsid); + + QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) { + zs = nvme_zs(candidate); + + trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid, + nvme_zslba(candidate), + nvme_wp(candidate), zs); + + /* skip explicitly opened zones */ + if (zs == NVME_ZS_ZSEO) { + continue; + } + + /* the zone cannot be closed if it is currently writing */ + if (candidate->wp_staging != nvme_wp(candidate)) { + continue; + } + + return nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req); + } + + return NVME_TOO_MANY_OPEN_ZONES; +} + /* * nvme_zrm_transition validates zone state transitions under the constraint of * the Number of Active and Open Resources (NAR and NOR) limits as reported by @@ -1195,52 +1230,59 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req, * The function does NOT 
change the Zone Attribute field; this must be done by * the caller. */ -static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, - NvmeZoneState to) +static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, + NvmeZone *zone, NvmeZoneState to, + NvmeRequest *req) { NvmeZoneState from = nvme_zs(zone); + uint16_t status; - /* fast path */ - if (from == to) { - return NVME_SUCCESS; - } + trace_pci_nvme_zone_zrm_transition(nvme_cid(req), ns->params.nsid, + nvme_zslba(zone), nvme_zs(zone), to); switch (from) { case NVME_ZS_ZSE: switch (to) { + case NVME_ZS_ZSE: + return NVME_SUCCESS; + case NVME_ZS_ZSRO: case NVME_ZS_ZSO: case NVME_ZS_ZSF: - nvme_zs_set(zone, to); - return NVME_SUCCESS; + goto out; case NVME_ZS_ZSC: if (!ns->zns.resources.active) { + trace_pci_nvme_err_too_many_active_zones(nvme_cid(req)); return NVME_TOO_MANY_ACTIVE_ZONES; } ns->zns.resources.active--; - nvme_zs_set(zone, to); + QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry); - return NVME_SUCCESS; + goto out; case NVME_ZS_ZSIO: case NVME_ZS_ZSEO: if (!ns->zns.resources.active) { + trace_pci_nvme_err_too_many_active_zones(nvme_cid(req)); return NVME_TOO_MANY_ACTIVE_ZONES; } if (!ns->zns.resources.open) { - return NVME_TOO_MANY_OPEN_ZONES; + status = nvme_zrm_release_open(n, ns, req); + if (status) { + return status; + } } ns->zns.resources.active--; ns->zns.resources.open--; - nvme_zs_set(zone, to); + QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry); - return NVME_SUCCESS; + goto out; default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; @@ -1248,6 +1290,9 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, case NVME_ZS_ZSEO: switch (to) { + case NVME_ZS_ZSEO: + return NVME_SUCCESS; + case NVME_ZS_ZSIO: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; default: @@ -1258,24 +1303,30 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, case NVME_ZS_ZSIO: switch (to) { - case NVME_ZS_ZSEO: - nvme_zs_set(zone, to); + case NVME_ZS_ZSIO: return NVME_SUCCESS; + case NVME_ZS_ZSEO: + goto out; + case NVME_ZS_ZSE: case NVME_ZS_ZSF: case NVME_ZS_ZSRO: case NVME_ZS_ZSO: ns->zns.resources.active++; + ns->zns.resources.open++; - /* fallthrough */ + QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry); + + goto out; case NVME_ZS_ZSC: ns->zns.resources.open++; - nvme_zs_set(zone, to); + QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry); + QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry); - return NVME_SUCCESS; + goto out; default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; @@ -1283,26 +1334,33 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, case NVME_ZS_ZSC: switch (to) { + case NVME_ZS_ZSC: + return NVME_SUCCESS; + case NVME_ZS_ZSE: case NVME_ZS_ZSF: case NVME_ZS_ZSRO: case NVME_ZS_ZSO: ns->zns.resources.active++; - nvme_zs_set(zone, to); - return NVME_SUCCESS; + QTAILQ_REMOVE(&ns->zns.resources.lru_active, zone, lru_entry); + + goto out; case NVME_ZS_ZSIO: case NVME_ZS_ZSEO: if (!ns->zns.resources.open) { - return NVME_TOO_MANY_OPEN_ZONES; + status = nvme_zrm_release_open(n, ns, req); + if (status) { + return status; + } } ns->zns.resources.open--; + QTAILQ_REMOVE(&ns->zns.resources.lru_active, zone, lru_entry); + QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry); - nvme_zs_set(zone, to); - - return NVME_SUCCESS; + goto out; default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; @@ -1310,30 +1368,46 @@ static uint16_t 
nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone, case NVME_ZS_ZSRO: switch (to) { - case NVME_ZS_ZSO: - nvme_zs_set(zone, to); + case NVME_ZS_ZSRO: return NVME_SUCCESS; + + case NVME_ZS_ZSO: + goto out; + default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } case NVME_ZS_ZSF: switch (to) { + case NVME_ZS_ZSF: + return NVME_SUCCESS; + case NVME_ZS_ZSE: case NVME_ZS_ZSRO: case NVME_ZS_ZSO: - nvme_zs_set(zone, to); - return NVME_SUCCESS; + goto out; + default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } case NVME_ZS_ZSO: - return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + switch (to) { + case NVME_ZS_ZSO: + return NVME_SUCCESS; + + default: + return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; + } default: return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } + +out: + nvme_zs_set(zone, to); + return NVME_SUCCESS; } static void nvme_aio_write_cb(NvmeAIO *aio, void *opaque, int ret) @@ -1361,8 +1435,11 @@ static void nvme_zone_advance_wp(NvmeZone *zone, uint32_t nlb, wp += nlb; if (wp == zslba + nvme_zcap(zone)) { - /* if we cannot transition to ZFS something is horribly wrong */ - assert(nvme_zrm_transition(req->ns, zone, NVME_ZS_ZSF) == NVME_SUCCESS); + NvmeCtrl *n = nvme_ctrl(req); + + /* if we cannot transition to ZSF something is horribly wrong */ + assert(nvme_zrm_transition(n, req->ns, zone, NVME_ZS_ZSF, req) == + NVME_SUCCESS); } zd->wp = cpu_to_le64(wp); @@ -1418,6 +1495,7 @@ static void nvme_zone_mgmt_send_reset_cb(NvmeRequest *req, void *opaque) static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret) { NvmeRequest *req = aio->req; + NvmeCtrl *n = nvme_ctrl(req); NvmeZone *zone = opaque; NvmeNamespace *ns = req->ns; @@ -1431,7 +1509,7 @@ static void nvme_aio_zone_reset_cb(NvmeAIO *aio, void *opaque, int ret) trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), ns->params.nsid, zslba); /* if we cannot transition to ZSE something is horribly wrong */ - assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSE) == NVME_SUCCESS); + assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSE, req) == NVME_SUCCESS); NVME_ZA_CLEAR(zone->zd.za); zone->zd.wp = zone->zd.zslba; @@ -1476,6 +1554,7 @@ static void nvme_aio_cb(void *opaque, int ret) if (req) { NvmeNamespace *ns = req->ns; + NvmeCtrl *n = nvme_ctrl(req); uint16_t status; switch (aio->opc) { @@ -1511,7 +1590,7 @@ static void nvme_aio_cb(void *opaque, int ret) if (nvme_ns_zoned(ns)) { NvmeZone *zone = nvme_ns_get_zone(ns, req->slba); - assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO)); + assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req)); NVME_ZA_CLEAR(zone->zd.za); nvme_update_zone_info(ns, req, zone); @@ -1616,7 +1695,7 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, case NVME_ZS_ZSEO: break; default: - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO); + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req); if (status) { goto invalid; } @@ -1653,7 +1732,7 @@ static uint16_t nvme_do_zone_append(NvmeCtrl *n, NvmeRequest *req, zrm_revert: /* if we cannot revert the transition something is horribly wrong */ - assert(nvme_zrm_transition(ns, zone, zs_orig) == NVME_SUCCESS); + assert(nvme_zrm_transition(n, ns, zone, zs_orig, req) == NVME_SUCCESS); invalid: block_acct_invalid(blk_get_stats(ns->blk), BLOCK_ACCT_WRITE); @@ -1706,7 +1785,7 @@ static uint16_t nvme_zone_mgmt_send_close(NvmeCtrl *n, NvmeRequest *req, return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC); + status = nvme_zrm_transition(n, ns, zone, 
NVME_ZS_ZSC, req); if (status) { return status; } @@ -1725,7 +1804,7 @@ static uint16_t nvme_zone_mgmt_send_finish(NvmeCtrl *n, NvmeRequest *req, trace_pci_nvme_zone_mgmt_send_finish(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSF); + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req); if (status) { return status; } @@ -1743,7 +1822,7 @@ static uint16_t nvme_zone_mgmt_send_open(NvmeCtrl *n, NvmeRequest *req, trace_pci_nvme_zone_mgmt_send_open(nvme_cid(req), nvme_nsid(ns), nvme_zslba(zone), nvme_zs_str(zone)); - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSEO); + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSEO, req); if (status) { return status; } @@ -1792,7 +1871,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req, return NVME_NO_COMPLETE; case NVME_ZS_ZSRO: - assert(nvme_zrm_transition(ns, zone, NVME_ZS_ZSO) == NVME_SUCCESS); + assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req)); nvme_update_zone_info(ns, req, zone); return NVME_NO_COMPLETE; @@ -1815,7 +1894,7 @@ static uint16_t nvme_zone_mgmt_send_offline(NvmeCtrl *n, NvmeRequest *req, switch (nvme_zs(zone)) { case NVME_ZS_ZSRO: - assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSO)); + assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req)); nvme_update_zone_info(ns, req, zone); return NVME_NO_COMPLETE; @@ -1844,7 +1923,7 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req, return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR; } - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSC); + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSC, req); if (status) { return status; } @@ -1852,7 +1931,7 @@ static uint16_t nvme_zone_mgmt_send_set_zde(NvmeCtrl *n, NvmeRequest *req, status = nvme_dma(n, zone->zde, nvme_ns_zdes_bytes(ns), DMA_DIRECTION_TO_DEVICE, req); if (status) { - assert(!nvme_zrm_transition(ns, zone, NVME_ZS_ZSE)); + assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSE, req)); return status; } @@ -2072,11 +2151,14 @@ static uint16_t nvme_do_write_zeroes(NvmeCtrl *n, NvmeRequest *req) } switch (nvme_zs(zone)) { - case NVME_ZS_ZSE: - case NVME_ZS_ZSC: - nvme_zs_set(zone, NVME_ZS_ZSIO); - default: + case NVME_ZS_ZSIO: + case NVME_ZS_ZSEO: break; + default: + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req); + if (status) { + return status; + } } cb = nvme_aio_zone_write_cb; @@ -2161,7 +2243,7 @@ static uint16_t nvme_do_rw(NvmeCtrl *n, NvmeRequest *req) case NVME_ZS_ZSEO: break; default: - status = nvme_zrm_transition(ns, zone, NVME_ZS_ZSIO); + status = nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSIO, req); if (status) { return status; } diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 6b4eb0098450..309fb1b94ecb 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -312,6 +312,11 @@ static inline uint16_t nvme_sqid(NvmeRequest *req) return le16_to_cpu(req->sq->sqid); } +static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req) +{ + return req->sq->ctrl; +} + int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); #endif /* HW_NVME_H */ diff --git a/hw/block/trace-events b/hw/block/trace-events index 0dfc6e22008e..4b4f2ed7605f 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -98,6 +98,9 @@ pci_nvme_ns_update_util(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu3 pci_nvme_zone_pending_writes(uint16_t cid, uint64_t zslba, uint64_t wp, uint64_t wp_staging) "cid %"PRIu16" zslba 0x%"PRIx64" wp 0x%"PRIx64" wp_staging 0x%"PRIx64"" 
pci_nvme_update_zone_info(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64"" +pci_nvme_zone_zrm_transition(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t from, uint8_t to) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" from 0x%"PRIx8" to 0x%"PRIx8"" +pci_nvme_zone_zrm_candidate(uint16_t cid, uint32_t nsid, uint64_t zslba, uint64_t wp, uint8_t zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" wp 0x%"PRIx64" zc 0x%"PRIx8"" +pci_nvme_zone_zrm_release_open(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64"" @@ -121,6 +124,8 @@ pci_nvme_err_zone_is_full(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PR pci_nvme_err_zone_is_read_only(uint16_t cid, uint64_t slba) "cid %"PRIu16" lba 0x%"PRIx64"" pci_nvme_err_zone_invalid_write(uint16_t cid, uint64_t slba, uint64_t wp) "cid %"PRIu16" lba 0x%"PRIx64" wp 0x%"PRIx64"" pci_nvme_err_zone_boundary(uint16_t cid, uint64_t slba, uint32_t nlb, uint64_t zcap) "cid %"PRIu16" lba 0x%"PRIx64" nlb %"PRIu32" zcap 0x%"PRIx64"" +pci_nvme_err_too_many_active_zones(uint16_t cid) "cid %"PRIu16"" +pci_nvme_err_too_many_open_zones(uint16_t cid) "cid %"PRIu16"" pci_nvme_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8"" pci_nvme_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8"" pci_nvme_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16"" From patchwork Tue Jun 30 10:01:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B00C96C1 for ; Tue, 30 Jun 2020 10:08:46 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7F9212073E for ; Tue, 30 Jun 2020 10:08:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7F9212073E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=irrelevant.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:53546 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jqDC1-0003Hg-Na for patchwork-qemu-devel@patchwork.kernel.org; Tue, 30 Jun 2020 06:08:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58028) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5i-0007r1-7R; Tue, 30 Jun 2020 06:02:14 -0400 Received: from charlie.dont.surf ([128.199.63.193]:47648) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5a-0004MA-5f; Tue, 30 Jun 2020 06:02:13 -0400 Received: from apples.local (80-167-98-190-cable.dk.customer.tdc.net 
[80.167.98.190]) by charlie.dont.surf (Postfix) with ESMTPSA id 934B7BF7FB; Tue, 30 Jun 2020 10:01:57 +0000 (UTC) From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 09/10] hw/block/nvme: allow zone excursions Date: Tue, 30 Jun 2020 12:01:38 +0200 Message-Id: <20200630100139.1483002-10-its@irrelevant.dk> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> MIME-Version: 1.0 Received-SPF: pass client-ip=128.199.63.193; envelope-from=its@irrelevant.dk; helo=charlie.dont.surf X-detected-operating-system: by eggs.gnu.org: First seen = 2020/06/30 04:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Niklas Cassel , Damien Le Moal , Dmitry Fomichev , Klaus Jensen , qemu-devel@nongnu.org, Max Reitz , Klaus Jensen , Keith Busch , Javier Gonzalez , Maxim Levitsky , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Allow the controller to release active resources by transitioning zones to the full state. Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.h | 2 + hw/block/nvme.c | 171 ++++++++++++++++++++++++++++++++++++++---- hw/block/trace-events | 4 + include/block/nvme.h | 10 +++ 4 files changed, 174 insertions(+), 13 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 6d3a6dc07cd8..6acda5c2cf3f 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -75,6 +75,8 @@ typedef struct NvmeNamespace { QTAILQ_HEAD(, NvmeZone) lru_open; QTAILQ_HEAD(, NvmeZone) lru_active; } resources; + + NvmeChangedZoneList changed_list; } zns; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index f7b4618bc805..6db6daa62bc5 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -859,10 +859,11 @@ static void nvme_process_aers(void *opaque) req = n->aer_reqs[n->outstanding_aers]; - result = (NvmeAerResult *) &req->cqe.dw0; + result = (NvmeAerResult *) &req->cqe.qw0; result->event_type = event->result.event_type; result->event_info = event->result.event_info; result->log_page = event->result.log_page; + result->nsid = event->result.nsid; g_free(event); req->status = NVME_SUCCESS; @@ -874,8 +875,9 @@ static void nvme_process_aers(void *opaque) } } -static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type, - uint8_t event_info, uint8_t log_page) +static void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, + uint8_t event_type, uint8_t event_info, + uint8_t log_page) { NvmeAsyncEvent *event; @@ -893,6 +895,11 @@ static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type, .log_page = log_page, }; + if (event_info == NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED) { + assert(ns); + event->result.nsid = ns->params.nsid; + } + QTAILQ_INSERT_TAIL(&n->aer_queue, event, entry); n->aer_queued++; @@ -1187,15 +1194,50 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req, nvme_req_add_aio(req, aio); } +static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone) +{ + uint16_t num_ids = le16_to_cpu(ns->zns.changed_list.num_ids); + + trace_pci_nvme_zone_changed(ns->params.nsid, 
nvme_zslba(zone)); + + if (num_ids < NVME_CHANGED_ZONE_LIST_MAX_IDS) { + ns->zns.changed_list.ids[num_ids] = zone->zd.zslba; + ns->zns.changed_list.num_ids = cpu_to_le16(num_ids + 1); + } else { + memset(&ns->zns.changed_list, 0x0, sizeof(NvmeChangedZoneList)); + ns->zns.changed_list.num_ids = cpu_to_le16(0xffff); + } + + nvme_enqueue_event(n, ns, NVME_AER_TYPE_NOTICE, + NVME_AER_INFO_NOTICE_ZONE_DESCR_CHANGED, + NVME_LOG_CHANGED_ZONE_LIST); +} + static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone, NvmeZoneState to, NvmeRequest *req); +static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone, + NvmeRequest *req) +{ + trace_pci_nvme_zone_excursion(ns->params.nsid, nvme_zslba(zone), + nvme_zs_str(zone)); + + assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req) == NVME_SUCCESS); + + NVME_ZA_SET_ZFC(zone->zd.za, 0x1); + + nvme_zone_changed(n, ns, zone); + + nvme_update_zone_info(ns, req, zone); +} + static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns, NvmeRequest *req) { NvmeZone *candidate; NvmeZoneState zs; + uint16_t status; trace_pci_nvme_zone_zrm_release_open(nvme_cid(req), ns->params.nsid); @@ -1216,12 +1258,73 @@ static uint16_t nvme_zrm_release_open(NvmeCtrl *n, NvmeNamespace *ns, continue; } - return nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req); + status = nvme_zrm_transition(n, ns, candidate, NVME_ZS_ZSC, req); + if (status) { + return status; + } + + nvme_update_zone_info(ns, req, candidate); + return NVME_SUCCESS; } return NVME_TOO_MANY_OPEN_ZONES; } +static uint16_t nvme_zrm_release_active(NvmeCtrl *n, NvmeNamespace *ns, + NvmeRequest *req) +{ + NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns); + NvmeZone *candidate = NULL; + NvmeZoneDescriptor *zd; + NvmeZoneState zs; + + trace_pci_nvme_zone_zrm_release_active(nvme_cid(req), ns->params.nsid); + + /* bail out if Zone Active Excursions are not permitted */ + if (!(le16_to_cpu(id_ns_zns->zoc) & NVME_ID_NS_ZNS_ZOC_ZAE)) { + trace_pci_nvme_zone_zrm_excursion_not_allowed(nvme_cid(req), + ns->params.nsid); + return NVME_TOO_MANY_ACTIVE_ZONES; + } + + QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_active, lru_entry) { + zd = &candidate->zd; + zs = nvme_zs(candidate); + + trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid, + nvme_zslba(candidate), + nvme_wp(candidate), zs); + + goto out; + } + + /* + * If all zone resources are tied up on open zones we have to transition + * one of those to full. 
+ */ + QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) { + zd = &candidate->zd; + zs = nvme_zs(candidate); + + trace_pci_nvme_zone_zrm_candidate(nvme_cid(req), ns->params.nsid, + nvme_zslba(candidate), + nvme_wp(candidate), zs); + + /* the zone cannot be finished if it is currently writing */ + if (candidate->wp_staging != le64_to_cpu(zd->wp)) { + continue; + } + + break; + } + + assert(candidate); + +out: + nvme_zone_excursion(n, ns, candidate, req); + return NVME_SUCCESS; +} + /* * nvme_zrm_transition validates zone state transitions under the constraint of * the Number of Active and Open Resources (NAR and NOR) limits as reported by @@ -1253,8 +1356,10 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, case NVME_ZS_ZSC: if (!ns->zns.resources.active) { - trace_pci_nvme_err_too_many_active_zones(nvme_cid(req)); - return NVME_TOO_MANY_ACTIVE_ZONES; + status = nvme_zrm_release_active(n, ns, req); + if (status) { + return status; + } } ns->zns.resources.active--; @@ -1266,8 +1371,10 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, case NVME_ZS_ZSIO: case NVME_ZS_ZSEO: if (!ns->zns.resources.active) { - trace_pci_nvme_err_too_many_active_zones(nvme_cid(req)); - return NVME_TOO_MANY_ACTIVE_ZONES; + status = nvme_zrm_release_active(n, ns, req); + if (status) { + return status; + } } if (!ns->zns.resources.open) { @@ -2716,6 +2823,41 @@ static uint16_t nvme_effects_log(NvmeCtrl *n, uint32_t buf_len, uint64_t off, DMA_DIRECTION_FROM_DEVICE, req); } +static uint16_t nvme_changed_zone_info(NvmeCtrl *n, uint32_t buf_len, + uint64_t off, NvmeRequest *req) +{ + uint32_t nsid = le32_to_cpu(req->cmd.nsid); + NvmeNamespace *ns = nvme_ns(n, nsid); + uint32_t trans_len; + uint16_t status; + + if (unlikely(!ns)) { + return NVME_INVALID_NSID | NVME_DNR; + } + + if (!nvme_ns_zoned(ns)) { + return NVME_INVALID_LOG_ID | NVME_DNR; + } + + if (off > 4096) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + trans_len = MIN(4096 - off, buf_len); + + status = nvme_dma(n, (uint8_t *) &ns->zns.changed_list + off, trans_len, + DMA_DIRECTION_FROM_DEVICE, req); + if (status) { + return status; + } + + memset(&ns->zns.changed_list, 0x0, sizeof(NvmeChangedZoneList)); + + nvme_clear_events(n, NVME_AER_TYPE_NOTICE); + + return NVME_SUCCESS; +} + static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd = &req->cmd; @@ -2761,6 +2903,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) return nvme_fw_log_info(n, len, off, req); case NVME_LOG_EFFECTS: return nvme_effects_log(n, len, off, req); + case NVME_LOG_CHANGED_ZONE_LIST: + return nvme_changed_zone_info(n, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -3359,7 +3503,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) if (((n->temperature >= n->features.temp_thresh_hi) || (n->temperature <= n->features.temp_thresh_low)) && NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) { - nvme_enqueue_event(n, NVME_AER_TYPE_SMART, + nvme_enqueue_event(n, NULL, NVME_AER_TYPE_SMART, NVME_AER_INFO_SMART_TEMP_THRESH, NVME_LOG_SMART_INFO); } @@ -3924,7 +4068,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) " sqid=%"PRIu32", ignoring", qid); if (n->outstanding_aers) { - nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR, NVME_AER_INFO_ERR_INVALID_DB_REGISTER, NVME_LOG_ERROR_INFO); } @@ -3941,7 +4085,7 @@ static void 
nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) qid, new_head); if (n->outstanding_aers) { - nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR, NVME_AER_INFO_ERR_INVALID_DB_VALUE, NVME_LOG_ERROR_INFO); } @@ -3978,7 +4122,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) " sqid=%"PRIu32", ignoring", qid); if (n->outstanding_aers) { - nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR, NVME_AER_INFO_ERR_INVALID_DB_REGISTER, NVME_LOG_ERROR_INFO); } @@ -3995,7 +4139,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) qid, new_tail); if (n->outstanding_aers) { - nvme_enqueue_event(n, NVME_AER_TYPE_ERROR, + nvme_enqueue_event(n, NULL, NVME_AER_TYPE_ERROR, NVME_AER_INFO_ERR_INVALID_DB_VALUE, NVME_LOG_ERROR_INFO); } @@ -4286,6 +4430,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev) id->mdts = n->params.mdts; id->ver = cpu_to_le32(NVME_SPEC_VER); id->cntrltype = 0x1; + id->oaes = cpu_to_le32(NVME_OAES_ZDCN); id->oacs = cpu_to_le16(0); /* diff --git a/hw/block/trace-events b/hw/block/trace-events index 4b4f2ed7605f..c4c80644f782 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -101,6 +101,10 @@ pci_nvme_update_zone_descr(uint16_t cid, uint32_t nsid, uint64_t zslba) "cid %"P pci_nvme_zone_zrm_transition(uint16_t cid, uint32_t nsid, uint64_t zslba, uint8_t from, uint8_t to) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" from 0x%"PRIx8" to 0x%"PRIx8"" pci_nvme_zone_zrm_candidate(uint16_t cid, uint32_t nsid, uint64_t zslba, uint64_t wp, uint8_t zc) "cid %"PRIu16" nsid %"PRIu32" zslba 0x%"PRIx64" wp 0x%"PRIx64" zc 0x%"PRIx8"" pci_nvme_zone_zrm_release_open(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" +pci_nvme_zone_zrm_release_active(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" +pci_nvme_zone_zrm_excursion_not_allowed(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32"" +pci_nvme_zone_changed(uint32_t nsid, uint64_t zslba) "nsid %"PRIu32" zslba 0x%"PRIx64"" +pci_nvme_zone_excursion(uint32_t nsid, uint64_t zslba, const char *zc) "nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\"" pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64"" pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 68dac2582b06..688ee5496168 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -778,6 +778,7 @@ typedef struct NvmeDsmRange { enum NvmeAsyncEventRequest { NVME_AER_TYPE_ERROR = 0, NVME_AER_TYPE_SMART = 1, + NVME_AER_TYPE_NOTICE = 2, NVME_AER_TYPE_IO_SPECIFIC = 6, NVME_AER_TYPE_VENDOR_SPECIFIC = 7, NVME_AER_INFO_ERR_INVALID_DB_REGISTER = 0, @@ -993,6 +994,14 @@ typedef struct NvmeZoneDescriptor { #define NVME_ZS(zs) (((zs) >> 4) & 0xf) #define NVME_ZS_SET(zs, state) ((zs) = ((state) << 4)) +#define NVME_CHANGED_ZONE_LIST_MAX_IDS 511 + +typedef struct NvmeChangedZoneList { + uint16_t num_ids; + uint8_t rsvd2[6]; + uint64_t ids[NVME_CHANGED_ZONE_LIST_MAX_IDS]; +} NvmeChangedZoneList; + #define NVME_ZA_ZFC(za) ((za) & (1 << 0)) #define NVME_ZA_FZR(za) ((za) & (1 << 1)) #define NVME_ZA_RZR(za) ((za) & (1 << 2)) @@ -1428,5 +1437,6 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); 
QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescriptor) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16); + QEMU_BUILD_BUG_ON(sizeof(NvmeChangedZoneList) != 4096); } #endif From patchwork Tue Jun 30 10:01:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Klaus Jensen X-Patchwork-Id: 11633623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A72E9739 for ; Tue, 30 Jun 2020 10:10:28 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 77F852073E for ; Tue, 30 Jun 2020 10:10:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 77F852073E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=irrelevant.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:34246 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jqDDd-00078q-Jz for patchwork-qemu-devel@patchwork.kernel.org; Tue, 30 Jun 2020 06:10:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58038) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5i-0007sd-Sa; Tue, 30 Jun 2020 06:02:14 -0400 Received: from charlie.dont.surf ([128.199.63.193]:47650) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jqD5a-0004M9-4z; Tue, 30 Jun 2020 06:02:14 -0400 Received: from apples.local (80-167-98-190-cable.dk.customer.tdc.net [80.167.98.190]) by charlie.dont.surf (Postfix) with ESMTPSA id 1B14DBF803; Tue, 30 Jun 2020 10:01:58 +0000 (UTC) From: Klaus Jensen To: qemu-block@nongnu.org Subject: [PATCH 10/10] hw/block/nvme: support reset/finish recommended limits Date: Tue, 30 Jun 2020 12:01:39 +0200 Message-Id: <20200630100139.1483002-11-its@irrelevant.dk> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200630100139.1483002-1-its@irrelevant.dk> References: <20200630100139.1483002-1-its@irrelevant.dk> MIME-Version: 1.0 Received-SPF: pass client-ip=128.199.63.193; envelope-from=its@irrelevant.dk; helo=charlie.dont.surf X-detected-operating-system: by eggs.gnu.org: First seen = 2020/06/30 04:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Niklas Cassel , Damien Le Moal , Dmitry Fomichev , Klaus Jensen , qemu-devel@nongnu.org, Max Reitz , Klaus Jensen , Keith Busch , Javier Gonzalez , Maxim Levitsky , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Matias Bjorling Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Add the rrl and frl device parameters. The parameters specify the number of seconds before the device may perform an internal operation to "clear" the Reset Zone Recommended and Finish Zone Recommended attributes respectively. 
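To illustrate the intended timeline with made-up values (the delay parameters that start the process are described next): with zns.frld=10 and zns.frl=20, a zone activated at t=0 gets the Finish Zone Recommended attribute set at t=10 seconds; if the host still has not finished the zone at t=30 seconds, the attribute is cleared and, with zone excursions enabled, the device finishes the zone on its own.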
When the attributes are set is governed by the rrld and frld parameters (Reset/Finish Recommended Limit Delay). The Reset Zone Recommended Delay starts when a zone becomes full. The Finish Zone Recommended Delay starts when the zone is first activated. When the limits are reached, the attributes are cleared again and the process is restarted. If zone excursions are enabled (they are by default), when the Finish Recommended Limit is reached, the device will finish the zone. Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.c | 104 ++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme-ns.h | 13 ++++++ hw/block/nvme.c | 49 +++++++++++++------- hw/block/nvme.h | 7 +++ hw/block/trace-events | 3 +- 5 files changed, 159 insertions(+), 17 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 3b9fa91c7af8..7f9b1d526197 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -25,6 +25,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-core.h" +#include "trace.h" #include "nvme.h" #include "nvme-ns.h" @@ -48,6 +49,90 @@ const char *nvme_zs_to_str(NvmeZoneState zs) return NULL; } +static void nvme_ns_process_timer(void *opaque) +{ + NvmeNamespace *ns = opaque; + BusState *s = qdev_get_parent_bus(&ns->parent_obj); + NvmeCtrl *n = NVME(s->parent); + NvmeZone *zone; + + trace_pci_nvme_ns_process_timer(ns->params.nsid); + + int64_t next_timer = INT64_MAX, now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + + QTAILQ_FOREACH(zone, &ns->zns.resources.lru_open, lru_entry) { + int64_t activated_ns = now - zone->stats.activated_ns; + if (activated_ns < ns->zns.frld_ns) { + next_timer = MIN(next_timer, zone->stats.activated_ns + + ns->zns.frld_ns); + + break; + } + + if (activated_ns < ns->zns.frld_ns + ns->zns.frl_ns) { + NVME_ZA_SET_FZR(zone->zd.za, 0x1); + nvme_zone_changed(n, ns, zone); + + next_timer = MIN(next_timer, now + ns->zns.frl_ns); + + continue; + } + + if (zone->wp_staging != le64_to_cpu(zone->zd.wp)) { + next_timer = now + 500; + continue; + } + + nvme_zone_excursion(n, ns, zone, NULL); + } + + QTAILQ_FOREACH(zone, &ns->zns.resources.lru_active, lru_entry) { + int64_t activated_ns = now - zone->stats.activated_ns; + if (activated_ns < ns->zns.frld_ns) { + next_timer = MIN(next_timer, zone->stats.activated_ns + + ns->zns.frld_ns); + + break; + } + + if (activated_ns < ns->zns.frld_ns + ns->zns.frl_ns) { + NVME_ZA_SET_FZR(zone->zd.za, 0x1); + nvme_zone_changed(n, ns, zone); + + next_timer = MIN(next_timer, now + ns->zns.frl_ns); + + continue; + } + + nvme_zone_excursion(n, ns, zone, NULL); + } + + QTAILQ_FOREACH(zone, &ns->zns.lru_finished, lru_entry) { + int64_t finished_ns = now - zone->stats.finished_ns; + if (finished_ns < ns->zns.rrld_ns) { + next_timer = MIN(next_timer, zone->stats.finished_ns + + ns->zns.rrld_ns); + + break; + } + + if (finished_ns < ns->zns.rrld_ns + ns->zns.rrl_ns) { + NVME_ZA_SET_RZR(zone->zd.za, 0x1); + nvme_zone_changed(n, ns, zone); + + next_timer = MIN(next_timer, now + ns->zns.rrl_ns); + + continue; + } + + NVME_ZA_SET_RZR(zone->zd.za, 0x0); + } + + if (next_timer != INT64_MAX) { + timer_mod(ns->zns.timer, next_timer); + } +} + static int nvme_ns_blk_resize(BlockBackend *blk, size_t len, Error **errp) { Error *local_err = NULL; @@ -262,8 +347,21 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns) id_ns->ncap = ns->zns.info.num_zones * ns->params.zns.zcap; + id_ns_zns->rrl = ns->params.zns.rrl; + id_ns_zns->frl = ns->params.zns.frl; + + if (ns->params.zns.rrl || ns->params.zns.frl) { + ns->zns.rrl_ns =
ns->params.zns.rrl * NANOSECONDS_PER_SECOND; + ns->zns.rrld_ns = ns->params.zns.rrld * NANOSECONDS_PER_SECOND; + ns->zns.frl_ns = ns->params.zns.frl * NANOSECONDS_PER_SECOND; + ns->zns.frld_ns = ns->params.zns.frld * NANOSECONDS_PER_SECOND; + + ns->zns.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, + nvme_ns_process_timer, ns); + + QTAILQ_INIT(&ns->zns.lru_finished); + } + id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar); id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor); @@ -515,6 +616,10 @@ static Property nvme_ns_props[] = { DEFINE_PROP_UINT16("zns.ozcs", NvmeNamespace, params.zns.ozcs, 0), DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0xffffffff), DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0xffffffff), + DEFINE_PROP_UINT32("zns.rrl", NvmeNamespace, params.zns.rrl, 0), + DEFINE_PROP_UINT32("zns.frl", NvmeNamespace, params.zns.frl, 0), + DEFINE_PROP_UINT32("zns.rrld", NvmeNamespace, params.zns.rrld, 0), + DEFINE_PROP_UINT32("zns.frld", NvmeNamespace, params.zns.frld, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 6acda5c2cf3f..f92045f19948 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,10 @@ typedef struct NvmeNamespaceParams { uint16_t ozcs; uint32_t mar; uint32_t mor; + uint32_t rrl; + uint32_t frl; + uint32_t rrld; + uint32_t frld; } zns; } NvmeNamespaceParams; @@ -40,6 +44,11 @@ typedef struct NvmeZone { uint64_t wp_staging; + struct { + int64_t activated_ns; + int64_t finished_ns; + } stats; + QTAILQ_ENTRY(NvmeZone) lru_entry; } NvmeZone; @@ -77,6 +86,10 @@ typedef struct NvmeNamespace { } resources; NvmeChangedZoneList changed_list; + + QTAILQ_HEAD(, NvmeZone) lru_finished; + QEMUTimer *timer; + int64_t rrl_ns, rrld_ns, frl_ns, frld_ns; } zns; } NvmeNamespace; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 6db6daa62bc5..f28373feb887 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -875,13 +875,13 @@ static void nvme_process_aers(void *opaque) } } -static void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, - uint8_t event_type, uint8_t event_info, - uint8_t log_page) +void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, uint8_t event_type, + uint8_t event_info, uint8_t log_page) { NvmeAsyncEvent *event; - trace_pci_nvme_enqueue_event(event_type, event_info, log_page); + trace_pci_nvme_enqueue_event(ns ? 
@@ -1194,7 +1194,7 @@ static void nvme_update_zone_descr(NvmeNamespace *ns, NvmeRequest *req,
     nvme_req_add_aio(req, aio);
 }
 
-static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
+void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
 {
     uint16_t num_ids = le16_to_cpu(ns->zns.changed_list.num_ids);
 
@@ -1213,12 +1213,8 @@ static void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone)
                        NVME_LOG_CHANGED_ZONE_LIST);
 }
 
-static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, NvmeZoneState to,
-                                    NvmeRequest *req);
-
-static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
-                                NvmeRequest *req)
+void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                         NvmeRequest *req)
 {
     trace_pci_nvme_zone_excursion(ns->params.nsid, nvme_zslba(zone),
                                   nvme_zs_str(zone));
 
@@ -1226,6 +1222,7 @@ static void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
     assert(nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSF, req) == NVME_SUCCESS);
 
     NVME_ZA_SET_ZFC(zone->zd.za, 0x1);
+    NVME_ZA_SET_FZR(zone->zd.za, 0x0);
 
     nvme_zone_changed(n, ns, zone);
 
@@ -1333,9 +1330,8 @@ out:
  * The function does NOT change the Zone Attribute field; this must be done by
  * the caller.
  */
-static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
-                                    NvmeZone *zone, NvmeZoneState to,
-                                    NvmeRequest *req)
+uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                             NvmeZoneState to, NvmeRequest *req)
 {
     NvmeZoneState from = nvme_zs(zone);
     uint16_t status;
 
@@ -1366,7 +1362,7 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
 
             QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
-            goto out;
+            goto activated;
 
         case NVME_ZS_ZSIO:
         case NVME_ZS_ZSEO:
 
@@ -1389,7 +1385,7 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
 
             QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
-            goto out;
+            goto activated;
 
         default:
             return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
 
@@ -1512,8 +1508,28 @@ static uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns,
         return NVME_INVALID_ZONE_STATE_TRANSITION | NVME_DNR;
     }
 
+activated:
+    zone->stats.activated_ns = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+    if (ns->params.zns.frld && !timer_pending(ns->zns.timer)) {
+        int64_t next_timer = zone->stats.activated_ns + ns->zns.frld_ns;
+        timer_mod(ns->zns.timer, next_timer);
+    }
+
 out:
     nvme_zs_set(zone, to);
+
+    if (to == NVME_ZS_ZSF && ns->params.zns.rrld) {
+        QTAILQ_INSERT_TAIL(&ns->zns.lru_finished, zone, lru_entry);
+
+        zone->stats.finished_ns = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+        if (!timer_pending(ns->zns.timer)) {
+            int64_t next_timer = zone->stats.finished_ns + ns->zns.rrld_ns;
+            timer_mod(ns->zns.timer, next_timer);
+        }
+    }
+
     return NVME_SUCCESS;
 }
 
@@ -1979,6 +1995,7 @@ static uint16_t nvme_zone_mgmt_send_reset(NvmeCtrl *n, NvmeRequest *req,
 
     case NVME_ZS_ZSRO:
         assert(!nvme_zrm_transition(n, ns, zone, NVME_ZS_ZSO, req));
+        nvme_update_zone_info(ns, req, zone);
 
         return NVME_NO_COMPLETE;
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 309fb1b94ecb..e51a38546080 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -318,5 +318,12 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
 }
 
 int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+uint16_t nvme_zrm_transition(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                             NvmeZoneState to, NvmeRequest *req);
+void nvme_enqueue_event(NvmeCtrl *n, NvmeNamespace *ns, uint8_t event_type,
+                        uint8_t event_info, uint8_t log_page);
+void nvme_zone_excursion(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone,
+                         NvmeRequest *req);
+void nvme_zone_changed(NvmeCtrl *n, NvmeNamespace *ns, NvmeZone *zone);
 
 #endif /* HW_NVME_H */
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c4c80644f782..249487ae79fc 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -85,7 +85,7 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
-pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_enqueue_event(uint32_t nsid, uint8_t typ, uint8_t info, uint8_t log_page) "nsid 0x%"PRIx32" type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_enqueue_event_noqueue(int queued) "queued %d"
 pci_nvme_enqueue_event_masked(uint8_t typ) "type 0x%"PRIx8""
 pci_nvme_no_outstanding_aers(void) "ignoring event; no outstanding AERs"
@@ -105,6 +105,7 @@ pci_nvme_zone_zrm_release_active(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsi
 pci_nvme_zone_zrm_excursion_not_allowed(uint16_t cid, uint32_t nsid) "cid %"PRIu16" nsid %"PRIu32""
 pci_nvme_zone_changed(uint32_t nsid, uint64_t zslba) "nsid %"PRIu32" zslba 0x%"PRIx64""
 pci_nvme_zone_excursion(uint32_t nsid, uint64_t zslba, const char *zc) "nsid %"PRIu32" zslba 0x%"PRIx64" zc \"%s\""
+pci_nvme_ns_process_timer(uint32_t nsid) "nsid %"PRIu32""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller config=0x%"PRIx64""