[v7,04/11] hw/block/nvme: Support allocated CNS command variants

Message ID 20201019021726.12048-5-dmitry.fomichev@wdc.com (mailing list archive)
State New, archived
Series hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

Commit Message

Dmitry Fomichev Oct. 19, 2020, 2:17 a.m. UTC
From: Niklas Cassel <niklas.cassel@wdc.com>

Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated; that is, a namespace is
included regardless of whether it is active (attached) or not.

While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.

However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.

In NVMe, a namespace is active if it exists and is attached to the
controller.
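
The difference between the active and allocated list variants can be
illustrated with a minimal, standalone sketch. It is not the QEMU code:
DemoNamespace and demo_build_nslist() are hypothetical names used only
for illustration, and the only idea taken from the patch is that a
single only_active flag decides whether detached namespaces are skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a namespace; not the QEMU type. */
typedef struct {
    uint32_t nsid;
    bool attached;
} DemoNamespace;

/*
 * Collect the NSIDs greater than after_nsid into out[], writing at most
 * cap entries. With only_active set, detached namespaces are skipped
 * (active list); with it clear, every allocated namespace is reported.
 * Returns the number of entries written.
 */
static size_t demo_build_nslist(const DemoNamespace *ns, size_t count,
                                uint32_t after_nsid, bool only_active,
                                uint32_t *out, size_t cap)
{
    size_t j = 0;

    for (size_t i = 0; i < count && j < cap; i++) {
        if (ns[i].nsid <= after_nsid) {
            continue;
        }
        if (only_active && !ns[i].attached) {
            continue;
        }
        out[j++] = ns[i].nsid;
    }

    return j;
}
```

As long as every namespace is attached, both variants return the same
list, which matches the behavior described above for the current model.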

CAP.CSS (together with the I/O Command Set data structure) defines
what command sets are supported by the controller.

CC.CSS (together with Set Profile) can be set to enable a subset of
the available command sets.

Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces
will still be attached (and thus marked as active).
Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces
will still be attached (and thus marked as active).

However, any operation from a disabled command set will result in an
Invalid Command Opcode.

Add a new Boolean namespace property, "attached", to provide the most
basic namespace attachment support. The default value for this new
property is true. Also, implement the logic in the new CNS values to
include/exclude namespaces based on this new property. The only thing
missing is hooking up the actual Namespace Attachment command opcode,
which will allow a user to toggle the "attached" flag per namespace.

This command is not hooked up completely because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
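
As a rough sketch of how the CNS values relate to the "attached"
filter, the mapping can be written as one small helper. The numeric
values are the ones this patch adds to enum NvmeIdCns; the DEMO_ names
and demo_cns_filter() are hypothetical and exist only in this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNS values mirroring enum NvmeIdCns in this patch (DEMO_ prefix is ours). */
enum {
    DEMO_CNS_NS                 = 0x00,
    DEMO_CNS_NS_ACTIVE_LIST     = 0x02,
    DEMO_CNS_CS_NS              = 0x05,
    DEMO_CNS_CS_NS_ACTIVE_LIST  = 0x07,
    DEMO_CNS_NS_PRESENT_LIST    = 0x10,
    DEMO_CNS_NS_PRESENT         = 0x11,
    DEMO_CNS_CS_NS_PRESENT_LIST = 0x1a,
    DEMO_CNS_CS_NS_PRESENT      = 0x1b,
};

/*
 * Returns true if cns selects a namespace or namespace-list variant and
 * stores in *only_active whether detached namespaces must be filtered
 * out. Returns false for every other CNS value and leaves *only_active
 * untouched.
 */
static bool demo_cns_filter(uint8_t cns, bool *only_active)
{
    switch (cns) {
    case DEMO_CNS_NS:
    case DEMO_CNS_CS_NS:
    case DEMO_CNS_NS_ACTIVE_LIST:
    case DEMO_CNS_CS_NS_ACTIVE_LIST:
        *only_active = true;
        return true;
    case DEMO_CNS_NS_PRESENT:
    case DEMO_CNS_CS_NS_PRESENT:
    case DEMO_CNS_NS_PRESENT_LIST:
    case DEMO_CNS_CS_NS_PRESENT_LIST:
        *only_active = false;
        return true;
    default:
        return false;
    }
}
```

In the patch itself the same decision is made inline in the
nvme_identify() switch by passing true or false to the existing
handlers.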

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 hw/block/nvme-ns.c   |  1 +
 hw/block/nvme-ns.h   |  1 +
 hw/block/nvme.c      | 68 ++++++++++++++++++++++++++++++++++++--------
 include/block/nvme.h | 20 +++++++------
 4 files changed, 70 insertions(+), 20 deletions(-)

Comments

Keith Busch Oct. 19, 2020, 8:07 p.m. UTC | #1
On Mon, Oct 19, 2020 at 11:17:19AM +0900, Dmitry Fomichev wrote:
> Add a new Boolean namespace property, "attached", to provide the most
> basic namespace attachment support. The default value for this new
> property is true. Also, implement the logic in the new CNS values to
> include/exclude namespaces based on this new property. The only thing
> missing is hooking up the actual Namespace Attachment command opcode,
> which will allow a user to toggle the "attached" flag per namespace.
> 
> The reason for not hooking up this command completely is because the
> NVMe specification requires the namespace management command to be
> supported if the namespace attachment command is supported.

Huh, the spec does require that, and that seems like an odd requirement
since it prevents dynamic namespace attach states in a static namespace
setup. I'm not sure why the spec assumes those two things go together,
but it sure enough does!

The implementation looks fine.

Reviewed-by: Keith Busch <kbusch@kernel.org>

Klaus Jensen Oct. 20, 2020, 8:21 a.m. UTC | #2
On Oct 19 11:17, Dmitry Fomichev wrote:

(snip)

> CAP.CSS (together with the I/O Command Set data structure) defines
> what command sets are supported by the controller.
> 
> CC.CSS (together with Set Profile) can be set to enable a subset of
> the available command sets.
> 
> Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces
> will still be attached (and thus marked as active).
> Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces
> will still be attached (and thus marked as active).
> 
> However, any operation from a disabled command set will result in a
> Invalid Command Opcode.
> 

This part of the commit message seems irrelevant to the patch.

> Add a new Boolean namespace property, "attached", to provide the most
> basic namespace attachment support. The default value for this new
> property is true. Also, implement the logic in the new CNS values to
> include/exclude namespaces based on this new property. The only thing
> missing is hooking up the actual Namespace Attachment command opcode,
> which will allow a user to toggle the "attached" flag per namespace.
> 

Without Namespace Attachment support, the sole purpose of this parameter
is to allow unusable namespace IDs to be reported. I have no problems
with adding support for the additional CNS values. They will return
identical responses, but I think that is good enough for now.

When it is not really needed, we should be wary of adding a parameter
that is really hard to get rid of again.

> The reason for not hooking up this command completely is because the
> NVMe specification requires the namespace management command to be
> supported if the namespace attachment command is supported.
> 

There are many ways to support Namespace Management, and there are a lot
of quirks with each of them. Do we use a big blockdev and carve out
namespaces? Then, what are the semantics of an image resize operation?

Do we dynamically create blockdev devices? That sounds pretty nice,
but it might have other quirks, and the attachment is not really persistent.

I think at least the "attached" parameter should be x-prefixed, but
better, leave it out for now until we know how we want Namespace
Attachment and Management to be implemented.

Dmitry Fomichev Oct. 20, 2020, 11:09 p.m. UTC | #3
> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Tuesday, October 20, 2020 4:21 AM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Maxim Levitsky
> <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org; Alistair Francis
> <Alistair.Francis@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v7 04/11] hw/block/nvme: Support allocated CNS
> command variants
> 
> On Oct 19 11:17, Dmitry Fomichev wrote:
> 
> (snip)
> 
> > CAP.CSS (together with the I/O Command Set data structure) defines
> > what command sets are supported by the controller.
> >
> > CC.CSS (together with Set Profile) can be set to enable a subset of
> > the available command sets.
> >
> > Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces
> > will still be attached (and thus marked as active).
> > Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces
> > will still be attached (and thus marked as active).
> >
> > However, any operation from a disabled command set will result in a
> > Invalid Command Opcode.
> >
> 
> This part of the commit message seems irrelevant to the patch.
> 
> > Add a new Boolean namespace property, "attached", to provide the most
> > basic namespace attachment support. The default value for this new
> > property is true. Also, implement the logic in the new CNS values to
> > include/exclude namespaces based on this new property. The only thing
> > missing is hooking up the actual Namespace Attachment command opcode,
> > which will allow a user to toggle the "attached" flag per namespace.
> >
> 
> Without Namespace Attachment support, the sole purpose of this
> parameter
> is to allow unusable namespace IDs to be reported. I have no problems
> with adding support for the additional CNS values. They will return
> identical responses, but I think that is good enough for now.
> 
> When it is not really needed, we should be wary of adding a parameter
> that is really hard to get rid of again.
> 
> > The reason for not hooking up this command completely is because the
> > NVMe specification requires the namespace management command to be
> > supported if the namespace attachment command is supported.
> >
> 
> There are many ways to support Namespace Management, and there are a
> lot
> of quirks with each of them. Do we use a big blockdev and carve out
> namespaces? Then, what are the semantics of an image resize operation?
> 
> Do we dynamically create blockdev devices - thats sounds pretty nice,
> but might have other quirks and the attachment is not really persistent.
> 
> I think at least the "attached" parameter should be x-prefixed, but
> better, leave it out for now until we know how we want Namespace
> Attachment and Management to be implemented.

I don't mind leaving this property out. I used it for testing the patch, and it
could, in theory, be manipulated by an external process doing NS
Management, but, as you said, there is no certainty about how NS
Management will be implemented, and any related CLI interface would
be better added as part of that future work, not now.

Patch

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c0362426cc..974aea33f7 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -132,6 +132,7 @@  static Property nvme_ns_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
     DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
     DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+    DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d795e44bab..d6b2808b97 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@ 
 
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
+    bool     attached;
     QemuUUID uuid;
 } NvmeNamespaceParams;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ca0d0abf5c..93728e51b3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1062,6 +1062,9 @@  static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
     if (unlikely(!req->ns)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
+    if (!req->ns->params.attached) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
 
     if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
         trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
@@ -1222,6 +1225,7 @@  static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
     uint32_t trans_len;
     NvmeNamespace *ns;
     time_t current_ms;
+    int i;
 
     if (off >= sizeof(smart)) {
         return NVME_INVALID_FIELD | NVME_DNR;
@@ -1232,15 +1236,18 @@  static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
         if (!ns) {
             return NVME_INVALID_NSID | NVME_DNR;
         }
-        nvme_set_blk_stats(ns, &stats);
+        if (ns->params.attached) {
+            nvme_set_blk_stats(ns, &stats);
+        }
     } else {
-        int i;
-
         for (i = 1; i <= n->num_namespaces; i++) {
             ns = nvme_ns(n, i);
             if (!ns) {
                 continue;
             }
+            if (!ns->params.attached) {
+                continue;
+            }
             nvme_set_blk_stats(ns, &stats);
         }
     }
@@ -1531,7 +1538,8 @@  static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
     return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req,
+                                 bool only_active)
 {
     NvmeNamespace *ns;
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1548,11 +1556,16 @@  static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
         return nvme_rpt_empty_id_struct(n, req);
     }
 
+    if (only_active && !ns->params.attached) {
+        return nvme_rpt_empty_id_struct(n, req);
+    }
+
     return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs),
                     DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
+                                     bool only_active)
 {
     NvmeNamespace *ns;
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1569,6 +1582,10 @@  static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
         return nvme_rpt_empty_id_struct(n, req);
     }
 
+    if (only_active && !ns->params.attached) {
+        return nvme_rpt_empty_id_struct(n, req);
+    }
+
     if (c->csi == NVME_CSI_NVM) {
         return nvme_rpt_empty_id_struct(n, req);
     }
@@ -1576,7 +1593,8 @@  static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
     return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req,
+                                     bool only_active)
 {
     NvmeNamespace *ns;
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1606,6 +1624,9 @@  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
         if (ns->params.nsid < min_nsid) {
             continue;
         }
+        if (only_active && !ns->params.attached) {
+            continue;
+        }
         list_ptr[j++] = cpu_to_le32(ns->params.nsid);
         if (j == data_len / sizeof(uint32_t)) {
             break;
@@ -1615,7 +1636,8 @@  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
     return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req,
+                                         bool only_active)
 {
     NvmeNamespace *ns;
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1639,6 +1661,9 @@  static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
         if (ns->params.nsid < min_nsid) {
             continue;
         }
+        if (only_active && !ns->params.attached) {
+            continue;
+        }
         list_ptr[j++] = cpu_to_le32(ns->params.nsid);
         if (j == data_len / sizeof(uint32_t)) {
             break;
@@ -1712,17 +1737,25 @@  static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 
     switch (le32_to_cpu(c->cns)) {
     case NVME_ID_CNS_NS:
-        return nvme_identify_ns(n, req);
+        return nvme_identify_ns(n, req, true);
     case NVME_ID_CNS_CS_NS:
-        return nvme_identify_ns_csi(n, req);
+        return nvme_identify_ns_csi(n, req, true);
+    case NVME_ID_CNS_NS_PRESENT:
+        return nvme_identify_ns(n, req, false);
+    case NVME_ID_CNS_CS_NS_PRESENT:
+        return nvme_identify_ns_csi(n, req, false);
     case NVME_ID_CNS_CTRL:
         return nvme_identify_ctrl(n, req);
     case NVME_ID_CNS_CS_CTRL:
         return nvme_identify_ctrl_csi(n, req);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist(n, req);
+        return nvme_identify_nslist(n, req, true);
     case NVME_ID_CNS_CS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist_csi(n, req);
+        return nvme_identify_nslist_csi(n, req, true);
+    case NVME_ID_CNS_NS_PRESENT_LIST:
+        return nvme_identify_nslist(n, req, false);
+    case NVME_ID_CNS_CS_NS_PRESENT_LIST:
+        return nvme_identify_nslist_csi(n, req, false);
     case NVME_ID_CNS_NS_DESCR_LIST:
         return nvme_identify_ns_descr_list(n, req);
     case NVME_ID_CNS_IO_COMMAND_SET:
@@ -1795,6 +1828,7 @@  static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
 
 static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req)
 {
+    NvmeNamespace *ns;
     NvmeCmd *cmd = &req->cmd;
     uint32_t dw10 = le32_to_cpu(cmd->cdw10);
     uint32_t dw11 = le32_to_cpu(cmd->cdw11);
@@ -1826,7 +1860,11 @@  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_NSID | NVME_DNR;
         }
 
-        if (!nvme_ns(n, nsid)) {
+        ns = nvme_ns(n, nsid);
+        if (!ns) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+        if (!ns->params.attached) {
             return NVME_INVALID_FIELD | NVME_DNR;
         }
     }
@@ -1968,6 +2006,9 @@  static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
             if (unlikely(!ns)) {
                 return NVME_INVALID_FIELD | NVME_DNR;
             }
+            if (!ns->params.attached) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
         }
     } else if (nsid && nsid != NVME_NSID_BROADCAST) {
         if (!nvme_nsid_valid(n, nsid)) {
@@ -2015,6 +2056,9 @@  static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
             if (!ns) {
                 continue;
             }
+            if (!ns->params.attached) {
+                continue;
+            }
 
             if (!(dw11 & 0x1) && blk_enable_write_cache(ns->blkconf.blk)) {
                 blk_flush(ns->blkconf.blk);
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f5ac9143c4..27125c9d28 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -805,14 +805,18 @@  typedef struct QEMU_PACKED NvmePSD {
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
 enum NvmeIdCns {
-    NVME_ID_CNS_NS                = 0x00,
-    NVME_ID_CNS_CTRL              = 0x01,
-    NVME_ID_CNS_NS_ACTIVE_LIST    = 0x02,
-    NVME_ID_CNS_NS_DESCR_LIST     = 0x03,
-    NVME_ID_CNS_CS_NS             = 0x05,
-    NVME_ID_CNS_CS_CTRL           = 0x06,
-    NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
-    NVME_ID_CNS_IO_COMMAND_SET    = 0x1c,
+    NVME_ID_CNS_NS                    = 0x00,
+    NVME_ID_CNS_CTRL                  = 0x01,
+    NVME_ID_CNS_NS_ACTIVE_LIST        = 0x02,
+    NVME_ID_CNS_NS_DESCR_LIST         = 0x03,
+    NVME_ID_CNS_CS_NS                 = 0x05,
+    NVME_ID_CNS_CS_CTRL               = 0x06,
+    NVME_ID_CNS_CS_NS_ACTIVE_LIST     = 0x07,
+    NVME_ID_CNS_NS_PRESENT_LIST       = 0x10,
+    NVME_ID_CNS_NS_PRESENT            = 0x11,
+    NVME_ID_CNS_CS_NS_PRESENT_LIST    = 0x1a,
+    NVME_ID_CNS_CS_NS_PRESENT         = 0x1b,
+    NVME_ID_CNS_IO_COMMAND_SET        = 0x1c,
 };
 
 typedef struct QEMU_PACKED NvmeIdCtrl {