diff mbox series

[v4,06/14] hw/block/nvme: Add support for active/inactive namespaces

Message ID 20200923182021.3724-7-dmitry.fomichev@wdc.com (mailing list archive)
State New, archived
Headers show
Series hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set | expand

Commit Message

Dmitry Fomichev Sept. 23, 2020, 6:20 p.m. UTC
From: Niklas Cassel <niklas.cassel@wdc.com>

In NVMe, a namespace is active if it exists and is attached to the
controller.

CAP.CSS (together with the I/O Command Set data structure) defines what
command sets are supported by the controller.

CC.CSS (together with Set Profile) can be set to enable a subset of the
available command sets. The namespaces belonging to a disabled command set
will not be able to attach to the controller, and will thus be inactive.

E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
marked as inactive.

The identify namespace, the identify namespace CSI specific, and the namespace
list commands have two different versions, one that only shows active
namespaces, and the other version that shows existing namespaces, regardless
of whether the namespace is attached or not.

Add an attached member to struct NvmeNamespace, and implement the missing CNS
commands.

The added functionality will also simplify the implementation of namespace
management in the future, since namespace management can also attach and
detach namespaces.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 hw/block/nvme.c      | 54 ++++++++++++++++++++++++++++++++++++--------
 hw/block/nvme.h      |  1 +
 include/block/nvme.h | 20 +++++++++-------
 3 files changed, 57 insertions(+), 18 deletions(-)

Comments

Klaus Jensen Sept. 24, 2020, 12:12 p.m. UTC | #1
On Sep 24 03:20, Dmitry Fomichev wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> In NVMe, a namespace is active if it exists and is attached to the
> controller.
> 
> CAP.CSS (together with the I/O Command Set data structure) defines what
> command sets are supported by the controller.
> 
> CC.CSS (together with Set Profile) can be set to enable a subset of the
> available command sets. The namespaces belonging to a disabled command set
> will not be able to attach to the controller, and will thus be inactive.
> 
> E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
> marked as inactive.
> 

Hmm. I'm not convinced that this is correct. Can you reference the spec?

On the specific case you mention the spec is actually pretty clear:

  "When only the Admin Command Set is supported, any command submitted
  on an I/O Submission Queue and any I/O Command Set Specific Admin
  command submitted on the Admin Submission Queue is completed with
  status Invalid Command Opcode."

My /interpretation/ (because the spec is vague on this point) is that
with TP 4056, if the host writes 0x0 to CC.CSS, you will (should) just
see Invalid Command Opcode for namespaces not supporting the NVM command
set since we are operating in a backward compatible way.

Now, if the host sets CC.CSS to 0x6, then it is obviously aware of
namespaces and other rules apply. For instance, it may set the I/O
Command Set Combination Index through a Set Features command, but TP
4056 is clear that the host will not be allowed to choose a combination
that leaves an attached namespace unsupported.

For this device, that does not implement namespace management and thus
has no notion of attaching/detaching namespaces, the controller should
by default choose an I/O Command Set Combination that indicates support
for all I/O command sets that are required to support the namespaces
configured.
Niklas Cassel Sept. 24, 2020, 6:17 p.m. UTC | #2
On Thu, Sep 24, 2020 at 02:12:03PM +0200, Klaus Jensen wrote:
> On Sep 24 03:20, Dmitry Fomichev wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
> > marked as inactive.
> > 
> 
> Hmm. I'm not convinced that this is correct. Can you reference the spec?
> 

"If the user sets CC.CSS to Admin Only, NVM namespaces should be marked as inactive."

After reading the spec several times, I agree that this statement is false,
although I really wish it wasn't :)



My interpretation was based on:

From CC.CSS:
"If bit 44 is set to ‘1’ in the Command Sets Supported (CSS) field, then the
value 111b indicates that only the Admin Command Set is supported and that no
I/O Command Set or I/O Command Set Specific Admin commands are supported."

The NVM Command Set is a Command Set, so I assumed that since the Command
Set was not supported, trying to do something like CNS 00h (Identify Namespace),
would return an zero-filled struct. (Since the namespace belongs to a
Command Set that is not supported.)

To me it seems silly that a namespace can be "active" at the same time
as the Command Set that the namespace belongs to is not enabled,
but that seems like how the spec is written...


Additionally in TP4056 section 5.19 it says:
"If an attempt is made to attach a namespace to a controller that supports the
corresponding I/O Command Set and the corresponding I/O Command Set is not
enabled by the I/O Command Set profile feature, then the command shall be
aborted with a status of I/O Command Set Not Enabled."

But if you already have a e.g. a Key Value namespace attached, and boot up with
e.g. CC.CSS = NVM, or CC.CSS = Admin, you will have a namespace belonging to a
disabled command set attached (and "active" :p).
But if you have no namespaces attached, boot up with CC.CSS = NVM or
CC.CSS = Admin, you will not be allowed to attach your Key Value namespace.
(5.19 says nothing about CC.CSS, but let's assume it isn't supposed to be
allowed.)
Will you be allowed to attach a ZNS namespace? (Since CC.CSS != ALL Selected,
I would assume that is is not supposed to be allowed.)
Now, will you be allowed to attach a NVM namespace when CC.CSS = Admin?
(I assume that the intention is that you should.)


CC.CSS can only be changed when the controller is disabled.
I assume that you can attach NVM namespaces even when CC.CSS = Admin.
(Attaching namespaces != NVM is probably not allowed when
CC.CSS != ALL Selected.)
Attaching namespaces are persistent. Even if you restart with CC.CSS = Admin,
they will still be attached, and active.
(Although you might not be able to do anything with them... see the next
section.)
You can not change to a profile (using Set Profile) that lacks a command set
that you are currently using. This change is while the controller is enabled,
so I guess it is ok that this check is stricter than CC.CSS.

Things would have been way simpler if the controller just deattached
non-supported namespaces internally at power on...


> My /interpretation/ (because the spec is vague on this point) is that
> with TP 4056, if the host writes 0x0 to CC.CSS, you will (should) just
> see Invalid Command Opcode for namespaces not supporting the NVM command
> set since we are operating in a backward compatible way.

If a controller is booted with CC.CSS = NVM Command Set:

We have an attached Key Value namespace. (It will be set as active.)
I think that we can agree that any IOCS Key Value specific command
will fail.

We have an attached Zoned Namespace. (It will be set as active.)
Sure, any IOCS Zoned Namespace Command specific command will fail.
Your interpretation is that it will allow NVM Commands, since a Zoned Namespace
implements the opcodes of the NVM Command Set.

Each Command Set has a table with the opcodes that it supports.
In e.g. Key Value Command Set, opcode 01h means "Store command"
but in NVM Command Set, opcode 01h means "Write command".

These commands are totally different, and uses completely different dwords.
So I don't think that it is obvious that CC.CSS = NVM, should allow
"NVM reserved opcodes" for certain Command Sets, but not in others.

Like so many other things in NVMe, this will probably be vendor specific.
For devices only supporting NVM + ZNS, it will probably support NVM
Command Set commands even when CC.CSS = NVM, but it's probably not
something that can be guarantee to be true for all devices supporting
a Command Set that extends the NVM Command Set.


Kind regards,
Niklas
Klaus Jensen Sept. 24, 2020, 6:55 p.m. UTC | #3
On Sep 24 18:17, Niklas Cassel wrote:
> On Thu, Sep 24, 2020 at 02:12:03PM +0200, Klaus Jensen wrote:
> > On Sep 24 03:20, Dmitry Fomichev wrote:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > 
> > > E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
> > > marked as inactive.
> > > 
> > 
> > Hmm. I'm not convinced that this is correct. Can you reference the spec?
> > 
> 
> CC.CSS can only be changed when the controller is disabled.

Right. I think I see you point. While the controller is disabled, the
host obviously cannot even see what namespaces are available, so the
controller is free to only expose (aka, attach) the namespaces that
makes sense for the value of CC.CSS.

OK then, the patch is good :)
Niklas Cassel Sept. 24, 2020, 7:40 p.m. UTC | #4
On Thu, Sep 24, 2020 at 08:55:24PM +0200, Klaus Jensen wrote:
> On Sep 24 18:17, Niklas Cassel wrote:
> > On Thu, Sep 24, 2020 at 02:12:03PM +0200, Klaus Jensen wrote:
> > > On Sep 24 03:20, Dmitry Fomichev wrote:
> > > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > > 
> > > > E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
> > > > marked as inactive.
> > > > 
> > > 
> > > Hmm. I'm not convinced that this is correct. Can you reference the spec?
> > > 
> > 
> > CC.CSS can only be changed when the controller is disabled.
> 
> Right. I think I see you point. While the controller is disabled, the
> host obviously cannot even see what namespaces are available, so the
> controller is free to only expose (aka, attach) the namespaces that
> makes sense for the value of CC.CSS.
> 
> OK then, the patch is good :)

That was my thought, that the controller internally would
detach unsupported namespaces (even if the controller didn't
expose namespace management capabilities to the user).

This was how I assumed that things worked, but if we should
follow the spec strictly, we should do like you suggested
and keep them attached, and return the proper error code,
on non-admin commands.

Thank you for improving my understanding.
Considering that CC.CSS can only be changed when the controller
is disabled, I still kind of wished that the spec said that
unsupported namespaces would be automatically detached.


Kind regards,
Niklas
diff mbox series

Patch

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e0f885498d..6c231b20f9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1189,7 +1189,8 @@  static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
     return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req,
+                                 bool only_active)
 {
     NvmeNamespace *ns;
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1207,11 +1208,16 @@  static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
     ns = &n->namespaces[nsid - 1];
     assert(nsid == ns->nsid);
 
+    if (only_active && !ns->attached) {
+        return nvme_rpt_empty_id_struct(n, prp1, prp2, req);
+    }
+
     return nvme_dma_prp(n, (uint8_t *)&ns->id_ns, sizeof(ns->id_ns), prp1,
                         prp2, DMA_DIRECTION_FROM_DEVICE, req);
 }
 
-static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
+                                     bool only_active)
 {
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
     NvmeNamespace *ns;
@@ -1229,6 +1235,10 @@  static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
     ns = &n->namespaces[nsid - 1];
     assert(nsid == ns->nsid);
 
+    if (only_active && !ns->attached) {
+        return nvme_rpt_empty_id_struct(n, prp1, prp2, req);
+    }
+
     if (c->csi == NVME_CSI_NVM) {
         return nvme_rpt_empty_id_struct(n, prp1, prp2, req);
     }
@@ -1236,7 +1246,8 @@  static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
     return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req,
+                                     bool only_active)
 {
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
     static const int data_len = NVME_IDENTIFY_DATA_SIZE;
@@ -1261,7 +1272,7 @@  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
 
     list = g_malloc0(data_len);
     for (i = 0; i < n->num_namespaces; i++) {
-        if (i < min_nsid) {
+        if (i < min_nsid || (only_active && !n->namespaces[i].attached)) {
             continue;
         }
         list[j++] = cpu_to_le32(i + 1);
@@ -1275,7 +1286,8 @@  static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
     return ret;
 }
 
-static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req,
+                                         bool only_active)
 {
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
     static const int data_len = NVME_IDENTIFY_DATA_SIZE;
@@ -1294,7 +1306,8 @@  static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
 
     list = g_malloc0(data_len);
     for (i = 0; i < n->num_namespaces; i++) {
-        if (i < min_nsid) {
+        if (i < min_nsid || c->csi != n->namespaces[i].csi ||
+            (only_active && !n->namespaces[i].attached)) {
             continue;
         }
         list[j++] = cpu_to_le32(i + 1);
@@ -1386,17 +1399,25 @@  static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 
     switch (le32_to_cpu(c->cns)) {
     case NVME_ID_CNS_NS:
-        return nvme_identify_ns(n, req);
+        return nvme_identify_ns(n, req, true);
     case NVME_ID_CNS_CS_NS:
-        return nvme_identify_ns_csi(n, req);
+        return nvme_identify_ns_csi(n, req, true);
+    case NVME_ID_CNS_NS_PRESENT:
+        return nvme_identify_ns(n, req, false);
+    case NVME_ID_CNS_CS_NS_PRESENT:
+        return nvme_identify_ns_csi(n, req, false);
     case NVME_ID_CNS_CTRL:
         return nvme_identify_ctrl(n, req);
     case NVME_ID_CNS_CS_CTRL:
         return nvme_identify_ctrl_csi(n, req);
     case NVME_ID_CNS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist(n, req);
+        return nvme_identify_nslist(n, req, true);
     case NVME_ID_CNS_CS_NS_ACTIVE_LIST:
-        return nvme_identify_nslist_csi(n, req);
+        return nvme_identify_nslist_csi(n, req, true);
+    case NVME_ID_CNS_NS_PRESENT_LIST:
+        return nvme_identify_nslist(n, req, false);
+    case NVME_ID_CNS_CS_NS_PRESENT_LIST:
+        return nvme_identify_nslist_csi(n, req, false);
     case NVME_ID_CNS_NS_DESCR_LIST:
         return nvme_identify_ns_descr_list(n, req);
     case NVME_ID_CNS_IO_COMMAND_SET:
@@ -1838,6 +1859,7 @@  static int nvme_start_ctrl(NvmeCtrl *n)
 {
     uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
     uint32_t page_size = 1 << page_bits;
+    int i;
 
     if (unlikely(n->cq[0])) {
         trace_pci_nvme_err_startfail_cq();
@@ -1924,6 +1946,18 @@  static int nvme_start_ctrl(NvmeCtrl *n)
     nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0,
         NVME_AQA_ASQS(n->bar.aqa) + 1);
 
+    for (i = 0; i < n->num_namespaces; i++) {
+        n->namespaces[i].attached = false;
+        switch (n->namespaces[i].csi) {
+        case NVME_CSI_NVM:
+            if (NVME_CC_CSS(n->bar.cc) == CSS_NVM_ONLY ||
+                NVME_CC_CSS(n->bar.cc) == CSS_CSI) {
+                n->namespaces[i].attached = true;
+            }
+            break;
+        }
+    }
+
     nvme_set_timestamp(n, 0ULL);
 
     QTAILQ_INIT(&n->aer_queue);
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index c1deac9667..71e4344471 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -66,6 +66,7 @@  typedef struct NvmeNamespace {
     NvmeIdNs        id_ns;
     uint32_t        nsid;
     uint8_t         csi;
+    bool            attached;
     QemuUUID        uuid;
 } NvmeNamespace;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 74cc11782e..f8356394f5 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -804,14 +804,18 @@  typedef struct QEMU_PACKED NvmePSD {
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
 enum NvmeIdCns {
-    NVME_ID_CNS_NS                = 0x00,
-    NVME_ID_CNS_CTRL              = 0x01,
-    NVME_ID_CNS_NS_ACTIVE_LIST    = 0x02,
-    NVME_ID_CNS_NS_DESCR_LIST     = 0x03,
-    NVME_ID_CNS_CS_NS             = 0x05,
-    NVME_ID_CNS_CS_CTRL           = 0x06,
-    NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
-    NVME_ID_CNS_IO_COMMAND_SET    = 0x1c,
+    NVME_ID_CNS_NS                    = 0x00,
+    NVME_ID_CNS_CTRL                  = 0x01,
+    NVME_ID_CNS_NS_ACTIVE_LIST        = 0x02,
+    NVME_ID_CNS_NS_DESCR_LIST         = 0x03,
+    NVME_ID_CNS_CS_NS                 = 0x05,
+    NVME_ID_CNS_CS_CTRL               = 0x06,
+    NVME_ID_CNS_CS_NS_ACTIVE_LIST     = 0x07,
+    NVME_ID_CNS_NS_PRESENT_LIST       = 0x10,
+    NVME_ID_CNS_NS_PRESENT            = 0x11,
+    NVME_ID_CNS_CS_NS_PRESENT_LIST    = 0x1a,
+    NVME_ID_CNS_CS_NS_PRESENT         = 0x1b,
+    NVME_ID_CNS_IO_COMMAND_SET        = 0x1c,
 };
 
 typedef struct QEMU_PACKED NvmeIdCtrl {