
[1/2] nvme: make independent ns identify default

Message ID 20241008145503.987195-2-m@bjorling.me
State New
Series nvme: add rotational support

Commit Message

Matias Bjørling Oct. 8, 2024, 2:55 p.m. UTC
From: Matias Bjørling <matias.bjorling@wdc.com>

The NVMe 2.0 specification adds an independent identify namespace
data structure that contains generic attributes that apply to all
namespace types. Some attributes carry over from the NVM command set
identify namespace data structure, and others are new.

Currently, the data structure is only considered when CRIMS is enabled or
when the namespace type is key-value.

However, the independent namespace data structure is mandatory for
devices that implement features from the 2.0+ specification. Therefore,
we can check this data structure first. If it is unavailable, retrieve
the generic attributes from the NVM command set identify namespace data
structure.

Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
---
 drivers/nvme/host/core.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Hannes Reinecke Oct. 9, 2024, 6:16 a.m. UTC | #1
On 10/8/24 16:55, Matias Bjørling wrote:
> From: Matias Bjørling <matias.bjorling@wdc.com>
> 
> The NVMe 2.0 specification adds an independent identify namespace
> data structure that contains generic attributes that apply to all
> namespace types. Some attributes carry over from the NVM command set
> identify namespace data structure, and others are new.
> 
> Currently, the data structure is only considered when CRIMS is enabled or
> when the namespace type is key-value.
> 
> However, the independent namespace data structure
> is mandatory for devices that implement features from the 2.0+
> specification. Therefore, we can check this data structure first. If
> unavailable, retrieve the generic attributes from the NVM command set
> identify namespace data structure.
> 
> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
> ---
>   drivers/nvme/host/core.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0dc8bcc664f2..9cbef6342c39 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3999,7 +3999,7 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   {
>   	struct nvme_ns_info info = { .nsid = nsid };
>   	struct nvme_ns *ns;
> -	int ret;
> +	int ret = 1;
>   
>   	if (nvme_identify_ns_descs(ctrl, &info))
>   		return;
> @@ -4015,10 +4015,9 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	 * data structure to find all the generic information that is needed to
>   	 * set up a namespace.  If not fall back to the legacy version.
>   	 */
> -	if ((ctrl->cap & NVME_CAP_CRMS_CRIMS) ||
> -	    (info.ids.csi != NVME_CSI_NVM && info.ids.csi != NVME_CSI_ZNS))
> +	if (!nvme_ctrl_limited_cns(ctrl))
>   		ret = nvme_ns_info_from_id_cs_indep(ctrl, &info);
> -	else
> +	if (ret > 0)
>   		ret = nvme_ns_info_from_identify(ctrl, &info);
>   
>   	if (info.is_removed)

That is very odd code. 'info' will only be filled out for a non-zero
return value of nvme_ns_info_from_id_cs_indep().
So why not check for that?
But if we get an NVMe status back, there is a fair chance that it is
something other than 'invalid field' (or whatever indicates that the
command is not supported). That would then cause the device to be
misdetected without the admin knowing.
Shouldn't we add a message if we fall back to nvme_ns_info_from_identify()?

Cheers,

Hannes
Christoph Hellwig Oct. 9, 2024, 7:46 a.m. UTC | #2
On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
> However, the independent namespace data structure
> is mandatory for devices that implement features from the 2.0+
> specification. Therefore, we can check this data structure first. If
> unavailable, retrieve the generic attributes from the NVM command set
> identify namespace data structure.

I'm not a huge fan of this.  For pre-2.0 controllers this means
we'll now send a command that will fail most of the time.  And for
all the cheap low-end consumer devices I'm actually worried that they'll
get it wrong and break something.
Matias Bjørling Oct. 9, 2024, 1:19 p.m. UTC | #3
On 09-10-2024 09:46, Christoph Hellwig wrote:
> On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
>> However, the independent namespace data structure
>> is mandatory for devices that implement features from the 2.0+
>> specification. Therefore, we can check this data structure first. If
>> unavailable, retrieve the generic attributes from the NVM command set
>> identify namespace data structure.
> 
> I'm not a huge fan of this.  For pre-2.0 controllers this means
> we'll now send a command that will fail most of the time.  And for
> all the cheap low-end consumer devices I'm actually worried that they'll
> get it wrong and break something.
> 

It's a good point. Damien, Keith, and I were debating it during ALPSS.
They preferred the "send command and see if it fails" approach over
writing specific conditions where it would apply. Note that Keith did
suggest avoiding the command on 1.0 and 1.1 devices, as they were known
to fail with unsupported CNS IDs.

If we make the check conditional, I don't think checking whether the
device follows the 2.0 specification is sufficient, as some devices may
implement a subset of the 2.0 features (for example, the independent ns
data structure) while reporting as a 1.4 device.
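To make the point concrete, here is a small sketch of a version-only gate at 2.0. It assumes the version is packed the same way as the controller's VS register (major in bits 31:16, minor in bits 15:8, tertiary in bits 7:0); `vs_pack()`, `vs_major()`, `vs_minor()` and `probe_by_version_only()` are hypothetical helpers for illustration, not driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a version like the VS register: MJR 31:16, MNR 15:8, TER 7:0. */
uint32_t vs_pack(unsigned mjr, unsigned mnr, unsigned ter)
{
	return ((uint32_t)mjr << 16) | ((uint32_t)mnr << 8) | (uint32_t)ter;
}

unsigned vs_major(uint32_t vs)
{
	return vs >> 16;
}

unsigned vs_minor(uint32_t vs)
{
	return (vs >> 8) & 0xff;
}

/*
 * Hypothetical version-only gate. It cannot tell whether the controller
 * actually implements the I/O Command Set Independent Identify Namespace
 * data structure (CNS 08h), so a 1.4 device that does implement it is
 * still skipped.
 */
bool probe_by_version_only(uint32_t vs)
{
	return vs >= vs_pack(2, 0, 0);
}
```

A 1.4 controller that implements CNS 08h is still rejected by this gate, which is why a pure version check is not enough on its own.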

Is there perhaps a better way that isn't dependent on some feature being
implemented (such as the CRIMS capability)?
Matias Bjørling Oct. 9, 2024, 1:59 p.m. UTC | #4
On 09-10-2024 08:16, Hannes Reinecke wrote:
> On 10/8/24 16:55, Matias Bjørling wrote:
>> From: Matias Bjørling <matias.bjorling@wdc.com>
>>
>> The NVMe 2.0 specification adds an independent identify namespace
>> data structure that contains generic attributes that apply to all
>> namespace types. Some attributes carry over from the NVM command set
>> identify namespace data structure, and others are new.
>>
>> Currently, the data structure is only considered when CRIMS is enabled or
>> when the namespace type is key-value.
>>
>> However, the independent namespace data structure
>> is mandatory for devices that implement features from the 2.0+
>> specification. Therefore, we can check this data structure first. If
>> unavailable, retrieve the generic attributes from the NVM command set
>> identify namespace data structure.
>>
>> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
>> ---
>>   drivers/nvme/host/core.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 0dc8bcc664f2..9cbef6342c39 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3999,7 +3999,7 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>>   {
>>       struct nvme_ns_info info = { .nsid = nsid };
>>       struct nvme_ns *ns;
>> -    int ret;
>> +    int ret = 1;
>>       if (nvme_identify_ns_descs(ctrl, &info))
>>           return;
>> @@ -4015,10 +4015,9 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>>        * data structure to find all the generic information that is needed to
>>        * set up a namespace.  If not fall back to the legacy version.
>>        */
>> -    if ((ctrl->cap & NVME_CAP_CRMS_CRIMS) ||
>> -        (info.ids.csi != NVME_CSI_NVM && info.ids.csi != NVME_CSI_ZNS))
>> +    if (!nvme_ctrl_limited_cns(ctrl))
>>           ret = nvme_ns_info_from_id_cs_indep(ctrl, &info);
>> -    else
>> +    if (ret > 0)
>>           ret = nvme_ns_info_from_identify(ctrl, &info);
>>       if (info.is_removed)
> 
> That is very odd code. 'info' will only be filled out for a non-zero
> return value of nvme_ns_info_from_id_cs_indep().

I may have misunderstood. The information is only filled in if
nvme_ns_info_from_id_cs_indep() returns 0. If it returns an NVMe status
(a positive value), nvme_ns_info_from_identify() is tried; otherwise it
is a hard error, and the scan errors out completely.
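The return-value convention described here can be modeled in a small, self-contained sketch of the patched control flow. The enum, the `scan_ns_model()` function and the stub identify functions are hypothetical stand-ins for nvme_ns_info_from_id_cs_indep() and nvme_ns_info_from_identify(); the status value 0x2 is assumed to be the generic "Invalid Field in Command" status.

```c
#include <errno.h>
#include <stdbool.h>

/* Which identify path filled in the namespace info (illustrative only). */
enum ns_info_source { SRC_NONE = 0, SRC_CS_INDEP, SRC_LEGACY };

struct ns_info_model {
	enum ns_info_source source;
};

/*
 * Return convention, as described in this thread:
 *   0   -> success, info filled in
 *   > 0 -> NVMe status returned by the controller
 *   < 0 -> negative errno, a hard error
 */
typedef int (*identify_fn)(struct ns_info_model *info);

/* Mirrors the control flow of the patched nvme_scan_ns() hunk. */
int scan_ns_model(bool limited_cns, identify_fn cs_indep, identify_fn legacy,
		  struct ns_info_model *info)
{
	int ret = 1;	/* positive "not tried" value forces the legacy path */

	if (!limited_cns)
		ret = cs_indep(info);
	if (ret > 0)	/* any NVMe status falls back, not only Invalid Field */
		ret = legacy(info);
	return ret;
}

/* Hypothetical stubs standing in for the real identify helpers. */
int cs_indep_ok(struct ns_info_model *info)
{
	info->source = SRC_CS_INDEP;
	return 0;
}

int cs_indep_invalid_field(struct ns_info_model *info)
{
	(void)info;
	return 0x2;	/* assumed: Invalid Field in Command */
}

int cs_indep_enomem(struct ns_info_model *info)
{
	(void)info;
	return -ENOMEM;
}

int legacy_ok(struct ns_info_model *info)
{
	info->source = SRC_LEGACY;
	return 0;
}
```

With this convention, the concern about misdetection sits in the `ret > 0` branch: every NVMe status, not only one that signals an unsupported CNS value, leads to the legacy path, while a negative errno is propagated without a fallback.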

> So why not check for that?
> But if we get an NVMe status back, there is a fair chance that it is
> something other than 'invalid field' (or whatever indicates that the
> command is not supported). That would then cause the device to be
> misdetected without the admin knowing.
> Shouldn't we add a message if we fall back to nvme_ns_info_from_identify()?

Hmm, we could. But at this point, there are more devices falling back
to nvme_ns_info_from_identify() than devices that implement the
independent ns identify data structure, so I wouldn't mind it being
silent. If we want to debug a potential misdetection, tracing can be
enabled to track down what's happening.

> 
> Cheers,
> 
> Hannes
Keith Busch Oct. 9, 2024, 2:56 p.m. UTC | #5
On Wed, Oct 09, 2024 at 09:46:11AM +0200, Christoph Hellwig wrote:
> On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
> > However, the independent namespace data structure
> > is mandatory for devices that implement features from the 2.0+
> > specification. Therefore, we can check this data structure first. If
> > unavailable, retrieve the generic attributes from the NVM command set
> > identify namespace data structure.
> 
> I'm not a huge fan of this.  For pre-2.0 controllers this means
> we'll now send a command that will fail most of the time.  And for
> all the cheap low-end consumer devices I'm actually worried that they'll
> get it wrong and break something.

We already send identify commands that we expect may break on pre-2.0
controllers: the Identify NS Descriptor List.

We have other quirks for disabling specific identifications (e.g.,
nvme_ctrl_limited_cns, NVME_QUIRK_NO_NS_DESC_LIST) in case something
really breaks certain identify commands. But I think anything >= 1.3
should be fine: the CNS handling is well defined from that point onward,
so we shouldn't make anything harder than necessary by assuming someone
got identification this wrong.

The only pain I can think of is that some controllers increment their
error log count, and SMART tools create unnecessary alerts for that.
Christoph Hellwig Oct. 10, 2024, 7:53 a.m. UTC | #6
On Wed, Oct 09, 2024 at 08:56:32AM -0600, Keith Busch wrote:
> On Wed, Oct 09, 2024 at 09:46:11AM +0200, Christoph Hellwig wrote:
> > On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
> > > However, the independent namespace data structure
> > > is mandatory for devices that implement features from the 2.0+
> > > specification. Therefore, we can check this data structure first. If
> > > unavailable, retrieve the generic attributes from the NVM command set
> > > identify namespace data structure.
> > 
> > I'm not a huge fan of this.  For pre-2.0 controllers this means
> > we'll now send a command that will fail most of the time.  And for
> > all the cheap low-end consumer devices I'm actually worried that they'll
> > get it wrong and break something.
> 
> We already send identify commands that we expect may break on pre-2.0
> controllers: the Identify NS Descriptor List.

Identify NS Descriptor List is mandatory starting with NVMe 1.3.  We only
issue it for 1.3 or later controllers, or if the controller advertises
support for multiple command sets.
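The gating described here reduces to a single predicate. A minimal sketch, assuming the version is packed like the VS register and that `multi_css` is a flag the caller has already derived from the controller (how it is derived is not modeled); `vs_pack()` and `should_identify_ns_descs()` are hypothetical names, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a version like the VS register: MJR 31:16, MNR 15:8, TER 7:0. */
uint32_t vs_pack(unsigned mjr, unsigned mnr, unsigned ter)
{
	return ((uint32_t)mjr << 16) | ((uint32_t)mnr << 8) | (uint32_t)ter;
}

/*
 * Only send Identify NS Descriptor List to controllers where it is
 * mandatory (1.3 and later) or that advertise multiple command sets.
 */
bool should_identify_ns_descs(uint32_t vs, bool multi_css)
{
	return vs >= vs_pack(1, 3, 0) || multi_css;
}
```

A pre-1.3 controller is only asked for the descriptor list when it advertises multiple command sets, which is the contrast being drawn with sending CNS 08h unconditionally.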
Daniel Wagner Oct. 10, 2024, 2:47 p.m. UTC | #7
On Wed, Oct 09, 2024 at 03:19:59PM GMT, Matias Bjørling wrote:
> On 09-10-2024 09:46, Christoph Hellwig wrote:
> > On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
> > > However, the independent namespace data structure
> > > is mandatory for devices that implement features from the 2.0+
> > > specification. Therefore, we can check this data structure first. If
> > > unavailable, retrieve the generic attributes from the NVM command set
> > > identify namespace data structure.
> > 
> > I'm not a huge fan of this.  For pre-2.0 controllers this means
> > we'll now send a command that will fail most of the time.  And for
> > all the cheap low-end consumer devices I'm actually worried that they'll
> > get it wrong and break something.
> > 
> 
> It's a good point. Damien, Keith, and I were debating it during ALPSS. They
> preferred the "send command and see if it fails" approach over writing
> specific conditions where it would apply. Note that Keith did suggest
> avoiding the command on 1.0 and 1.1 devices, as they were known to fail with
> unsupported CNS IDs.

FWIW, there are some devices out there which will log these attempts and
spam their error logs. There were plenty of reports against nvme-cli
when nvme-cli issued a command which could fail.
Matias Bjørling Oct. 10, 2024, 3:02 p.m. UTC | #8
On 10-10-2024 16:47, Daniel Wagner wrote:
> On Wed, Oct 09, 2024 at 03:19:59PM GMT, Matias Bjørling wrote:
>> On 09-10-2024 09:46, Christoph Hellwig wrote:
>>> On Tue, Oct 08, 2024 at 04:55:02PM +0200, Matias Bjørling wrote:
>>>> However, the independent namespace data structure
>>>> is mandatory for devices that implement features from the 2.0+
>>>> specification. Therefore, we can check this data structure first. If
>>>> unavailable, retrieve the generic attributes from the NVM command set
>>>> identify namespace data structure.
>>>
>>> I'm not a huge fan of this.  For pre-2.0 controllers this means
>>> we'll now send a command that will fail most of the time.  And for
>>> all the cheap low-end consumer devices I'm actually worried that they'll
>>> get it wrong and break something.
>>>
>>
>> It's a good point. Damien, Keith, and I were debating it during ALPSS. They
>> preferred the "send command and see if it fails" approach over writing
>> specific conditions where it would apply. Note that Keith did suggest
>> avoiding the command on 1.0 and 1.1 devices, as they were known to fail with
>> unsupported CNS IDs.
> 
> FWIW, there are some devices out there which will log these attempts and
> spam their error logs. There were plenty of reports against nvme-cli
> when nvme-cli issued a command which could fail.

Got it. So, given the feedback from you, Keith, and Christoph, it's safe
to say it needs to be a conditional check.

Would anyone object if the

  if ((ctrl->cap & NVME_CAP_CRMS_CRIMS) ||
      (info.ids.csi != NVME_CSI_NVM && info.ids.csi != NVME_CSI_ZNS))

statement included a check for endurance group support?

The idea is that a device that exposes the rotational flag is required
to implement endurance groups. That check would limit the potentially
failing identify command to relatively new devices.
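One way to express the proposed condition, as a sketch: the CAP.CRMS.CRIMS bit position (assumed bit 60), the Identify Controller CTRATT bit used to detect endurance group support (assumed bit 4), and all names here are assumptions to be checked against the NVMe base specification and include/linux/nvme.h, not the eventual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command Set Identifier values (NVM, Key Value, Zoned Namespace). */
enum csi_model { CSI_NVM = 0x0, CSI_KV = 0x1, CSI_ZNS = 0x2 };

#define CAP_CRMS_CRIMS          (1ULL << 60)	/* assumed: CAP.CRMS.CRIMS */
#define CTRATT_ENDURANCE_GROUPS (1U << 4)	/* assumed: CTRATT Endurance Groups */

/* Hypothetical predicate: should the CS-independent identify be tried? */
bool use_cs_indep_identify(uint64_t cap, uint32_t ctratt, enum csi_model csi)
{
	/* Existing condition from the current code. */
	if (cap & CAP_CRMS_CRIMS)
		return true;
	if (csi != CSI_NVM && csi != CSI_ZNS)
		return true;
	/* Proposed addition: endurance group support implies a newer device. */
	return (ctratt & CTRATT_ENDURANCE_GROUPS) != 0;
}
```

Older NVM or ZNS controllers without CRIMS and without endurance groups keep using only the legacy identify, so the possibly failing command is never sent to them.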

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0dc8bcc664f2..9cbef6342c39 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3999,7 +3999,7 @@  static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns_info info = { .nsid = nsid };
 	struct nvme_ns *ns;
-	int ret;
+	int ret = 1;
 
 	if (nvme_identify_ns_descs(ctrl, &info))
 		return;
@@ -4015,10 +4015,9 @@  static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	 * data structure to find all the generic information that is needed to
 	 * set up a namespace.  If not fall back to the legacy version.
 	 */
-	if ((ctrl->cap & NVME_CAP_CRMS_CRIMS) ||
-	    (info.ids.csi != NVME_CSI_NVM && info.ids.csi != NVME_CSI_ZNS))
+	if (!nvme_ctrl_limited_cns(ctrl))
 		ret = nvme_ns_info_from_id_cs_indep(ctrl, &info);
-	else
+	if (ret > 0)
 		ret = nvme_ns_info_from_identify(ctrl, &info);
 
 	if (info.is_removed)