[ndctl,v11,6/7] cxl/list: add --media-errors option to cxl list

Message ID a6933ba82755391284368e4527154341bc4fd75f.1710386468.git.alison.schofield@intel.com
State Superseded
Series Support poison list retrieval

Commit Message

Alison Schofield March 14, 2024, 4:05 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

The --media-errors option to 'cxl list' retrieves poison lists from
memory devices supporting the capability and displays the returned
media_error records in the cxl list json. This option can apply to
memdevs or regions.

Include media-errors in the -vvv verbose option.

Example usage is in the Documentation/cxl/cxl-list.txt update.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
 cxl/filter.h                   |  3 ++
 cxl/list.c                     |  3 ++
 3 files changed, 67 insertions(+), 1 deletion(-)
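
As a reading aid, the two C hunks in this patch can be condensed into a small standalone sketch: -vvv switches the new boolean on, and the boolean is later translated into an output flag. Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in, and the bit values are invented for illustration. The real code uses struct cxl_filter_params, cxl_filter_to_flags() and UTIL_JSON_MEDIA_ERRORS, and handles more options than shown here.

```c
#include <stdbool.h>

/* Illustrative bit values only; ndctl defines the real UTIL_JSON_* flags. */
#define SKETCH_JSON_HEALTH       (1UL << 0)
#define SKETCH_JSON_PARTITION    (1UL << 1)
#define SKETCH_JSON_MEDIA_ERRORS (1UL << 2)

/* Hypothetical, trimmed-down stand-in for struct cxl_filter_params. */
struct sketch_params {
	bool health;
	bool partition;
	bool media_errors;
	bool idle;
	int verbose;
};

/* Each verbosity level enables its own options, then falls through
 * to the next lower level. */
void sketch_apply_verbosity(struct sketch_params *p)
{
	switch (p->verbose) {
	case 3:
		p->health = true;
		p->partition = true;
		p->media_errors = true;
		/* fallthrough */
	case 2:
		p->idle = true;
		break;
	default:
		break;
	}
}

/* Translate the boolean options into a bitmask for the JSON emitters. */
unsigned long sketch_to_flags(const struct sketch_params *p)
{
	unsigned long flags = 0;

	if (p->health)
		flags |= SKETCH_JSON_HEALTH;
	if (p->partition)
		flags |= SKETCH_JSON_PARTITION;
	if (p->media_errors)
		flags |= SKETCH_JSON_MEDIA_ERRORS;
	return flags;
}
```

In this sketch, a caller that sets verbose to 3 ends up with the media-errors bit set without ever passing -L, while a caller that sets only media_errors gets that single bit and nothing else.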

Comments

Wonjae Lee March 15, 2024, 1:09 a.m. UTC | #1
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> The --media-errors option to 'cxl list' retrieves poison lists from
> memory devices supporting the capability and displays the returned
> media_error records in the cxl list json. This option can apply to
> memdevs or regions.
>
> Include media-errors in the -vvv verbose option.
>
> Example usage in the Documentation/cxl/cxl-list.txt update.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> cxl/filter.h                    3 ++
> cxl/list.c                      3 ++
> 3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 838de4086678..6d3ef92c29e8 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt

[snip]

+----
+In the above example, region mappings can be found using:
+"cxl list -p mem9 --decoders"
+----

Hi, isn't it '-m mem9' instead of -p? FYI, it's in the patch's
cover letter, too.

Thanks,
Wonjae
Alison Schofield March 15, 2024, 2:36 a.m. UTC | #2
On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > The --media-errors option to 'cxl list' retrieves poison lists from
> > memory devices supporting the capability and displays the returned
> > media_error records in the cxl list json. This option can apply to
> > memdevs or regions.
> >
> > Include media-errors in the -vvv verbose option.
> >
> > Example usage in the Documentation/cxl/cxl-list.txt update.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > cxl/filter.h                    3 ++
> > cxl/list.c                      3 ++
> > 3 files changed, 67 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > index 838de4086678..6d3ef92c29e8 100644
> > --- a/Documentation/cxl/cxl-list.txt
> > +++ b/Documentation/cxl/cxl-list.txt
> 
> [snip]
> 
> +----
> +In the above example, region mappings can be found using:
> +"cxl list -p mem9 --decoders"
> +----
> 
> Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> cover letter, too.

Thanks for the review! I went with -p because it gives only
the endpoint decoder while -m gives all the decoders up to
the root - more than needed to discover the region.

Here are the 2 outputs - what do you think?

# cxl list -p mem9 --decoders -u
{
  "decoder":"decoder20.0",
  "resource":"0xf110000000",
  "size":"2.00 GiB (2.15 GB)",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "region":"region5",
  "dpa_resource":"0x40000000",
  "dpa_size":"1024.00 MiB (1073.74 MB)",
  "mode":"pmem"
}

# cxl list -m mem9 --decoders -u
[
  {
    "root decoders":[
      {
        "decoder":"decoder7.1",
        "resource":"0xf050000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "max_available_extent":"2.00 GiB (2.15 GB)",
        "volatile_capable":true,
        "qos_class":42,
        "nr_targets":2
      },
      {
        "decoder":"decoder7.3",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "max_available_extent":0,
        "pmem_capable":true,
        "qos_class":42,
        "nr_targets":2
      }
    ]
  },
  {
    "port decoders":[
      {
        "decoder":"decoder9.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":1,
        "region":"region5",
        "nr_targets":1
      },
      {
        "decoder":"decoder13.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":1,
        "region":"region5",
        "nr_targets":1
      }
    ]
  },
  {
    "endpoint decoders":[
      {
        "decoder":"decoder20.0",
        "resource":"0xf110000000",
        "size":"2.00 GiB (2.15 GB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "region":"region5",
        "dpa_resource":"0x40000000",
        "dpa_size":"1024.00 MiB (1073.74 MB)",
        "mode":"pmem"
      }
    ]
  }
]
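
To make the address fields in these listings concrete, here is a hedged worked example in C. It assumes only what the patch's documentation states: a media error "offset" is relative to the region resource when listing by region, and is an absolute device DPA when listing by memdev. The function names are hypothetical and are not part of ndctl; the sketch does no interleave math and does not translate a DPA into a host address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side address of a region-relative media error offset,
 * assuming the offset is counted from the region's "resource" base. */
uint64_t sketch_region_offset_to_hpa(uint64_t region_resource, uint64_t offset)
{
	return region_resource + offset;
}

/* True if a device physical address lies inside an endpoint decoder's
 * DPA window [dpa_resource, dpa_resource + dpa_size). */
bool sketch_dpa_in_decoder(uint64_t dpa, uint64_t dpa_resource,
			   uint64_t dpa_size)
{
	return dpa >= dpa_resource && dpa - dpa_resource < dpa_size;
}
```

With the values shown in this thread, the region5 error at offset 0x1000 lands at 0xf110000000 + 0x1000 = 0xf110001000, and the mem9 error at DPA 0x40000000 falls inside decoder20.0's window, which starts at dpa_resource 0x40000000 and spans 1024 MiB.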

> 
> Thanks,
> Wonjae
Dan Williams March 15, 2024, 3:35 a.m. UTC | #3
Alison Schofield wrote:
> On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > memory devices supporting the capability and displays the returned
> > > media_error records in the cxl list json. This option can apply to
> > > memdevs or regions.
> > >
> > > Include media-errors in the -vvv verbose option.
> > >
> > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > cxl/filter.h                    3 ++
> > > cxl/list.c                      3 ++
> > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > index 838de4086678..6d3ef92c29e8 100644
> > > --- a/Documentation/cxl/cxl-list.txt
> > > +++ b/Documentation/cxl/cxl-list.txt
> > 
> > [snip]
> > 
> > +----
> > +In the above example, region mappings can be found using:
> > +"cxl list -p mem9 --decoders"
> > +----
> > 
> > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > cover letter, too.
> 
> Thanks for the review! I went with -p because it gives only
> the endpoint decoder while -m gives all the decoders up to
> the root - more than needed to discover the region.

The first thing that comes to mind to list memory devices with their
decoders is:

    cxl list -MD -d endpoint

...however the problem is that endpoint ports connect memdevs to their
parent port, so the above results in:

  Warning: no matching devices found

I think I want to special case "-d endpoint" when both -M and -D are
specified to also imply -E, "endpoint ports". However that also seems to
have a bug at present:

# cxl list -EDM -d endpoint -iu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2
}

That needs to be fixed up to merge:

# cxl list -ED -d endpoint -iu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2,
  "decoders:endpoint2":[
    {
      "decoder":"decoder2.0",
      "interleave_ways":1,
      "state":"disabled"
    }
  ]
}

...and:

# cxl list -EMu
{
  "endpoint":"endpoint2",
  "host":"mem0",
  "parent_dport":"0000:34:00.0",
  "depth":2,
  "memdev":{
    "memdev":"mem0",
    "pmem_size":"512.00 MiB (536.87 MB)",
    "serial":"0",
    "host":"0000:35:00.0"
  }
}

...so that one can get a nice listing of just endpoint ports, their
decoders (with media errors) and their memdevs.

The reason that "cxl list -p mem9 -D" works is subtle: it filters
the endpoint decoders by an endpoint port filter. However, I think most
users would expect not to need to enable endpoint-port listings to see
their decoders; the natural key to filter endpoint decoders is by memdev.
Dave Jiang March 15, 2024, 4:41 p.m. UTC | #4
On 3/13/24 9:05 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The --media-errors option to 'cxl list' retrieves poison lists from
> memory devices supporting the capability and displays the returned
> media_error records in the cxl list json. This option can apply to
> memdevs or regions.
> 
> Include media-errors in the -vvv verbose option.
> 
> Example usage in the Documentation/cxl/cxl-list.txt update.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
>  cxl/filter.h                   |  3 ++
>  cxl/list.c                     |  3 ++
>  3 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 838de4086678..6d3ef92c29e8 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt
> @@ -415,6 +415,66 @@ OPTIONS
>  --region::
>  	Specify CXL region device name(s), or device id(s), to filter the listing.
>  
> +-L::
> +--media-errors::
> +	Include media-error information. The poison list is retrieved from the
> +	device(s) and media_error records are added to the listing. Apply this
> +	option to memdevs and regions where devices support the poison list
> +	capability. "offset:" is relative to the region resource when listing
> +	by region and is the absolute device DPA when listing by memdev.
> +	"source:" is one of: External, Internal, Injected, Vendor Specific,
> +	or Unknown, as defined in CXL Specification v3.1 Table 8-140.
> +
> +----
> +# cxl list -m mem9 --media-errors -u
> +{
> +  "memdev":"mem9",
> +  "pmem_size":"1024.00 MiB (1073.74 MB)",
> +  "pmem_qos_class":42,
> +  "ram_size":"1024.00 MiB (1073.74 MB)",
> +  "ram_qos_class":42,
> +  "serial":"0x5",
> +  "numa_node":1,
> +  "host":"cxl_mem.5",
> +  "media_errors":[
> +    {
> +      "offset":"0x40000000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +----
> +In the above example, region mappings can be found using:
> +"cxl list -p mem9 --decoders"
> +----
> +# cxl list -r region5 --media-errors -u
> +{
> +  "region":"region5",
> +  "resource":"0xf110000000",
> +  "size":"2.00 GiB (2.15 GB)",
> +  "type":"pmem",
> +  "interleave_ways":2,
> +  "interleave_granularity":4096,
> +  "decode_state":"commit",
> +  "media_errors":[
> +    {
> +      "offset":"0x1000",
> +      "length":64,
> +      "source":"Injected"
> +    },
> +    {
> +      "offset":"0x2000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +----
> +In the above example, memdev mappings can be found using:
> +"cxl list -r region5 --targets" and "cxl list -d <decoder_name>"
> +
> +
>  -v::
>  --verbose::
>  	Increase verbosity of the output. This can be specified
> @@ -431,7 +491,7 @@ OPTIONS
>  	  devices with --idle.
>  	- *-vvv*
>  	  Everything *-vv* provides, plus enable
> -	  --health and --partition.
> +	  --health, --partition, --media-errors.
>  
>  --debug::
>  	If the cxl tool was built with debug enabled, turn on debug
> diff --git a/cxl/filter.h b/cxl/filter.h
> index 3f65990f835a..956a46e0c7a9 100644
> --- a/cxl/filter.h
> +++ b/cxl/filter.h
> @@ -30,6 +30,7 @@ struct cxl_filter_params {
>  	bool fw;
>  	bool alert_config;
>  	bool dax;
> +	bool media_errors;
>  	int verbose;
>  	struct log_ctx ctx;
>  };
> @@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
>  		flags |= UTIL_JSON_ALERT_CONFIG;
>  	if (param->dax)
>  		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
> +	if (param->media_errors)
> +		flags |= UTIL_JSON_MEDIA_ERRORS;
>  	return flags;
>  }
>  
> diff --git a/cxl/list.c b/cxl/list.c
> index 93ba51ef895c..0b25d78248d5 100644
> --- a/cxl/list.c
> +++ b/cxl/list.c
> @@ -57,6 +57,8 @@ static const struct option options[] = {
>  		    "include memory device firmware information"),
>  	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
>  		    "include alert configuration information"),
> +	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
> +		    "include media-error information "),
>  	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
>  #ifdef ENABLE_DEBUG
>  	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
> @@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
>  		param.fw = true;
>  		param.alert_config = true;
>  		param.dax = true;
> +		param.media_errors = true;
>  		/* fallthrough */
>  	case 2:
>  		param.idle = true;
Alison Schofield March 20, 2024, 8:40 p.m. UTC | #5
On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > alison.schofield@intel.com wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > memory devices supporting the capability and displays the returned
> > > > media_error records in the cxl list json. This option can apply to
> > > > memdevs or regions.
> > > >
> > > > Include media-errors in the -vvv verbose option.
> > > >
> > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > cxl/filter.h                    3 ++
> > > > cxl/list.c                      3 ++
> > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > index 838de4086678..6d3ef92c29e8 100644
> > > > --- a/Documentation/cxl/cxl-list.txt
> > > > +++ b/Documentation/cxl/cxl-list.txt
> > > 
> > > [snip]
> > > 
> > > +----
> > > +In the above example, region mappings can be found using:
> > > +"cxl list -p mem9 --decoders"
> > > +----
> > > 
> > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > cover letter, too.
> > 
> > Thanks for the review! I went with -p because it gives only
> > the endpoint decoder while -m gives all the decoders up to
> > the root - more than needed to discover the region.
> 
> The first thing that comes to mind to list memory devices with their
> decoders is:
> 
>     cxl list -MD -d endpoint
> 
> ...however the problem is that endpoint ports connect memdevs to their
> parent port, so the above results in:
> 
>   Warning: no matching devices found
> 
> I think I want to special case "-d endpoint" when both -M and -D are
> specified to also imply -E, "endpoint ports". However that also seems to
> have a bug at present:
> 
> # cxl list -EDM -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2
> }
> 
> That needs to be fixed up to merge:
> 
> # cxl list -ED -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "decoders:endpoint2":[
>     {
>       "decoder":"decoder2.0",
>       "interleave_ways":1,
>       "state":"disabled"
>     }
>   ]
> }
> 
> ...and:
> 
> # cxl list -EMu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "memdev":{
>     "memdev":"mem0",
>     "pmem_size":"512.00 MiB (536.87 MB)",
>     "serial":"0",
>     "host":"0000:35:00.0"
>   }
> }
> 
> ...so that one can get a nice listing of just endpoint ports, their
> decoders (with media errors) and their memdevs.
> 
> The reason that "cxl list -p mem9 -D" works is subtle because it filters
> the endpoint decoders by an endpoint port filter, but I think most users
> would expect to not need to enable endpoint-port listings to see their
> decoders the natural key to filter endpoint decoders is by memdev.

Wonjae, Dan,

This feedback inspires me to seek more input from future users. This
tool should be adding a convenience and I don't want to proceed without
more user feedback confirming this implementation is more convenient
than the currently available method (trace & trigger). We also want to
avoid working with or around some awkward json output for eternity.

I'm following this response with a reply to the cover letter seeking
more input.

Thanks,
Alison
Alison Schofield March 27, 2024, 7:48 p.m. UTC | #6
On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > alison.schofield@intel.com wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > memory devices supporting the capability and displays the returned
> > > > media_error records in the cxl list json. This option can apply to
> > > > memdevs or regions.
> > > >
> > > > Include media-errors in the -vvv verbose option.
> > > >
> > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > cxl/filter.h                    3 ++
> > > > cxl/list.c                      3 ++
> > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > index 838de4086678..6d3ef92c29e8 100644
> > > > --- a/Documentation/cxl/cxl-list.txt
> > > > +++ b/Documentation/cxl/cxl-list.txt
> > > 
> > > [snip]
> > > 
> > > +----
> > > +In the above example, region mappings can be found using:
> > > +"cxl list -p mem9 --decoders"
> > > +----
> > > 
> > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > cover letter, too.
> > 
> > Thanks for the review! I went with -p because it gives only
> > the endpoint decoder while -m gives all the decoders up to
> > the root - more than needed to discover the region.
> 
> The first thing that comes to mind to list memory devices with their
> decoders is:
> 
>     cxl list -MD -d endpoint
> 
> ...however the problem is that endpoint ports connect memdevs to their
> parent port, so the above results in:
> 
>   Warning: no matching devices found
> 
> I think I want to special case "-d endpoint" when both -M and -D are
> specified to also imply -E, "endpoint ports". However that also seems to
> have a bug at present:
> 
> # cxl list -EDM -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2
> }
> 
> That needs to be fixed up to merge:

What's to fix up? Doesn't filtering by '-d endpoint' exclude the
objects you specified in -EDM? It becomes the equivalent of
'cxl list -E'.

> 
> # cxl list -ED -d endpoint -iu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "decoders:endpoint2":[
>     {
>       "decoder":"decoder2.0",
>       "interleave_ways":1,
>       "state":"disabled"
>     }
>   ]
> }
> 
> ...and:
> 
> # cxl list -EMu
> {
>   "endpoint":"endpoint2",
>   "host":"mem0",
>   "parent_dport":"0000:34:00.0",
>   "depth":2,
>   "memdev":{
>     "memdev":"mem0",
>     "pmem_size":"512.00 MiB (536.87 MB)",
>     "serial":"0",
>     "host":"0000:35:00.0"
>   }
> }
>

Some of the examples above use "-d endpoint", which filters on endpoint
decoders, and so, by design, exclude memdev info. Filtering on
endpoint ports, i.e. -p endpoint, supports a listing of the endpoint
memdevs and decoders.

> ...so that one can get a nice listing of just endpoint ports, their
> decoders (with media errors) and their memdevs.
> 

Dissecting the above sentence:
"of just endpoint ports"  --> -p endpoint
"their decoders"          --> -DE
"their memdevs"           --> -M
"(with media errors)"     --> --media-errors

Yields this query:
cxl list -p endpoint -DEM --media-errors

You wrote (with media errors) after 'decoders' and that is of concern,
but maybe just a typo?  ATM --media-errors applies to memdev or region
objects, not to decoder objects.

> The reason that "cxl list -p mem9 -D" works is subtle because it filters
> the endpoint decoders by an endpoint port filter, but I think most users
> would expect to not need to enable endpoint-port listings to see their
> decoders the natural key to filter endpoint decoders is by memdev.

I'm not following this subtle comment. I find it to be an exacting filter
that targets exactly the memdev of interest and supplies
the decoder and region mappings. It would be best suggested in one
step, and that is an update in the v12 man page:
cxl list -p mem9 -DEM --media-errors

I don't understand the desire to use endpoint decoders as a filter when
using endpoint ports, which have memdevs and endpoint decoders as
children, works and flows with the whole top-down cxl list filtering
design. I also don't see a need to special-case, and 'imply', endpoint
ports when users can explicitly add -p endpoint to their query.
(The special case seems like it would add confusion to the cxl list
usage.)

I'm following this with a v12 that does update the man page suggestions.
Let's continue this conversation there.

Thanks,
Alison
Alison Schofield April 18, 2024, 8:12 p.m. UTC | #7
Hi Dan,

Here's where I believe we last left off.

I thought we had closure on the json format of the media error records,
and on the fact that those objects are appended to memdev or region
objects.

The open question is how to use 'cxl list' to view the poison records.

Can we pick up that discussion below in this v11 thread?

The v12 that I refer to below is here:
https://lore.kernel.org/cover.1711519822.git.alison.schofield@intel.com/

-- Alison


On Wed, Mar 27, 2024 at 12:48:12PM -0700, Alison Schofield wrote:
> On Thu, Mar 14, 2024 at 08:35:01PM -0700, Dan Williams wrote:
> > Alison Schofield wrote:
> > > On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > > > alison.schofield@intel.com wrote:
> > > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > >
> > > > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > > > memory devices supporting the capability and displays the returned
> > > > > media_error records in the cxl list json. This option can apply to
> > > > > memdevs or regions.
> > > > >
> > > > > Include media-errors in the -vvv verbose option.
> > > > >
> > > > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > > > >
> > > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > > ---
> > > > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++-
> > > > > cxl/filter.h                    3 ++
> > > > > cxl/list.c                      3 ++
> > > > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > > > index 838de4086678..6d3ef92c29e8 100644
> > > > > --- a/Documentation/cxl/cxl-list.txt
> > > > > +++ b/Documentation/cxl/cxl-list.txt
> > > > 
> > > > [snip]
> > > > 
> > > > +----
> > > > +In the above example, region mappings can be found using:
> > > > +"cxl list -p mem9 --decoders"
> > > > +----
> > > > 
> > > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > > > cover letter, too.
> > > 
> > > Thanks for the review! I went with -p because it gives only
> > > the endpoint decoder while -m gives all the decoders up to
> > > the root - more than needed to discover the region.
> > 
> > The first thing that comes to mind to list memory devices with their
> > decoders is:
> > 
> >     cxl list -MD -d endpoint
> > 
> > ...however the problem is that endpoint ports connect memdevs to their
> > parent port, so the above results in:
> > 
> >   Warning: no matching devices found
> > 
> > I think I want to special case "-d endpoint" when both -M and -D are
> > specified to also imply -E, "endpoint ports". However that also seems to
> > have a bug at present:
> > 
> > # cxl list -EDM -d endpoint -iu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2
> > }
> > 
> > That needs to be fixed up to merge:
> 
> What's to fix up? Doesn't filtering by '-d endpoint' exclude the
> objects you specified in -EDM.  It becomes the equivalent of
> of 'cxl list -E'
> 
> > 
> > # cxl list -ED -d endpoint -iu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2,
> >   "decoders:endpoint2":[
> >     {
> >       "decoder":"decoder2.0",
> >       "interleave_ways":1,
> >       "state":"disabled"
> >     }
> >   ]
> > }
> > 
> > ...and:
> > 
> > # cxl list -EMu
> > {
> >   "endpoint":"endpoint2",
> >   "host":"mem0",
> >   "parent_dport":"0000:34:00.0",
> >   "depth":2,
> >   "memdev":{
> >     "memdev":"mem0",
> >     "pmem_size":"512.00 MiB (536.87 MB)",
> >     "serial":"0",
> >     "host":"0000:35:00.0"
> >   }
> > }
> >
> 
> Some of the examples above that use "-d endpoint", filtering on endpoint
> decoders, and so are, by design, excluding memdev info.  Filtering on
> endpoint ports, ie -p endpoint, supports a listing of the endpoint
> memdevs and decoders. 
> 
> > ...so that one can get a nice listing of just endpoint ports, their
> > decoders (with media errors) and their memdevs.
> > 
> 
> Dissecting the above sentence:
> "of just endpoint ports"  --> -p endpoint
> "their decoders" --> -DE
> "their memdevs"  --> -M
> "(with media errors)" --media-errors
> 
> Yields this query:
> cxl list -p endpoint -DEM --media-errors
> 
> You wrote (with media errors) after 'decoders' and that is of concern,
> but maybe just a typo?  ATM --media-errors applies to memdev or region
> objects, not to decoder objects.
> 
> > The reason that "cxl list -p mem9 -D" works is subtle because it filters
> > the endpoint decoders by an endpoint port filter, but I think most users
> > would expect to not need to enable endpoint-port listings to see their
> > decoders the natural key to filter endpoint decoders is by memdev.
> 
> Not following this subtle comment. I find it to be an exacting filter
> targeting exactly a memdev that may be of interest and supplying
> the decoder and region mappings. It would be best suggested in one
> step, and that's is an update in the v12 man page:
> cxl list -p mem9 -DEM --media-errors
> 
> I don't understand the desire to use endpoint decoders as a filter when
> using endpoint ports which have memdevs and endpoint decoders as
> children works, and flows with the whole top down cxl list filtering 
> design. I also don't see a need to special case, and 'imply' endpoint
> ports, when use can explicitly add -p endpoint to their query.
> (the special case seems like it would add confusion to the cxl list
> usage)
> 
> I'm following this w a v12 that does update the man page suggestions.
> Let's continue this conversation there.
> 
> Thanks,
> Alison

Patch

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 838de4086678..6d3ef92c29e8 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -415,6 +415,66 @@  OPTIONS
 --region::
 	Specify CXL region device name(s), or device id(s), to filter the listing.
 
+-L::
+--media-errors::
+	Include media-error information. The poison list is retrieved from the
+	device(s) and media_error records are added to the listing. Apply this
+	option to memdevs and regions where devices support the poison list
+	capability. "offset:" is relative to the region resource when listing
+	by region and is the absolute device DPA when listing by memdev.
+	"source:" is one of: External, Internal, Injected, Vendor Specific,
+	or Unknown, as defined in CXL Specification v3.1 Table 8-140.
+
+----
+# cxl list -m mem9 --media-errors -u
+{
+  "memdev":"mem9",
+  "pmem_size":"1024.00 MiB (1073.74 MB)",
+  "pmem_qos_class":42,
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x5",
+  "numa_node":1,
+  "host":"cxl_mem.5",
+  "media_errors":[
+    {
+      "offset":"0x40000000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+----
+In the above example, region mappings can be found using:
+"cxl list -p mem9 --decoders"
+----
+# cxl list -r region5 --media-errors -u
+{
+  "region":"region5",
+  "resource":"0xf110000000",
+  "size":"2.00 GiB (2.15 GB)",
+  "type":"pmem",
+  "interleave_ways":2,
+  "interleave_granularity":4096,
+  "decode_state":"commit",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    },
+    {
+      "offset":"0x2000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+----
+In the above example, memdev mappings can be found using:
+"cxl list -r region5 --targets" and "cxl list -d <decoder_name>"
+
+
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
@@ -431,7 +491,7 @@  OPTIONS
 	  devices with --idle.
 	- *-vvv*
 	  Everything *-vv* provides, plus enable
-	  --health and --partition.
+	  --health, --partition, --media-errors.
 
 --debug::
 	If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 3f65990f835a..956a46e0c7a9 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -30,6 +30,7 @@  struct cxl_filter_params {
 	bool fw;
 	bool alert_config;
 	bool dax;
+	bool media_errors;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -88,6 +89,8 @@  static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_ALERT_CONFIG;
 	if (param->dax)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
+	if (param->media_errors)
+		flags |= UTIL_JSON_MEDIA_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 93ba51ef895c..0b25d78248d5 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -57,6 +57,8 @@  static const struct option options[] = {
 		    "include memory device firmware information"),
 	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
 		    "include alert configuration information"),
+	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
+		    "include media-error information "),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
@@ -121,6 +123,7 @@  int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
 		param.fw = true;
 		param.alert_config = true;
 		param.dax = true;
+		param.media_errors = true;
 		/* fallthrough */
 	case 2:
 		param.idle = true;
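
The documentation hunk above lists five possible "source:" strings. As a hedged illustration of how a numeric source field might be turned into those strings, here is a small C sketch. The numeric encodings are an assumption made for this example and should be checked against the specification table the patch cites; the enum and function names are hypothetical and are not taken from ndctl.

```c
/* Assumed encodings of the media error source field; verify against
 * the CXL specification table referenced in the documentation. */
enum sketch_poison_source {
	SKETCH_SOURCE_UNKNOWN  = 0,
	SKETCH_SOURCE_EXTERNAL = 1,
	SKETCH_SOURCE_INTERNAL = 2,
	SKETCH_SOURCE_INJECTED = 3,
	SKETCH_SOURCE_VENDOR   = 7,
};

/* Map a source value to the string shown in the "source:" field.
 * Any value not listed above is reported as "Unknown" in this sketch. */
const char *sketch_source_name(unsigned int source)
{
	switch (source) {
	case SKETCH_SOURCE_EXTERNAL:
		return "External";
	case SKETCH_SOURCE_INTERNAL:
		return "Internal";
	case SKETCH_SOURCE_INJECTED:
		return "Injected";
	case SKETCH_SOURCE_VENDOR:
		return "Vendor Specific";
	default:
		return "Unknown";
	}
}
```

Folding unlisted values into "Unknown" is a design choice of this sketch, made so that the output stays within the five documented strings.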