mbox series

[ndctl,v6,0/7] Support poison list retrieval

Message ID cover.1705534719.git.alison.schofield@intel.com
Headers show
Series Support poison list retrieval | expand

Message

Alison Schofield Jan. 18, 2024, 12:27 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Changes since v5:
- Use a private parser for cxl_poison events. (Dan)
  Previously used the default parser and re-parsed per the cxl-list
  needs. Replace that with a private parsing method for cxl_poison.
- Add a private context to support private parsers. 
- Add helpers to use with the cxl_poison parser.
- cxl list json: drop nr_records field (Dan)
- cxl list option: replace "poison" w "media-errors" (Dan)
- cxl list json: replace "poison" w "media_errors" (Dan)
- Link to v5: https://lore.kernel.org/linux-cxl/cover.1700615159.git.alison.schofield@intel.com/


Begin cover letter:

Add the option to add a memory devices poison list to the cxl-list
json output. Offer the option by memdev and by region. Sample usage:

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "dpa_length":64,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "region":"region5",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "memdev":"mem1",
        "dpa":1073741824,
        "dpa_length":64,
        "hpa":1035355557888,
        "source":"Injected"
      },
      {
        "memdev":"mem1",
        "dpa":1073745920,
        "dpa_length":64,
        "hpa":1035355566080,
        "source":"Injected"
      }
    ]
  }
]

Alison Schofield (7):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl: add an optional pid check to event parsing
  cxl/event_trace: add a private context for private parsers
  cxl/event_trace: add helpers get_field_[string|data]()
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  71 +++++++++++
 cxl/event_trace.c              |  53 +++++++-
 cxl/event_trace.h              |   9 +-
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 218 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  47 +++++++
 cxl/lib/libcxl.sym             |   6 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   2 +
 test/cxl-poison.sh             | 133 ++++++++++++++++++++
 test/meson.build               |   2 +
 11 files changed, 543 insertions(+), 3 deletions(-)
 create mode 100644 test/cxl-poison.sh


base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c

Comments

Dan Williams Jan. 18, 2024, 9:56 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Changes since v5:
> - Use a private parser for cxl_poison events. (Dan)
>   Previously used the default parser and re-parsed per the cxl-list
>   needs. Replace that with a private parsing method for cxl_poison.
> - Add a private context to support private parsers. 
> - Add helpers to use with the cxl_poison parser.
> - cxl list json: drop nr_records field (Dan)
> - cxl list option: replace "poison" w "media-errors" (Dan)
> - cxl list json: replace "poison" w "media_errors" (Dan)
> - Link to v5: https://lore.kernel.org/linux-cxl/cover.1700615159.git.alison.schofield@intel.com/
> 
> 
> Begin cover letter:
> 
> Add the option to add a memory devices poison list to the cxl-list
> json output. Offer the option by memdev and by region. Sample usage:
> 
> # cxl list -m mem1 --media-errors
> [
>   {
>     "memdev":"mem1",
>     "pmem_size":1073741824,
>     "ram_size":1073741824,
>     "serial":1,
>     "numa_node":1,
>     "host":"cxl_mem.1",
>     "media_errors":[
>       {
>         "dpa":0,
>         "dpa_length":64,
>         "source":"Injected"
>       },
>       {
>         "region":"region5",

It feels odd to list the region here. I feel like what really matters is
to list the endpoint decoder and if someone wants to associate endpoint
decoder to region, or endpoint decoder to memdev there are other queries
for that.

Then this format does not change between the "region" listing and
"memdev" listing, they both just output the endpoint decoder and leave
the rest to follow-on queries.

For example I expect operations software has already recorded the
endpoint decoder to region mapping, so when this data comes in the
endpoint decoder is a key to make that association. Otherwise:

    cxl list -RT -e $endpoint_decoder

...can answer follow up questions about what is impacted by a given
media error record.

>         "dpa":1073741824,
>         "dpa_length":64,

The dpa_length is also the hpa_length, right? So maybe just call the
field "length".

>         "hpa":1035355557888,
>         "source":"Injected"
>       },
>       {
>         "region":"region5",
>         "dpa":1073745920,
>         "dpa_length":64,
>         "hpa":1035355566080,
>         "source":"Injected"

This "source" field feels like debug data. In production nobody is going
to be doing poison injection, and if the administrator injected it then
its implied they know that status. Otherwise a media-error is a
media-error regardless of the source.

>       }
>     ]
>   }
> ]
> 
> # cxl list -r region5 --media-errors
> [
>   {
>     "region":"region5",
>     "resource":1035355553792,
>     "size":2147483648,
>     "type":"pmem",
>     "interleave_ways":2,
>     "interleave_granularity":4096,
>     "decode_state":"commit",
>     "media_errors":[
>       {
>         "memdev":"mem1",
>         "dpa":1073741824,
>         "dpa_length":64,
>         "hpa":1035355557888,
>         "source":"Injected"
>       },
>       {
>         "memdev":"mem1",
>         "dpa":1073745920,
>         "dpa_length":64,
>         "hpa":1035355566080,
>         "source":"Injected"
>       }
>     ]
>   }
> ]
> 
> Alison Schofield (7):
>   libcxl: add interfaces for GET_POISON_LIST mailbox commands
>   cxl: add an optional pid check to event parsing
>   cxl/event_trace: add a private context for private parsers
>   cxl/event_trace: add helpers get_field_[string|data]()
>   cxl/list: collect and parse media_error records
>   cxl/list: add --media-errors option to cxl list
>   cxl/test: add cxl-poison.sh unit test
> 
>  Documentation/cxl/cxl-list.txt |  71 +++++++++++
>  cxl/event_trace.c              |  53 +++++++-
>  cxl/event_trace.h              |   9 +-
>  cxl/filter.h                   |   3 +
>  cxl/json.c                     | 218 +++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.c               |  47 +++++++
>  cxl/lib/libcxl.sym             |   6 +
>  cxl/libcxl.h                   |   2 +
>  cxl/list.c                     |   2 +
>  test/cxl-poison.sh             | 133 ++++++++++++++++++++
>  test/meson.build               |   2 +
>  11 files changed, 543 insertions(+), 3 deletions(-)
>  create mode 100644 test/cxl-poison.sh
> 
> 
> base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c
> -- 
> 2.37.3
> 
>
Alison Schofield Jan. 18, 2024, 11:34 p.m. UTC | #2
On Thu, Jan 18, 2024 at 01:56:51PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Changes since v5:
> > - Use a private parser for cxl_poison events. (Dan)
> >   Previously used the default parser and re-parsed per the cxl-list
> >   needs. Replace that with a private parsing method for cxl_poison.
> > - Add a private context to support private parsers. 
> > - Add helpers to use with the cxl_poison parser.
> > - cxl list json: drop nr_records field (Dan)
> > - cxl list option: replace "poison" w "media-errors" (Dan)
> > - cxl list json: replace "poison" w "media_errors" (Dan)
> > - Link to v5: https://lore.kernel.org/linux-cxl/cover.1700615159.git.alison.schofield@intel.com/
> > 
> > 
> > Begin cover letter:
> > 
> > Add the option to add a memory devices poison list to the cxl-list
> > json output. Offer the option by memdev and by region. Sample usage:
> > 
> > # cxl list -m mem1 --media-errors
> > [
> >   {
> >     "memdev":"mem1",
> >     "pmem_size":1073741824,
> >     "ram_size":1073741824,
> >     "serial":1,
> >     "numa_node":1,
> >     "host":"cxl_mem.1",
> >     "media_errors":[
> >       {
> >         "dpa":0,
> >         "dpa_length":64,
> >         "source":"Injected"
> >       },
> >       {
> >         "region":"region5",
> 
> It feels odd to list the region here. I feel like what really matters is
> to list the endpoint decoder and if someone wants to associate endpoint
> decoder to region, or endpoint decoder to memdev there are other queries
> for that.
> 
> Then this format does not change between the "region" listing and
> "memdev" listing, they both just output the endpoint decoder and leave
> the rest to follow-on queries.
> 
> For example I expect operations software has already recorded the
> endpoint decoder to region mapping, so when this data comes in the
> endpoint decoder is a key to make that association. Otherwise:
> 
>     cxl list -RT -e $endpoint_decoder
> 
> ...can answer follow up questions about what is impacted by a given
> media error record.

I see it as a convenience offering, but I'm starting to see that your
stance is probably that a cxl-list option should only list additional
info provided by the option, and not include info that can be retrieved
elsewhere w cxl-list.

I plan to make this change to endpoint as you suggest.

> 
> >         "dpa":1073741824,
> >         "dpa_length":64,
> 
> The dpa_length is also the hpa_length, right? So maybe just call the
> field "length".
> 

No, the length only refers to the device address space. I don't think
the hpa is guaranteed to be contiguous, so only the starting hpa addr
is offered.

hmm..should we call it 'size' because that seems to imply less
contiguous-ness than length?

Which should it be 'dpa_length' or 'size' (or 'length')

> >         "hpa":1035355557888,
> >         "source":"Injected"
> >       },
> >       {
> >         "region":"region5",
> >         "dpa":1073745920,
> >         "dpa_length":64,
> >         "hpa":1035355566080,
> >         "source":"Injected"
> 
> This "source" field feels like debug data. In production nobody is going
> to be doing poison injection, and if the administrator injected it then
> its implied they know that status. Otherwise a media-error is a
> media-error regardless of the source.

From CXL Spec Tabel 8-140 Sources can be: 

Unknown.
External. Poison received from a source external to the device.
Internal. The device generated poison from an internal source.
Injected. The error was injected into the device for testing purposes.
Vendor Specific.

On the v5 review, Erwin commented:
>> This is how I would use source.
>> "external" = don't expect to see a cxl media error, look elsewhere like a UCNA or a mem_data error in the RP's CXL.CM RAS.
>> "internal" = expect to see a media error for more information.
>> "injected" = somebody injected the error, no service action needed except to maybe tighten up your security.
>> "vendor" = see vendor

If it's not presented here, user can look it up in the cxl_poison trace
event directly.

I think we should keep this as is.

> 
> >       }
> >     ]
> >   }
> > ]
> > 
> > # cxl list -r region5 --media-errors
> > [
> >   {
> >     "region":"region5",
> >     "resource":1035355553792,
> >     "size":2147483648,
> >     "type":"pmem",
> >     "interleave_ways":2,
> >     "interleave_granularity":4096,
> >     "decode_state":"commit",
> >     "media_errors":[
> >       {
> >         "memdev":"mem1",
> >         "dpa":1073741824,
> >         "dpa_length":64,
> >         "hpa":1035355557888,
> >         "source":"Injected"
> >       },
> >       {
> >         "memdev":"mem1",
> >         "dpa":1073745920,
> >         "dpa_length":64,
> >         "hpa":1035355566080,
> >         "source":"Injected"
> >       }
> >     ]
> >   }
> > ]
> > 
> > Alison Schofield (7):
> >   libcxl: add interfaces for GET_POISON_LIST mailbox commands
> >   cxl: add an optional pid check to event parsing
> >   cxl/event_trace: add a private context for private parsers
> >   cxl/event_trace: add helpers get_field_[string|data]()
> >   cxl/list: collect and parse media_error records
> >   cxl/list: add --media-errors option to cxl list
> >   cxl/test: add cxl-poison.sh unit test
> > 
> >  Documentation/cxl/cxl-list.txt |  71 +++++++++++
> >  cxl/event_trace.c              |  53 +++++++-
> >  cxl/event_trace.h              |   9 +-
> >  cxl/filter.h                   |   3 +
> >  cxl/json.c                     | 218 +++++++++++++++++++++++++++++++++
> >  cxl/lib/libcxl.c               |  47 +++++++
> >  cxl/lib/libcxl.sym             |   6 +
> >  cxl/libcxl.h                   |   2 +
> >  cxl/list.c                     |   2 +
> >  test/cxl-poison.sh             | 133 ++++++++++++++++++++
> >  test/meson.build               |   2 +
> >  11 files changed, 543 insertions(+), 3 deletions(-)
> >  create mode 100644 test/cxl-poison.sh
> > 
> > 
> > base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c
> > -- 
> > 2.37.3
> > 
> > 
> 
>
Dan Williams Jan. 18, 2024, 11:55 p.m. UTC | #3
Alison Schofield wrote:
[..]
> > >         "dpa":1073741824,
> > >         "dpa_length":64,
> > 
> > The dpa_length is also the hpa_length, right? So maybe just call the
> > field "length".
> > 
> 
> No, the length only refers to the device address space. I don't think
> the hpa is guaranteed to be contiguous, so only the starting hpa addr
> is offered.
> 
> hmm..should we call it 'size' because that seems to imply less
> contiguous-ness than length?

The only way the length could be discontiguous in HPA space is if the
error length is greater than the interleave granularity. Given poison is
tracked in cachelines and the smallest granularity is 4 cachelines it is
unlikely to hit the mutiple HPA case.

However, I think the kernel side should aim to preclude that from
happening. Given that this is relying on the kernel's translation I
would make it so that the kernel never leaves the impacted HPAs as
ambiguous. For example, if the interleave_granularity of the region is
256 and the DPA length is 512, it would be helpful if the *kernel* split
that into multiple trace events to communicate the multiple impacted
HPAs rather than leave it as an exercise to userspace.

> Which should it be 'dpa_length' or 'size' (or 'length')

I recall we used "length" for the number of badblocks in "ndctl list
--media-errors", might as well keep in consistent.

> > >         "hpa":1035355557888,
> > >         "source":"Injected"
> > >       },
> > >       {
> > >         "region":"region5",
> > >         "dpa":1073745920,
> > >         "dpa_length":64,
> > >         "hpa":1035355566080,
> > >         "source":"Injected"
> > 
> > This "source" field feels like debug data. In production nobody is going
> > to be doing poison injection, and if the administrator injected it then
> > its implied they know that status. Otherwise a media-error is a
> > media-error regardless of the source.
> 
> From CXL Spec Tabel 8-140 Sources can be: 
> 
> Unknown.
> External. Poison received from a source external to the device.
> Internal. The device generated poison from an internal source.
> Injected. The error was injected into the device for testing purposes.
> Vendor Specific.
> 
> On the v5 review, Erwin commented:
> >> This is how I would use source.
> >> "external" = don't expect to see a cxl media error, look elsewhere like a UCNA or a mem_data error in the RP's CXL.CM RAS.
> >> "internal" = expect to see a media error for more information.
> >> "injected" = somebody injected the error, no service action needed except to maybe tighten up your security.
> >> "vendor" = see vendor
> 
> If it's not presented here, user can look it up in the cxl_poison trace
> event directly.
> 
> I think we should keep this as is.

Ah, I had forgotten Erwin's comment, yeah, showing "external" vs
"internal" looks useful, "injected" gets to come along for the ride, and
if any vendor actually ships that "vendor" status that's a good
indication to the end user to go shopping for a device that plays better
with open standards.

Might be useful to capture Erwin's analysis of how to use that field in
the man page, if it's not there already.
Alison Schofield Feb. 7, 2024, 10:54 p.m. UTC | #4
On Thu, Jan 18, 2024 at 03:55:00PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> [..]
> > > >         "dpa":1073741824,
> > > >         "dpa_length":64,
> > > 
> > > The dpa_length is also the hpa_length, right? So maybe just call the
> > > field "length".
> > > 
> > 
> > No, the length only refers to the device address space. I don't think
> > the hpa is guaranteed to be contiguous, so only the starting hpa addr
> > is offered.
> > 
> > hmm..should we call it 'size' because that seems to imply less
> > contiguous-ness than length?
> 
> The only way the length could be discontiguous in HPA space is if the
> error length is greater than the interleave granularity. Given poison is
> tracked in cachelines and the smallest granularity is 4 cachelines it is
> unlikely to hit the mutiple HPA case.

Hi Dan,

Circling back to this issue, as I'm posting an udpated rev.

I'm not getting how *only* an error length greater that IG can lead to
discontigous HPA. If the poison starts on the last 64 bytes of an IG and
has a length greater than 64 bytes, we go beyond the endpoints mapping,
even if that length is less than IG.

In the layout below, if the device underlying endpoint2 reports
^poison^ as shown, it is discontinguous in HPA space.

HPA 0..........................................................N
ep1 ..........          ..........          ..........    
ep2           ..........          ..........          ..........
bad                   ^poison^ 
good                  ^po          ison^

'bad' is what happens today if length is applied to HPA
'good' is what is right

Am I missing something wrt cachelines you mention?

> 
> However, I think the kernel side should aim to preclude that from
> happening. Given that this is relying on the kernel's translation I
> would make it so that the kernel never leaves the impacted HPAs as
> ambiguous. For example, if the interleave_granularity of the region is
> 256 and the DPA length is 512, it would be helpful if the *kernel* split
> that into multiple trace events to communicate the multiple impacted
> HPAs rather than leave it as an exercise to userspace.
> 

That's a familiar plan that we rejected in the driver implementation,
As defined, a cxl_poison event reports a starting dpa, a dpa_length,
and the starting hpa if the address is mapped. That left userspace to do
the HPA translation work.

We can move that work to the driver independent of this ndctl work.

> 
> Might be useful to capture Erwin's analysis of how to use that field in
> the man page, if it's not there already.

The man page now has the definitions of the source field and a spec
reference.  I don't see the cxl list man page as the place to offer
media-error trouble-shooting tips.