mbox series

[RFC,v2,0/9] cxl/pci: Add fundamental error handling

Message ID 166336972295.3803215.1047199449525031921.stgit@djiang5-desk3.ch.intel.com
Headers show
Series cxl/pci: Add fundamental error handling | expand

Message

Dave Jiang Sept. 16, 2022, 11:10 p.m. UTC
Series set to RFC since there's no means to test. Would like to get opinion
on whether going with using trace events as reporting mechanism is ok.

Jonathan,
We currently don't have any ways to test AER events. Do you have any plans
to support AER events via QEMU emulation?

v2:
- Convert error reporting via printk to trace events
- Drop ".rmap =" initialization (Jonathan)
- return PCI_ERS_RESULT_NEED_RESET for UE in pci_channel_io_normal (Shiju)

Add a 'struct pci_error_handlers' instance for the cxl_pci driver.
Section 8.2.4.16 "CXL RAS Capability Structure" of the CXL rev3.0
specification defines the error sources considered in this
implementation. The RAS Capability Structure defines protocol, link and
internal errors which are distinct from memory poison errors that are
conveyed via direct consumption and/or media scanning.

The errors reported by the RAS registers are categorized into
correctable and uncorrectable errors, where the uncorrectable errors are
optionally steered to either fatal or non-fatal AER events. Table 12-2 
"Device Specific Error Reporting and Nomenclature Guidelines" in the CXL
rev3.0 specification outlines that the remediation for uncorrectable errors
is a reset to recover. This matches how the Linux PCIe AER core treats
uncorrectable errors as occasions to reset the device to recover
operation.

While the specification notes "CXL Reset" or "Secondary Bus Reset" as
theoretical recovery options, they are not feasible in practice since
in-flight CXL.mem operations may not terminate and cause knock-on system
fatal events. Reset is only reliable for recovering CXL.io, it is not
reliable for recovering CXL.mem. Assuming the system survives, a reset
causes CXL.mem operation to restart from scratch.

The "ECN: Error Isolation on CXL.mem and CXL.cache" [1] document
recognizes the CXL Reset vs CXL.mem operational conflict and helps to at
least provide a mechanism for the Root Port to terminate in flight
CXL.mem operations with completions. That still poses problems in
practice if the kernel is running out of "System RAM" backed by the CXL
device and poison is used to convey the data lost to the protocol error.

Regardless of whether the reset and restart of CXL.mem operations is
feasible / successful, the logging is still useful. So, the
implementation reads, reports, and clears the status in the RAS
Capability Structure registers, and it notifies the 'struct cxl_memdev'
associated with the given PCIe endpoint to reattach to its driver over
the reset so that the HDM decoder configuration can be reconstructed.

The first half of the series reworks component register mapping so that
the cxl_pci driver can own the RAS Capability while the cxl_port driver
continues to own the HDM Decoder Capability. The last half implements
the RAS Capability Structure mapping and reporting via 'struct
pci_error_handlers'.

The reporting of error information is done through event tracing. A new
cxl_ras event is introduced to report the Uncorrectable and Correctable
errors raised by CXL. The expectation is a monitoring user daemon such as
"cxl monitor" will harvest those events and record them in a log in a
format (JSON) that's consumable by management applications..

[1]: https://www.computeexpresslink.org/spec-landing

---

Dan Williams (8):
      cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers
      cxl/pci: Cleanup cxl_map_device_regs()
      cxl/pci: Kill cxl_map_regs()
      cxl/core/regs: Make cxl_map_{component, device}_regs() device generic
      cxl/port: Limit the port driver to just the HDM Decoder Capability
      cxl/pci: Prepare for mapping RAS Capability Structure
      cxl/pci: Find and map the RAS Capability Structure
      cxl/pci: Add (hopeful) error handling support

Dave Jiang (1):
      cxl/pci: add tracepoint events for CXL RAS


 drivers/cxl/core/hdm.c         |  33 ++---
 drivers/cxl/core/memdev.c      |   1 +
 drivers/cxl/core/pci.c         |   3 +-
 drivers/cxl/core/port.c        |   2 +-
 drivers/cxl/core/regs.c        | 172 +++++++++++++++-----------
 drivers/cxl/cxl.h              |  39 ++++--
 drivers/cxl/cxlmem.h           |   2 +
 drivers/cxl/cxlpci.h           |   9 --
 drivers/cxl/pci.c              | 216 +++++++++++++++++++++++++++------
 include/trace/events/cxl_ras.h | 117 ++++++++++++++++++
 10 files changed, 445 insertions(+), 149 deletions(-)
 create mode 100644 include/trace/events/cxl_ras.h

--

Comments

Jonathan Cameron Oct. 11, 2022, 2:17 p.m. UTC | #1
On Fri, 16 Sep 2022 16:10:53 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Series set to RFC since there's no means to test. Would like to get opinion
> on whether going with using trace events as reporting mechanism is ok.
> 
> Jonathan,
> We currently don't have any ways to test AER events. Do you have any plans
> to support AER events via QEMU emulation?

Sorry - missed this entirely as gotten a bit behind reading CXL emails.

Hmm. AER brings a few complexities IIRC. Can be handled either via
native handling in the RCEC / RP, or via GHES records, GED etc.

I don't think it would be particularly hard to emulate either of them.
I have some old code for AER firmware first injection that I could
recycle for that side of things but that's probably less interesting
here than the native case.

I have a few other things to send out as WIP first, but can have
a mess around with AER paths after that.  There is some support
already in QEMU for generic AER error injection, but we will need
to build on top of that to get the rest of the status in place
before the error is generated.  See hw/pci/pcie_aer.c

Obviously if it's something you want to take a look at in QEMU that
would be great too!

Jonathan

> 
> v2:
> - Convert error reporting via printk to trace events
> - Drop ".rmap =" initialization (Jonathan)
> - return PCI_ERS_RESULT_NEED_RESET for UE in pci_channel_io_normal (Shiju)
> 
> Add a 'struct pci_error_handlers' instance for the cxl_pci driver.
> Section 8.2.4.16 "CXL RAS Capability Structure" of the CXL rev3.0
> specification defines the error sources considered in this
> implementation. The RAS Capability Structure defines protocol, link and
> internal errors which are distinct from memory poison errors that are
> conveyed via direct consumption and/or media scanning.
> 
> The errors reported by the RAS registers are categorized into
> correctable and uncorrectable errors, where the uncorrectable errors are
> optionally steered to either fatal or non-fatal AER events. Table 12-2 
> "Device Specific Error Reporting and Nomenclature Guidelines" in the CXL
> rev3.0 specification outlines that the remediation for uncorrectable errors
> is a reset to recover. This matches how the Linux PCIe AER core treats
> uncorrectable errors as occasions to reset the device to recover
> operation.
> 
> While the specification notes "CXL Reset" or "Secondary Bus Reset" as
> theoretical recovery options, they are not feasible in practice since
> in-flight CXL.mem operations may not terminate and cause knock-on system
> fatal events. Reset is only reliable for recovering CXL.io, it is not
> reliable for recovering CXL.mem. Assuming the system survives, a reset
> causes CXL.mem operation to restart from scratch.
> 
> The "ECN: Error Isolation on CXL.mem and CXL.cache" [1] document
> recognizes the CXL Reset vs CXL.mem operational conflict and helps to at
> least provide a mechanism for the Root Port to terminate in flight
> CXL.mem operations with completions. That still poses problems in
> practice if the kernel is running out of "System RAM" backed by the CXL
> device and poison is used to convey the data lost to the protocol error.
> 
> Regardless of whether the reset and restart of CXL.mem operations is
> feasible / successful, the logging is still useful. So, the
> implementation reads, reports, and clears the status in the RAS
> Capability Structure registers, and it notifies the 'struct cxl_memdev'
> associated with the given PCIe endpoint to reattach to its driver over
> the reset so that the HDM decoder configuration can be reconstructed.
> 
> The first half of the series reworks component register mapping so that
> the cxl_pci driver can own the RAS Capability while the cxl_port driver
> continues to own the HDM Decoder Capability. The last half implements
> the RAS Capability Structure mapping and reporting via 'struct
> pci_error_handlers'.
> 
> The reporting of error information is done through event tracing. A new
> cxl_ras event is introduced to report the Uncorrectable and Correctable
> errors raised by CXL. The expectation is a monitoring user daemon such as
> "cxl monitor" will harvest those events and record them in a log in a
> format (JSON) that's consumable by management applications..
> 
> [1]: https://www.computeexpresslink.org/spec-landing
> 
> ---
> 
> Dan Williams (8):
>       cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers
>       cxl/pci: Cleanup cxl_map_device_regs()
>       cxl/pci: Kill cxl_map_regs()
>       cxl/core/regs: Make cxl_map_{component, device}_regs() device generic
>       cxl/port: Limit the port driver to just the HDM Decoder Capability
>       cxl/pci: Prepare for mapping RAS Capability Structure
>       cxl/pci: Find and map the RAS Capability Structure
>       cxl/pci: Add (hopeful) error handling support
> 
> Dave Jiang (1):
>       cxl/pci: add tracepoint events for CXL RAS
> 
> 
>  drivers/cxl/core/hdm.c         |  33 ++---
>  drivers/cxl/core/memdev.c      |   1 +
>  drivers/cxl/core/pci.c         |   3 +-
>  drivers/cxl/core/port.c        |   2 +-
>  drivers/cxl/core/regs.c        | 172 +++++++++++++++-----------
>  drivers/cxl/cxl.h              |  39 ++++--
>  drivers/cxl/cxlmem.h           |   2 +
>  drivers/cxl/cxlpci.h           |   9 --
>  drivers/cxl/pci.c              | 216 +++++++++++++++++++++++++++------
>  include/trace/events/cxl_ras.h | 117 ++++++++++++++++++
>  10 files changed, 445 insertions(+), 149 deletions(-)
>  create mode 100644 include/trace/events/cxl_ras.h
> 
> --
>
Dave Jiang Oct. 11, 2022, 3:18 p.m. UTC | #2
On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
> On Fri, 16 Sep 2022 16:10:53 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> Series set to RFC since there's no means to test. Would like to get opinion
>> on whether going with using trace events as reporting mechanism is ok.
>>
>> Jonathan,
>> We currently don't have any ways to test AER events. Do you have any plans
>> to support AER events via QEMU emulation?
> Sorry - missed this entirely as gotten a bit behind reading CXL emails.
>
> Hmm. AER brings a few complexities IIRC. Can be handled either via
> native handling in the RCEC / RP, or via GHES records, GED etc.
>
> I don't think it would be particularly hard to emulate either of them.
> I have some old code for AER firmware first injection that I could
> recycle for that side of things but that's probably less interesting
> here than the native case.
>
> I have a few other things to send out as WIP first, but can have
> a mess around with AER paths after that.  There is some support
> already in QEMU for generic AER error injection, but we will need
> to build on top of that to get the rest of the status in place
> before the error is generated.  See hw/pci/pcie_aer.c
>
> Obviously if it's something you want to take a look at in QEMU that
> would be great too!

No worries. I know you are super busy. Before we go into injection, I 
think first step is having the CXL device advertise _OSC handover of AER 
handling? How complicated is it to have qemu advertise that?


>
> Jonathan
>
>> v2:
>> - Convert error reporting via printk to trace events
>> - Drop ".rmap =" initialization (Jonathan)
>> - return PCI_ERS_RESULT_NEED_RESET for UE in pci_channel_io_normal (Shiju)
>>
>> Add a 'struct pci_error_handlers' instance for the cxl_pci driver.
>> Section 8.2.4.16 "CXL RAS Capability Structure" of the CXL rev3.0
>> specification defines the error sources considered in this
>> implementation. The RAS Capability Structure defines protocol, link and
>> internal errors which are distinct from memory poison errors that are
>> conveyed via direct consumption and/or media scanning.
>>
>> The errors reported by the RAS registers are categorized into
>> correctable and uncorrectable errors, where the uncorrectable errors are
>> optionally steered to either fatal or non-fatal AER events. Table 12-2
>> "Device Specific Error Reporting and Nomenclature Guidelines" in the CXL
>> rev3.0 specification outlines that the remediation for uncorrectable errors
>> is a reset to recover. This matches how the Linux PCIe AER core treats
>> uncorrectable errors as occasions to reset the device to recover
>> operation.
>>
>> While the specification notes "CXL Reset" or "Secondary Bus Reset" as
>> theoretical recovery options, they are not feasible in practice since
>> in-flight CXL.mem operations may not terminate and cause knock-on system
>> fatal events. Reset is only reliable for recovering CXL.io, it is not
>> reliable for recovering CXL.mem. Assuming the system survives, a reset
>> causes CXL.mem operation to restart from scratch.
>>
>> The "ECN: Error Isolation on CXL.mem and CXL.cache" [1] document
>> recognizes the CXL Reset vs CXL.mem operational conflict and helps to at
>> least provide a mechanism for the Root Port to terminate in flight
>> CXL.mem operations with completions. That still poses problems in
>> practice if the kernel is running out of "System RAM" backed by the CXL
>> device and poison is used to convey the data lost to the protocol error.
>>
>> Regardless of whether the reset and restart of CXL.mem operations is
>> feasible / successful, the logging is still useful. So, the
>> implementation reads, reports, and clears the status in the RAS
>> Capability Structure registers, and it notifies the 'struct cxl_memdev'
>> associated with the given PCIe endpoint to reattach to its driver over
>> the reset so that the HDM decoder configuration can be reconstructed.
>>
>> The first half of the series reworks component register mapping so that
>> the cxl_pci driver can own the RAS Capability while the cxl_port driver
>> continues to own the HDM Decoder Capability. The last half implements
>> the RAS Capability Structure mapping and reporting via 'struct
>> pci_error_handlers'.
>>
>> The reporting of error information is done through event tracing. A new
>> cxl_ras event is introduced to report the Uncorrectable and Correctable
>> errors raised by CXL. The expectation is a monitoring user daemon such as
>> "cxl monitor" will harvest those events and record them in a log in a
>> format (JSON) that's consumable by management applications..
>>
>> [1]: https://www.computeexpresslink.org/spec-landing
>>
>> ---
>>
>> Dan Williams (8):
>>        cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers
>>        cxl/pci: Cleanup cxl_map_device_regs()
>>        cxl/pci: Kill cxl_map_regs()
>>        cxl/core/regs: Make cxl_map_{component, device}_regs() device generic
>>        cxl/port: Limit the port driver to just the HDM Decoder Capability
>>        cxl/pci: Prepare for mapping RAS Capability Structure
>>        cxl/pci: Find and map the RAS Capability Structure
>>        cxl/pci: Add (hopeful) error handling support
>>
>> Dave Jiang (1):
>>        cxl/pci: add tracepoint events for CXL RAS
>>
>>
>>   drivers/cxl/core/hdm.c         |  33 ++---
>>   drivers/cxl/core/memdev.c      |   1 +
>>   drivers/cxl/core/pci.c         |   3 +-
>>   drivers/cxl/core/port.c        |   2 +-
>>   drivers/cxl/core/regs.c        | 172 +++++++++++++++-----------
>>   drivers/cxl/cxl.h              |  39 ++++--
>>   drivers/cxl/cxlmem.h           |   2 +
>>   drivers/cxl/cxlpci.h           |   9 --
>>   drivers/cxl/pci.c              | 216 +++++++++++++++++++++++++++------
>>   include/trace/events/cxl_ras.h | 117 ++++++++++++++++++
>>   10 files changed, 445 insertions(+), 149 deletions(-)
>>   create mode 100644 include/trace/events/cxl_ras.h
>>
>> --
>>
Jonathan Cameron Oct. 11, 2022, 5:19 p.m. UTC | #3
On Tue, 11 Oct 2022 08:18:34 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
> > On Fri, 16 Sep 2022 16:10:53 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >  
> >> Series set to RFC since there's no means to test. Would like to get opinion
> >> on whether going with using trace events as reporting mechanism is ok.
> >>
> >> Jonathan,
> >> We currently don't have any ways to test AER events. Do you have any plans
> >> to support AER events via QEMU emulation?  
> > Sorry - missed this entirely as gotten a bit behind reading CXL emails.
> >
> > Hmm. AER brings a few complexities IIRC. Can be handled either via
> > native handling in the RCEC / RP, or via GHES records, GED etc.
> >
> > I don't think it would be particularly hard to emulate either of them.
> > I have some old code for AER firmware first injection that I could
> > recycle for that side of things but that's probably less interesting
> > here than the native case.
> >
> > I have a few other things to send out as WIP first, but can have
> > a mess around with AER paths after that.  There is some support
> > already in QEMU for generic AER error injection, but we will need
> > to build on top of that to get the rest of the status in place
> > before the error is generated.  See hw/pci/pcie_aer.c
> >
> > Obviously if it's something you want to take a look at in QEMU that
> > would be great too!  
> 
> No worries. I know you are super busy. Before we go into injection, I 
> think first step is having the CXL device advertise _OSC handover of AER 
> handling? How complicated is it to have qemu advertise that?
> 
Should be trivial. Actually if the comments are right we already have
it set to say the OS can take control - which is reasonable seeing as
we haven't implemented any firmware handling yet.

https://elixir.bootlin.com/qemu/latest/source/hw/acpi/cxl.c#L193

the AML in DSDT on my ARM test machine (happened to have that running)
looks plausible though my ability to parse AML is a bit limited!

Seems to mask with 0x1f which means the firmware accepts:
Native PCI express hotplug
SHPC Native hot plug control
PCI Express Native Power Management
PCI Express Advanced Error Reporting
PCI Express Capability Structure control
if the OS asks for them.

Reject OS request for higher bit controlled things like DPC.

Absolute minimum we'd then need would be AER capability for the EP.
cxl-rp needs support but I have a feeling we already got that by
inheriting from PCIE_ROOT_PORT :)
So may just need to hook up AER at the type 3 device and then figure out
the routing to fake the result of ERR_COR and the other messages.

Obviously we'd need to fill the rest of the error stuff in at
the type 3 device or (switch :) before "sending".

Jonathan

> 
> >
> > Jonathan
> >  
> >> v2:
> >> - Convert error reporting via printk to trace events
> >> - Drop ".rmap =" initialization (Jonathan)
> >> - return PCI_ERS_RESULT_NEED_RESET for UE in pci_channel_io_normal (Shiju)
> >>
> >> Add a 'struct pci_error_handlers' instance for the cxl_pci driver.
> >> Section 8.2.4.16 "CXL RAS Capability Structure" of the CXL rev3.0
> >> specification defines the error sources considered in this
> >> implementation. The RAS Capability Structure defines protocol, link and
> >> internal errors which are distinct from memory poison errors that are
> >> conveyed via direct consumption and/or media scanning.
> >>
> >> The errors reported by the RAS registers are categorized into
> >> correctable and uncorrectable errors, where the uncorrectable errors are
> >> optionally steered to either fatal or non-fatal AER events. Table 12-2
> >> "Device Specific Error Reporting and Nomenclature Guidelines" in the CXL
> >> rev3.0 specification outlines that the remediation for uncorrectable errors
> >> is a reset to recover. This matches how the Linux PCIe AER core treats
> >> uncorrectable errors as occasions to reset the device to recover
> >> operation.
> >>
> >> While the specification notes "CXL Reset" or "Secondary Bus Reset" as
> >> theoretical recovery options, they are not feasible in practice since
> >> in-flight CXL.mem operations may not terminate and cause knock-on system
> >> fatal events. Reset is only reliable for recovering CXL.io, it is not
> >> reliable for recovering CXL.mem. Assuming the system survives, a reset
> >> causes CXL.mem operation to restart from scratch.
> >>
> >> The "ECN: Error Isolation on CXL.mem and CXL.cache" [1] document
> >> recognizes the CXL Reset vs CXL.mem operational conflict and helps to at
> >> least provide a mechanism for the Root Port to terminate in flight
> >> CXL.mem operations with completions. That still poses problems in
> >> practice if the kernel is running out of "System RAM" backed by the CXL
> >> device and poison is used to convey the data lost to the protocol error.
> >>
> >> Regardless of whether the reset and restart of CXL.mem operations is
> >> feasible / successful, the logging is still useful. So, the
> >> implementation reads, reports, and clears the status in the RAS
> >> Capability Structure registers, and it notifies the 'struct cxl_memdev'
> >> associated with the given PCIe endpoint to reattach to its driver over
> >> the reset so that the HDM decoder configuration can be reconstructed.
> >>
> >> The first half of the series reworks component register mapping so that
> >> the cxl_pci driver can own the RAS Capability while the cxl_port driver
> >> continues to own the HDM Decoder Capability. The last half implements
> >> the RAS Capability Structure mapping and reporting via 'struct
> >> pci_error_handlers'.
> >>
> >> The reporting of error information is done through event tracing. A new
> >> cxl_ras event is introduced to report the Uncorrectable and Correctable
> >> errors raised by CXL. The expectation is a monitoring user daemon such as
> >> "cxl monitor" will harvest those events and record them in a log in a
> >> format (JSON) that's consumable by management applications..
> >>
> >> [1]: https://www.computeexpresslink.org/spec-landing
> >>
> >> ---
> >>
> >> Dan Williams (8):
> >>        cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers
> >>        cxl/pci: Cleanup cxl_map_device_regs()
> >>        cxl/pci: Kill cxl_map_regs()
> >>        cxl/core/regs: Make cxl_map_{component, device}_regs() device generic
> >>        cxl/port: Limit the port driver to just the HDM Decoder Capability
> >>        cxl/pci: Prepare for mapping RAS Capability Structure
> >>        cxl/pci: Find and map the RAS Capability Structure
> >>        cxl/pci: Add (hopeful) error handling support
> >>
> >> Dave Jiang (1):
> >>        cxl/pci: add tracepoint events for CXL RAS
> >>
> >>
> >>   drivers/cxl/core/hdm.c         |  33 ++---
> >>   drivers/cxl/core/memdev.c      |   1 +
> >>   drivers/cxl/core/pci.c         |   3 +-
> >>   drivers/cxl/core/port.c        |   2 +-
> >>   drivers/cxl/core/regs.c        | 172 +++++++++++++++-----------
> >>   drivers/cxl/cxl.h              |  39 ++++--
> >>   drivers/cxl/cxlmem.h           |   2 +
> >>   drivers/cxl/cxlpci.h           |   9 --
> >>   drivers/cxl/pci.c              | 216 +++++++++++++++++++++++++++------
> >>   include/trace/events/cxl_ras.h | 117 ++++++++++++++++++
> >>   10 files changed, 445 insertions(+), 149 deletions(-)
> >>   create mode 100644 include/trace/events/cxl_ras.h
> >>
> >> --
> >>  
>
Jonathan Cameron Oct. 19, 2022, 5:30 p.m. UTC | #4
On Tue, 11 Oct 2022 18:19:15 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Tue, 11 Oct 2022 08:18:34 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > On 10/11/2022 7:17 AM, Jonathan Cameron wrote:  
> > > On Fri, 16 Sep 2022 16:10:53 -0700
> > > Dave Jiang <dave.jiang@intel.com> wrote:
> > >    
> > >> Series set to RFC since there's no means to test. Would like to get opinion
> > >> on whether going with using trace events as reporting mechanism is ok.
> > >>
> > >> Jonathan,
> > >> We currently don't have any ways to test AER events. Do you have any plans
> > >> to support AER events via QEMU emulation?    
> > > Sorry - missed this entirely as gotten a bit behind reading CXL emails.

Hi Dave,

Quick update.

Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
figuring out why I wasn't getting messages past the upstream switch port.
Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
that patch isn't upstream yet.
Also QEMU AER rooting seems to be based on some older PCIE spec
so needed some tweaks to get the device to actually issue ERR_FATAL etc.

Anyhow, should have something you can play with in a day or two.
In meantime an example dump (not writing the header log yet!)

pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
cxl_port endpoint6: No CMA mailbox
cxl_pci 0000:0f:00.0: mem3: error resume successful
pcieport 0000:0e:00.0: AER: device recovery successful

Jonathan
Dave Jiang Oct. 19, 2022, 5:38 p.m. UTC | #5
On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
> On Tue, 11 Oct 2022 18:19:15 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
>> On Tue, 11 Oct 2022 08:18:34 -0700
>> Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
>>>> On Fri, 16 Sep 2022 16:10:53 -0700
>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>     
>>>>> Series set to RFC since there's no means to test. Would like to get opinion
>>>>> on whether going with using trace events as reporting mechanism is ok.
>>>>>
>>>>> Jonathan,
>>>>> We currently don't have any ways to test AER events. Do you have any plans
>>>>> to support AER events via QEMU emulation?
>>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.
> Hi Dave,
>
> Quick update.
>
> Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> figuring out why I wasn't getting messages past the upstream switch port.
> Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> that patch isn't upstream yet.
> Also QEMU AER rooting seems to be based on some older PCIE spec
> so needed some tweaks to get the device to actually issue ERR_FATAL etc.
>
> Anyhow, should have something you can play with in a day or two.

Awesome! Thanks! :)


> In meantime an example dump (not writing the header log yet!)
>
> pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
> cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
> cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
> cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
> cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
> cxl_port endpoint6: No CMA mailbox
> cxl_pci 0000:0f:00.0: mem3: error resume successful
> pcieport 0000:0e:00.0: AER: device recovery successful
>
> Jonathan
Jonathan Cameron Oct. 24, 2022, 4:01 p.m. UTC | #6
On Wed, 19 Oct 2022 10:38:13 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
> > On Tue, 11 Oct 2022 18:19:15 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >  
> >> On Tue, 11 Oct 2022 08:18:34 -0700
> >> Dave Jiang <dave.jiang@intel.com> wrote:
> >>  
> >>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:  
> >>>> On Fri, 16 Sep 2022 16:10:53 -0700
> >>>> Dave Jiang <dave.jiang@intel.com> wrote:
> >>>>       
> >>>>> Series set to RFC since there's no means to test. Would like to get opinion
> >>>>> on whether going with using trace events as reporting mechanism is ok.
> >>>>>
> >>>>> Jonathan,
> >>>>> We currently don't have any ways to test AER events. Do you have any plans
> >>>>> to support AER events via QEMU emulation?  
> >>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.  
> > Hi Dave,
> >
> > Quick update.
> >
> > Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> > figuring out why I wasn't getting messages past the upstream switch port.
> > Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> > that patch isn't upstream yet.
> > Also QEMU AER rooting seems to be based on some older PCIE spec
> > so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> >
> > Anyhow, should have something you can play with in a day or two.  
> 
> Awesome! Thanks! :)

Took a little longer than expected..

Anyhow, now at
https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24

That tree is carrying far too many things right now for it make much sense
to me to email this to qemu-devel - though I may pull
hw/pci/aer: Add missing routing for AER errors
out in advance as that's closing a spec different between QEMU emulation of AER
and what the PCI spec says.

Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
patches have been on list for a week or so.

Top patch includes a very short 'how to' in patch description.  Basically fire
up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
qemu commandline and use commands like:

{ "execute": "qmp_capabilities" }
...
{ "execute": "cxl-inject-uncorrectable-error",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "type": "cache-address-parity",
        "header": [ 3, 4]
    } }
...
{ "execute": "cxl-inject-correctable-error",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "type": "physical",
        "header": [ 3, 4]
    } }



> 
> 
> > In meantime an example dump (not writing the header log yet!)
> >
> > pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
> > cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
> > cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
> > cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
> > cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
> > cxl_port endpoint6: No CMA mailbox
> > cxl_pci 0000:0f:00.0: mem3: error resume successful
> > pcieport 0000:0e:00.0: AER: device recovery successful
> >
> > Jonathan  
>
Dave Jiang Oct. 25, 2022, 3:22 p.m. UTC | #7
On 10/24/2022 9:01 AM, Jonathan Cameron wrote:
> On Wed, 19 Oct 2022 10:38:13 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
>>> On Tue, 11 Oct 2022 18:19:15 +0100
>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>   
>>>> On Tue, 11 Oct 2022 08:18:34 -0700
>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>   
>>>>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
>>>>>> On Fri, 16 Sep 2022 16:10:53 -0700
>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>>        
>>>>>>> Series set to RFC since there's no means to test. Would like to get opinion
>>>>>>> on whether going with using trace events as reporting mechanism is ok.
>>>>>>>
>>>>>>> Jonathan,
>>>>>>> We currently don't have any ways to test AER events. Do you have any plans
>>>>>>> to support AER events via QEMU emulation?
>>>>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.
>>> Hi Dave,
>>>
>>> Quick update.
>>>
>>> Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
>>> figuring out why I wasn't getting messages past the upstream switch port.
>>> Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
>>> that patch isn't upstream yet.
>>> Also QEMU AER rooting seems to be based on some older PCIE spec
>>> so needed some tweaks to get the device to actually issue ERR_FATAL etc.
>>>
>>> Anyhow, should have something you can play with in a day or two.
>> Awesome! Thanks! :)
> Took a little longer than expected..
>
> Anyhow, now at
> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24

Thank you! I'll try this out as soon as I get a chance.

>
> That tree is carrying far too many things right now for it make much sense
> to me to email this to qemu-devel - though I may pull
> hw/pci/aer: Add missing routing for AER errors
> out in advance as that's closing a spec different between QEMU emulation of AER
> and what the PCI spec says.
>
> Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
> patches have been on list for a week or so.
>
> Top patch includes a very short 'how to' in patch description.  Basically fire
> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
> qemu commandline and use commands like:
>
> { "execute": "qmp_capabilities" }
> ...
> { "execute": "cxl-inject-uncorrectable-error",
>      "arguments": {
>          "path": "/machine/peripheral/cxl-pmem0",
>          "type": "cache-address-parity",
>          "header": [ 3, 4]
>      } }
> ...
> { "execute": "cxl-inject-correctable-error",
>      "arguments": {
>          "path": "/machine/peripheral/cxl-pmem0",
>          "type": "physical",
>          "header": [ 3, 4]
>      } }
>
>
>
>>
>>> In meantime an example dump (not writing the header log yet!)
>>>
>>> pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
>>> cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
>>> cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
>>> cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
>>> cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
>>> cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
>>> cxl_port endpoint6: No CMA mailbox
>>> cxl_pci 0000:0f:00.0: mem3: error resume successful
>>> pcieport 0000:0e:00.0: AER: device recovery successful
>>>
>>> Jonathan
Jonathan Cameron Nov. 3, 2022, 12:58 p.m. UTC | #8
On Mon, 24 Oct 2022 17:01:02 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 19 Oct 2022 10:38:13 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > On 10/19/2022 10:30 AM, Jonathan Cameron wrote:  
> > > On Tue, 11 Oct 2022 18:19:15 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >    
> > >> On Tue, 11 Oct 2022 08:18:34 -0700
> > >> Dave Jiang <dave.jiang@intel.com> wrote:
> > >>    
> > >>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:    
> > >>>> On Fri, 16 Sep 2022 16:10:53 -0700
> > >>>> Dave Jiang <dave.jiang@intel.com> wrote:
> > >>>>         
> > >>>>> Series set to RFC since there's no means to test. Would like to get opinion
> > >>>>> on whether going with using trace events as reporting mechanism is ok.
> > >>>>>
> > >>>>> Jonathan,
> > >>>>> We currently don't have any ways to test AER events. Do you have any plans
> > >>>>> to support AER events via QEMU emulation?    
> > >>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.    
> > > Hi Dave,
> > >
> > > Quick update.
> > >
> > > Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> > > figuring out why I wasn't getting messages past the upstream switch port.
> > > Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> > > that patch isn't upstream yet.
> > > Also QEMU AER rooting seems to be based on some older PCIE spec
> > > so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> > >
> > > Anyhow, should have something you can play with in a day or two.    
> > 
> > Awesome! Thanks! :)  
> 
> Took a little longer than expected..
> 
> Anyhow, now at
> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
> 
> That tree is carrying far too many things right now for it make much sense
> to me to email this to qemu-devel - though I may pull
> hw/pci/aer: Add missing routing for AER errors
> out in advance as that's closing a spec different between QEMU emulation of AER
> and what the PCI spec says.
> 
> Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
> patches have been on list for a week or so.
> 
> Top patch includes a very short 'how to' in patch description.  Basically fire
> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
> qemu commandline and use commands like:
> 
> { "execute": "qmp_capabilities" }
> ...
> { "execute": "cxl-inject-uncorrectable-error",
>     "arguments": {
>         "path": "/machine/peripheral/cxl-pmem0",
>         "type": "cache-address-parity",
>         "header": [ 3, 4]
>     } }
> ...
> { "execute": "cxl-inject-correctable-error",
>     "arguments": {
>         "path": "/machine/peripheral/cxl-pmem0",
>         "type": "physical",
>         "header": [ 3, 4]
>     } }
> 

So Dave reported that this wasn't working on x86 qemu machines.

A fun bit of debugging later (I hate AML) and I think I have find the issue +
have a hack to workaround it for now.

So need some background.
1) CXL code is based on QEMU's pci expander bridge root bridge - there is a complex
   bit of handling to create appropriate ACPI DSDT magic.
2) The CXL root port is based on pcie_root_port.c 
3) Both CXL root port and pcie root port use traditional PCI interrupts, not MSI/MSIX
   for their signaling.
4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
   actual interrupt on line 23 for my particular configuration
5) The ACPI table says it's on line 11.
6) x86 code for creating the PRT has an informative comment...
https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
 * The main goal is to equaly distribute the interrupts
 * over the 4 existing ACPI links (works only for i440fx).

So the hack I'm running is below (note the UID thing is a separate bug that stops
iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
for that shortly).

There are a bunch of possible approaches to fix this if my identification of
the issue is correct.

1) Clean equivalent of this hack that runs on appropriate machines only.
2) Use MSI instead. (ioh3420 root port takes this approach I think).

From 286c8f9b6d229d9e71f64657b6b3ccb70cb98306 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Thu, 3 Nov 2022 12:20:25 +0000
Subject: [PATCH] HACK: Fix-up interrupt routing for CXL on q35.

I need to do some more thinking to figure out correct approach
to solve this problem.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/i386/acpi-build.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4f54b61904..8055253e68 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -746,7 +746,7 @@ static Aml *build_prt(bool is_pci0_prt)
                       lnk_idx));
 
         /* route[2] = "LNK[D|A|B|C]", selection based on pin % 3  */
-        aml_append(while_ctx, initialize_route(route, "LNKD", lnk_idx, 0));
+        aml_append(while_ctx, initialize_route(route, "GSIH", lnk_idx, 0));
         if (is_pci0_prt) {
             Aml *if_device_1, *if_pin_4, *else_pin_4;
 
@@ -762,16 +762,16 @@ static Aml *build_prt(bool is_pci0_prt)
                 else_pin_4 = aml_else();
                 {
                     aml_append(else_pin_4,
-                        aml_store(build_prt_entry("LNKA"), route));
+                        aml_store(build_prt_entry("GSIE"), route));
                 }
                 aml_append(if_device_1, else_pin_4);
             }
             aml_append(while_ctx, if_device_1);
         } else {
-            aml_append(while_ctx, initialize_route(route, "LNKA", lnk_idx, 1));
+            aml_append(while_ctx, initialize_route(route, "GSIE", lnk_idx, 1));
         }
-        aml_append(while_ctx, initialize_route(route, "LNKB", lnk_idx, 2));
-        aml_append(while_ctx, initialize_route(route, "LNKC", lnk_idx, 3));
+        aml_append(while_ctx, initialize_route(route, "GSIF", lnk_idx, 2));
+        aml_append(while_ctx, initialize_route(route, "GSIG", lnk_idx, 3));
 
         /* route[0] = 0x[slot]FFFF */
         aml_append(while_ctx,
@@ -1627,7 +1627,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
                 aml_append(pkg, aml_eisaid("PNP0A03"));
                 aml_append(dev, aml_name_decl("_CID", pkg));
                 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-                aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+//                aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
                 build_cxl_osc_method(dev);
             } else if (pci_bus_is_express(bus)) {
                 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
Jonathan Cameron Nov. 3, 2022, 1:27 p.m. UTC | #9
On Thu, 3 Nov 2022 12:58:51 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 24 Oct 2022 17:01:02 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Wed, 19 Oct 2022 10:38:13 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >   
> > > On 10/19/2022 10:30 AM, Jonathan Cameron wrote:    
> > > > On Tue, 11 Oct 2022 18:19:15 +0100
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >      
> > > >> On Tue, 11 Oct 2022 08:18:34 -0700
> > > >> Dave Jiang <dave.jiang@intel.com> wrote:
> > > >>      
> > > >>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:      
> > > >>>> On Fri, 16 Sep 2022 16:10:53 -0700
> > > >>>> Dave Jiang <dave.jiang@intel.com> wrote:
> > > >>>>           
> > > >>>>> Series set to RFC since there's no means to test. Would like to get opinion
> > > >>>>> on whether going with using trace events as reporting mechanism is ok.
> > > >>>>>
> > > >>>>> Jonathan,
> > > >>>>> We currently don't have any ways to test AER events. Do you have any plans
> > > >>>>> to support AER events via QEMU emulation?      
> > > >>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.      
> > > > Hi Dave,
> > > >
> > > > Quick update.
> > > >
> > > > Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> > > > figuring out why I wasn't getting messages past the upstream switch port.
> > > > Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> > > > that patch isn't upstream yet.
> > > > Also QEMU AER rooting seems to be based on some older PCIE spec
> > > > so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> > > >
> > > > Anyhow, should have something you can play with in a day or two.      
> > > 
> > > Awesome! Thanks! :)    
> > 
> > Took a little longer than expected..
> > 
> > Anyhow, now at
> > https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
> > 
> > That tree is carrying far too many things right now for it make much sense
> > to me to email this to qemu-devel - though I may pull
> > hw/pci/aer: Add missing routing for AER errors
> > out in advance as that's closing a spec different between QEMU emulation of AER
> > and what the PCI spec says.
> > 
> > Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
> > patches have been on list for a week or so.
> > 
> > Top patch includes a very short 'how to' in patch description.  Basically fire
> > up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
> > qemu commandline and use commands like:
> > 
> > { "execute": "qmp_capabilities" }
> > ...
> > { "execute": "cxl-inject-uncorrectable-error",
> >     "arguments": {
> >         "path": "/machine/peripheral/cxl-pmem0",
> >         "type": "cache-address-parity",
> >         "header": [ 3, 4]
> >     } }
> > ...
> > { "execute": "cxl-inject-correctable-error",
> >     "arguments": {
> >         "path": "/machine/peripheral/cxl-pmem0",
> >         "type": "physical",
> >         "header": [ 3, 4]
> >     } }
> >   
> 
> So Dave reported that this wasn't working on x86 qemu machines.
> 
> A fun bit of debugging later (I hate AML) and I think I have find the issue +
> have a hack to workaround it for now.
> 
> So need some background.
> 1) CXL code is based on QEMU's pci expander bridge root bridge - there is a complex
>    bit of handling to create appropriate ACPI DSDT magic.
> 2) The CXL root port is based on pcie_root_port.c 
> 3) Both CXL root port and pcie root port use traditional PCI interrupts, not MSI/MSIX
>    for their signaling.
> 4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
>    actual interrupt on line 23 for my particular configuration
> 5) The ACPI table says it's on line 11.
> 6) x86 code for creating the PRT has an informative comment...
> https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
>  * The main goal is to equaly distribute the interrupts
>  * over the 4 existing ACPI links (works only for i440fx).
> 
> So the hack I'm running is below (note the UID thing is a separate bug that stops
> iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
> for that shortly).
> 
> There are a bunch of possible approaches to fix this if my identification of
> the issue is correct.
> 
> 1) Clean equivalent of this hack that runs on appropriate machines only.
> 2) Use MSI instead. (ioh3420 root port takes this approach I think).

This turned out to be trivial case of cut and pate.  So MSI support it is (mostly because it doesn't
involve fixing AML generation :)

Very lightly tested on one config of x86 machine and one of arm64.
I've not thought about this at all yet, so it's a direct copy of the ioh3420 with 
appropriate renames for it being in the cxl_rp code.


From a5e85d90cb734fc1de81e0d99e443747348e65e7 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Thu, 3 Nov 2022 13:10:24 +0000
Subject: [PATCH] hw: pci-bridge: cxl_root_port: Write up MSI

Done to avoid fixing ACPI rout description of traditional PCI interrupts on q35.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/pci-bridge/cxl_root_port.c | 63 +++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
index 1a9363a6de..a7475c427e 100644
--- a/hw/pci-bridge/cxl_root_port.c
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -22,6 +22,7 @@
 #include "qemu/range.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_port.h"
+#include "hw/pci/msi.h"
 #include "hw/qdev-properties.h"
 #include "hw/sysbus.h"
 #include "qapi/error.h"
@@ -29,6 +30,10 @@
 
 #define CXL_ROOT_PORT_DID 0x7075
 
+#define CXL_RP_MSI_OFFSET               0x60
+#define CXL_RP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
+#define CXL_RP_MSI_NR_VECTOR            2
+
 /* Copied from the gen root port which we derive */
 #define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
 #define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
@@ -36,6 +41,7 @@
 #define CXL_ROOT_PORT_DVSEC_OFFSET \
     (GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
 
+
 typedef struct CXLRootPort {
     /*< private >*/
     PCIESlot parent_obj;
@@ -47,6 +53,50 @@ typedef struct CXLRootPort {
 #define TYPE_CXL_ROOT_PORT "cxl-rp"
 DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
 
+/*
+ * If two MSI vector are allocated, Advanced Error Interrupt Message Number
+ * is 1. otherwise 0.
+ * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
+ */
+static uint8_t cxl_rp_aer_vector(const PCIDevice *d)
+{
+    switch (msi_nr_vectors_allocated(d)) {
+    case 1:
+        return 0;
+    case 2:
+        return 1;
+    case 4:
+    case 8:
+    case 16:
+    case 32:
+    default:
+        break;
+    }
+    abort();
+    return 0;
+}
+
+static int cxl_rp_interrupts_init(PCIDevice *d, Error **errp)
+{
+    int rc;
+
+    rc = msi_init(d, CXL_RP_MSI_OFFSET, CXL_RP_MSI_NR_VECTOR,
+                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
+                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+                  errp);
+    if (rc < 0) {
+        assert(rc == -ENOTSUP);
+    }
+
+    return rc;
+}
+
+static void cxl_rp_interrupts_uninit(PCIDevice *d)
+{
+    msi_uninit(d);
+}
+
+    
 static void latch_registers(CXLRootPort *crp)
 {
     uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
@@ -177,6 +227,15 @@ static void cxl_rp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
     }
 }
 
+static void cxl_rp_aer_vector_update(PCIDevice *d)
+{
+    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
+
+    if (rpc->aer_vector) {
+        pcie_aer_root_set_vector(d, rpc->aer_vector(d));
+    }
+}
+
 static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
                                 int len)
 {
@@ -203,6 +262,7 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
         }
     }
     pci_bridge_write_config(d, address, val, len);
+    cxl_rp_aer_vector_update(d);
     pcie_cap_flr_write_config(d, address, val, len);
     pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
     pcie_aer_write_config(d, address, val, len);
@@ -229,6 +289,9 @@ static void cxl_root_port_class_init(ObjectClass *oc, void *data)
 
     rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
     rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
+    rpc->aer_vector = cxl_rp_aer_vector;
+    rpc->interrupts_init = cxl_rp_interrupts_init;
+    rpc->interrupts_uninit = cxl_rp_interrupts_uninit;
 
     dc->hotpluggable = false;
 }
Dave Jiang Nov. 16, 2022, 11:20 p.m. UTC | #10
On 11/3/2022 6:27 AM, Jonathan Cameron wrote:
> On Thu, 3 Nov 2022 12:58:51 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
>> On Mon, 24 Oct 2022 17:01:02 +0100
>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>
>>> On Wed, 19 Oct 2022 10:38:13 -0700
>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>    
>>>> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
>>>>> On Tue, 11 Oct 2022 18:19:15 +0100
>>>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>>>       
>>>>>> On Tue, 11 Oct 2022 08:18:34 -0700
>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>>       
>>>>>>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
>>>>>>>> On Fri, 16 Sep 2022 16:10:53 -0700
>>>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>>>>            
>>>>>>>>> Series set to RFC since there's no means to test. Would like to get opinion
>>>>>>>>> on whether going with using trace events as reporting mechanism is ok.
>>>>>>>>>
>>>>>>>>> Jonathan,
>>>>>>>>> We currently don't have any ways to test AER events. Do you have any plans
>>>>>>>>> to support AER events via QEMU emulation?
>>>>>>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.
>>>>> Hi Dave,
>>>>>
>>>>> Quick update.
>>>>>
>>>>> Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
>>>>> figuring out why I wasn't getting messages past the upstream switch port.
>>>>> Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
>>>>> that patch isn't upstream yet.
>>>>> Also QEMU AER rooting seems to be based on some older PCIE spec
>>>>> so needed some tweaks to get the device to actually issue ERR_FATAL etc.
>>>>>
>>>>> Anyhow, should have something you can play with in a day or two.
>>>>
>>>> Awesome! Thanks! :)
>>>
>>> Took a little longer than expected..
>>>
>>> Anyhow, now at
>>> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
>>>
>>> That tree is carrying far too many things right now for it make much sense
>>> to me to email this to qemu-devel - though I may pull
>>> hw/pci/aer: Add missing routing for AER errors
>>> out in advance as that's closing a spec different between QEMU emulation of AER
>>> and what the PCI spec says.
>>>
>>> Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
>>> patches have been on list for a week or so.
>>>
>>> Top patch includes a very short 'how to' in patch description.  Basically fire
>>> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
>>> qemu commandline and use commands like:
>>>
>>> { "execute": "qmp_capabilities" }
>>> ...
>>> { "execute": "cxl-inject-uncorrectable-error",
>>>      "arguments": {
>>>          "path": "/machine/peripheral/cxl-pmem0",
>>>          "type": "cache-address-parity",
>>>          "header": [ 3, 4]
>>>      } }
>>> ...
>>> { "execute": "cxl-inject-correctable-error",
>>>      "arguments": {
>>>          "path": "/machine/peripheral/cxl-pmem0",
>>>          "type": "physical",
>>>          "header": [ 3, 4]
>>>      } }
>>>    
>>
>> So Dave reported that this wasn't working on x86 qemu machines.
>>
>> A fun bit of debugging later (I hate AML) and I think I have find the issue +
>> have a hack to workaround it for now.
>>
>> So need some background.
>> 1) CXL code is based on QEMU's pci expander bridge root bridge - there is a complex
>>     bit of handling to create appropriate ACPI DSDT magic.
>> 2) The CXL root port is based on pcie_root_port.c
>> 3) Both CXL root port and pcie root port use traditional PCI interrupts, not MSI/MSIX
>>     for their signaling.
>> 4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
>>     actual interrupt on line 23 for my particular configuration
>> 5) The ACPI table says it's on line 11.
>> 6) x86 code for creating the PRT has an informative comment...
>> https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
>>   * The main goal is to equaly distribute the interrupts
>>   * over the 4 existing ACPI links (works only for i440fx).
>>
>> So the hack I'm running is below (note the UID thing is a separate bug that stops
>> iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
>> for that shortly).
>>
>> There are a bunch of possible approaches to fix this if my identification of
>> the issue is correct.
>>
>> 1) Clean equivalent of this hack that runs on appropriate machines only.
>> 2) Use MSI instead. (ioh3420 root port takes this approach I think).
> 
> This turned out to be trivial case of cut and pate.  So MSI support it is (mostly because it doesn't
> involve fixing AML generation :)
> 
> Very lightly tested on one config of x86 machine and one of arm64.
> I've not thought about this at all yet, so it's a direct copy of the ioh3420 with
> appropriate renames for it being in the cxl_rp code.

Hi Jonathan, do you have a qemu branch with the latest code? Maybe I'll 
give it a try. Thanks.

> 
> 
>  From a5e85d90cb734fc1de81e0d99e443747348e65e7 Mon Sep 17 00:00:00 2001
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Date: Thu, 3 Nov 2022 13:10:24 +0000
> Subject: [PATCH] hw: pci-bridge: cxl_root_port: Write up MSI
> 
> Done to avoid fixing ACPI rout description of traditional PCI interrupts on q35.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>   hw/pci-bridge/cxl_root_port.c | 63 +++++++++++++++++++++++++++++++++++
>   1 file changed, 63 insertions(+)
> 
> diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
> index 1a9363a6de..a7475c427e 100644
> --- a/hw/pci-bridge/cxl_root_port.c
> +++ b/hw/pci-bridge/cxl_root_port.c
> @@ -22,6 +22,7 @@
>   #include "qemu/range.h"
>   #include "hw/pci/pci_bridge.h"
>   #include "hw/pci/pcie_port.h"
> +#include "hw/pci/msi.h"
>   #include "hw/qdev-properties.h"
>   #include "hw/sysbus.h"
>   #include "qapi/error.h"
> @@ -29,6 +30,10 @@
>   
>   #define CXL_ROOT_PORT_DID 0x7075
>   
> +#define CXL_RP_MSI_OFFSET               0x60
> +#define CXL_RP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
> +#define CXL_RP_MSI_NR_VECTOR            2
> +
>   /* Copied from the gen root port which we derive */
>   #define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
>   #define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
> @@ -36,6 +41,7 @@
>   #define CXL_ROOT_PORT_DVSEC_OFFSET \
>       (GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
>   
> +
>   typedef struct CXLRootPort {
>       /*< private >*/
>       PCIESlot parent_obj;
> @@ -47,6 +53,50 @@ typedef struct CXLRootPort {
>   #define TYPE_CXL_ROOT_PORT "cxl-rp"
>   DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
>   
> +/*
> + * If two MSI vector are allocated, Advanced Error Interrupt Message Number
> + * is 1. otherwise 0.
> + * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
> + */
> +static uint8_t cxl_rp_aer_vector(const PCIDevice *d)
> +{
> +    switch (msi_nr_vectors_allocated(d)) {
> +    case 1:
> +        return 0;
> +    case 2:
> +        return 1;
> +    case 4:
> +    case 8:
> +    case 16:
> +    case 32:
> +    default:
> +        break;
> +    }
> +    abort();
> +    return 0;
> +}
> +
> +static int cxl_rp_interrupts_init(PCIDevice *d, Error **errp)
> +{
> +    int rc;
> +
> +    rc = msi_init(d, CXL_RP_MSI_OFFSET, CXL_RP_MSI_NR_VECTOR,
> +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
> +                  errp);
> +    if (rc < 0) {
> +        assert(rc == -ENOTSUP);
> +    }
> +
> +    return rc;
> +}
> +
> +static void cxl_rp_interrupts_uninit(PCIDevice *d)
> +{
> +    msi_uninit(d);
> +}
> +
> +
>   static void latch_registers(CXLRootPort *crp)
>   {
>       uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
> @@ -177,6 +227,15 @@ static void cxl_rp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
>       }
>   }
>   
> +static void cxl_rp_aer_vector_update(PCIDevice *d)
> +{
> +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
> +
> +    if (rpc->aer_vector) {
> +        pcie_aer_root_set_vector(d, rpc->aer_vector(d));
> +    }
> +}
> +
>   static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
>                                   int len)
>   {
> @@ -203,6 +262,7 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
>           }
>       }
>       pci_bridge_write_config(d, address, val, len);
> +    cxl_rp_aer_vector_update(d);
>       pcie_cap_flr_write_config(d, address, val, len);
>       pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
>       pcie_aer_write_config(d, address, val, len);
> @@ -229,6 +289,9 @@ static void cxl_root_port_class_init(ObjectClass *oc, void *data)
>   
>       rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
>       rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
> +    rpc->aer_vector = cxl_rp_aer_vector;
> +    rpc->interrupts_init = cxl_rp_interrupts_init;
> +    rpc->interrupts_uninit = cxl_rp_interrupts_uninit;
>   
>       dc->hotpluggable = false;
>   }
Jonathan Cameron Nov. 17, 2022, 1:50 p.m. UTC | #11
On Wed, 16 Nov 2022 16:20:20 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 11/3/2022 6:27 AM, Jonathan Cameron wrote:
> > On Thu, 3 Nov 2022 12:58:51 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >   
> >> On Mon, 24 Oct 2022 17:01:02 +0100
> >> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >>  
> >>> On Wed, 19 Oct 2022 10:38:13 -0700
> >>> Dave Jiang <dave.jiang@intel.com> wrote:
> >>>      
> >>>> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:  
> >>>>> On Tue, 11 Oct 2022 18:19:15 +0100
> >>>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >>>>>         
> >>>>>> On Tue, 11 Oct 2022 08:18:34 -0700
> >>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
> >>>>>>         
> >>>>>>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:  
> >>>>>>>> On Fri, 16 Sep 2022 16:10:53 -0700
> >>>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
> >>>>>>>>              
> >>>>>>>>> Series set to RFC since there's no means to test. Would like to get opinion
> >>>>>>>>> on whether going with using trace events as reporting mechanism is ok.
> >>>>>>>>>
> >>>>>>>>> Jonathan,
> >>>>>>>>> We currently don't have any ways to test AER events. Do you have any plans
> >>>>>>>>> to support AER events via QEMU emulation?  
> >>>>>>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.  
> >>>>> Hi Dave,
> >>>>>
> >>>>> Quick update.
> >>>>>
> >>>>> Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> >>>>> figuring out why I wasn't getting messages past the upstream switch port.
> >>>>> Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> >>>>> that patch isn't upstream yet.
> >>>>> Also QEMU AER rooting seems to be based on some older PCIE spec
> >>>>> so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> >>>>>
> >>>>> Anyhow, should have something you can play with in a day or two.  
> >>>>
> >>>> Awesome! Thanks! :)  
> >>>
> >>> Took a little longer than expected..
> >>>
> >>> Anyhow, now at
> >>> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
> >>>
> >>> That tree is carrying far too many things right now for it make much sense
> >>> to me to email this to qemu-devel - though I may pull
> >>> hw/pci/aer: Add missing routing for AER errors
> >>> out in advance as that's closing a spec different between QEMU emulation of AER
> >>> and what the PCI spec says.
> >>>
> >>> Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
> >>> patches have been on list for a week or so.
> >>>
> >>> Top patch includes a very short 'how to' in patch description.  Basically fire
> >>> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
> >>> qemu commandline and use commands like:
> >>>
> >>> { "execute": "qmp_capabilities" }
> >>> ...
> >>> { "execute": "cxl-inject-uncorrectable-error",
> >>>      "arguments": {
> >>>          "path": "/machine/peripheral/cxl-pmem0",
> >>>          "type": "cache-address-parity",
> >>>          "header": [ 3, 4]
> >>>      } }
> >>> ...
> >>> { "execute": "cxl-inject-correctable-error",
> >>>      "arguments": {
> >>>          "path": "/machine/peripheral/cxl-pmem0",
> >>>          "type": "physical",
> >>>          "header": [ 3, 4]
> >>>      } }
> >>>      
> >>
> >> So Dave reported that this wasn't working on x86 qemu machines.
> >>
> >> A fun bit of debugging later (I hate AML) and I think I have find the issue +
> >> have a hack to workaround it for now.
> >>
> >> So need some background.
> >> 1) CXL code is based on QEMU's pci expander bridge root bridge - there is a complex
> >>     bit of handling to create appropriate ACPI DSDT magic.
> >> 2) The CXL root port is based on pcie_root_port.c
> >> 3) Both CXL root port and pcie root port use traditional PCI interrupts, not MSI/MSIX
> >>     for their signaling.
> >> 4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
> >>     actual interrupt on line 23 for my particular configuration
> >> 5) The ACPI table says it's on line 11.
> >> 6) x86 code for creating the PRT has an informative comment...
> >> https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
> >>   * The main goal is to equaly distribute the interrupts
> >>   * over the 4 existing ACPI links (works only for i440fx).
> >>
> >> So the hack I'm running is below (note the UID thing is a separate bug that stops
> >> iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
> >> for that shortly).
> >>
> >> There are a bunch of possible approaches to fix this if my identification of
> >> the issue is correct.
> >>
> >> 1) Clean equivalent of this hack that runs on appropriate machines only.
> >> 2) Use MSI instead. (ioh3420 root port takes this approach I think).  
> > 
> > This turned out to be trivial case of cut and pate.  So MSI support it is (mostly because it doesn't
> > involve fixing AML generation :)
> > 
> > Very lightly tested on one config of x86 machine and one of arm64.
> > I've not thought about this at all yet, so it's a direct copy of the ioh3420 with
> > appropriate renames for it being in the cxl_rp code.  
> 
> Hi Jonathan, do you have a qemu branch with the latest code? Maybe I'll 
> give it a try. Thanks.
> 

https://gitlab.com/jic23/qemu/-/tree/cxl-2022-11-17

should work... (famous last words).

Jonathan



> > 
> > 
> >  From a5e85d90cb734fc1de81e0d99e443747348e65e7 Mon Sep 17 00:00:00 2001
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Date: Thu, 3 Nov 2022 13:10:24 +0000
> > Subject: [PATCH] hw: pci-bridge: cxl_root_port: Write up MSI
> > 
> > Done to avoid fixing ACPI rout description of traditional PCI interrupts on q35.
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >   hw/pci-bridge/cxl_root_port.c | 63 +++++++++++++++++++++++++++++++++++
> >   1 file changed, 63 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
> > index 1a9363a6de..a7475c427e 100644
> > --- a/hw/pci-bridge/cxl_root_port.c
> > +++ b/hw/pci-bridge/cxl_root_port.c
> > @@ -22,6 +22,7 @@
> >   #include "qemu/range.h"
> >   #include "hw/pci/pci_bridge.h"
> >   #include "hw/pci/pcie_port.h"
> > +#include "hw/pci/msi.h"
> >   #include "hw/qdev-properties.h"
> >   #include "hw/sysbus.h"
> >   #include "qapi/error.h"
> > @@ -29,6 +30,10 @@
> >   
> >   #define CXL_ROOT_PORT_DID 0x7075
> >   
> > +#define CXL_RP_MSI_OFFSET               0x60
> > +#define CXL_RP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
> > +#define CXL_RP_MSI_NR_VECTOR            2
> > +
> >   /* Copied from the gen root port which we derive */
> >   #define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
> >   #define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
> > @@ -36,6 +41,7 @@
> >   #define CXL_ROOT_PORT_DVSEC_OFFSET \
> >       (GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
> >   
> > +
> >   typedef struct CXLRootPort {
> >       /*< private >*/
> >       PCIESlot parent_obj;
> > @@ -47,6 +53,50 @@ typedef struct CXLRootPort {
> >   #define TYPE_CXL_ROOT_PORT "cxl-rp"
> >   DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
> >   
> > +/*
> > + * If two MSI vector are allocated, Advanced Error Interrupt Message Number
> > + * is 1. otherwise 0.
> > + * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
> > + */
> > +static uint8_t cxl_rp_aer_vector(const PCIDevice *d)
> > +{
> > +    switch (msi_nr_vectors_allocated(d)) {
> > +    case 1:
> > +        return 0;
> > +    case 2:
> > +        return 1;
> > +    case 4:
> > +    case 8:
> > +    case 16:
> > +    case 32:
> > +    default:
> > +        break;
> > +    }
> > +    abort();
> > +    return 0;
> > +}
> > +
> > +static int cxl_rp_interrupts_init(PCIDevice *d, Error **errp)
> > +{
> > +    int rc;
> > +
> > +    rc = msi_init(d, CXL_RP_MSI_OFFSET, CXL_RP_MSI_NR_VECTOR,
> > +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> > +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
> > +                  errp);
> > +    if (rc < 0) {
> > +        assert(rc == -ENOTSUP);
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +static void cxl_rp_interrupts_uninit(PCIDevice *d)
> > +{
> > +    msi_uninit(d);
> > +}
> > +
> > +
> >   static void latch_registers(CXLRootPort *crp)
> >   {
> >       uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
> > @@ -177,6 +227,15 @@ static void cxl_rp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
> >       }
> >   }
> >   
> > +static void cxl_rp_aer_vector_update(PCIDevice *d)
> > +{
> > +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
> > +
> > +    if (rpc->aer_vector) {
> > +        pcie_aer_root_set_vector(d, rpc->aer_vector(d));
> > +    }
> > +}
> > +
> >   static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
> >                                   int len)
> >   {
> > @@ -203,6 +262,7 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
> >           }
> >       }
> >       pci_bridge_write_config(d, address, val, len);
> > +    cxl_rp_aer_vector_update(d);
> >       pcie_cap_flr_write_config(d, address, val, len);
> >       pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
> >       pcie_aer_write_config(d, address, val, len);
> > @@ -229,6 +289,9 @@ static void cxl_root_port_class_init(ObjectClass *oc, void *data)
> >   
> >       rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
> >       rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
> > +    rpc->aer_vector = cxl_rp_aer_vector;
> > +    rpc->interrupts_init = cxl_rp_interrupts_init;
> > +    rpc->interrupts_uninit = cxl_rp_interrupts_uninit;
> >   
> >       dc->hotpluggable = false;
> >   }
Dave Jiang Nov. 18, 2022, 5:15 p.m. UTC | #12
On 11/17/2022 6:50 AM, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 16:20:20 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> On 11/3/2022 6:27 AM, Jonathan Cameron wrote:
>>> On Thu, 3 Nov 2022 12:58:51 +0000
>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>    
>>>> On Mon, 24 Oct 2022 17:01:02 +0100
>>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>>   
>>>>> On Wed, 19 Oct 2022 10:38:13 -0700
>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>       
>>>>>> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
>>>>>>> On Tue, 11 Oct 2022 18:19:15 +0100
>>>>>>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>>>>>>          
>>>>>>>> On Tue, 11 Oct 2022 08:18:34 -0700
>>>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>>>>          
>>>>>>>>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:
>>>>>>>>>> On Fri, 16 Sep 2022 16:10:53 -0700
>>>>>>>>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>>>>>>>>               
>>>>>>>>>>> Series set to RFC since there's no means to test. Would like to get opinion
>>>>>>>>>>> on whether going with using trace events as reporting mechanism is ok.
>>>>>>>>>>>
>>>>>>>>>>> Jonathan,
>>>>>>>>>>> We currently don't have any ways to test AER events. Do you have any plans
>>>>>>>>>>> to support AER events via QEMU emulation?
>>>>>>>>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.
>>>>>>> Hi Dave,
>>>>>>>
>>>>>>> Quick update.
>>>>>>>
>>>>>>> Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
>>>>>>> figuring out why I wasn't getting messages past the upstream switch port.
>>>>>>> Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
>>>>>>> that patch isn't upstream yet.
>>>>>>> Also QEMU AER rooting seems to be based on some older PCIE spec
>>>>>>> so needed some tweaks to get the device to actually issue ERR_FATAL etc.
>>>>>>>
>>>>>>> Anyhow, should have something you can play with in a day or two.
>>>>>>
>>>>>> Awesome! Thanks! :)
>>>>>
>>>>> Took a little longer than expected..
>>>>>
>>>>> Anyhow, now at
>>>>> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
>>>>>
>>>>> That tree is carrying far too many things right now for it make much sense
>>>>> to me to email this to qemu-devel - though I may pull
>>>>> hw/pci/aer: Add missing routing for AER errors
>>>>> out in advance as that's closing a spec different between QEMU emulation of AER
>>>>> and what the PCI spec says.
>>>>>
>>>>> Hopefully set of out of tree patches will start to shrink soon - v9 of the DOE
>>>>> patches have been on list for a week or so.
>>>>>
>>>>> Top patch includes a very short 'how to' in patch description.  Basically fire
>>>>> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
>>>>> qemu commandline and use commands like:
>>>>>
>>>>> { "execute": "qmp_capabilities" }
>>>>> ...
>>>>> { "execute": "cxl-inject-uncorrectable-error",
>>>>>       "arguments": {
>>>>>           "path": "/machine/peripheral/cxl-pmem0",
>>>>>           "type": "cache-address-parity",
>>>>>           "header": [ 3, 4]
>>>>>       } }
>>>>> ...
>>>>> { "execute": "cxl-inject-correctable-error",
>>>>>       "arguments": {
>>>>>           "path": "/machine/peripheral/cxl-pmem0",
>>>>>           "type": "physical",
>>>>>           "header": [ 3, 4]
>>>>>       } }
>>>>>       
>>>>
>>>> So Dave reported that this wasn't working on x86 qemu machines.
>>>>
>>>> A fun bit of debugging later (I hate AML) and I think I have find the issue +
>>>> have a hack to workaround it for now.
>>>>
>>>> So need some background.
>>>> 1) CXL code is based on QEMU's pci expander bridge root bridge - there is a complex
>>>>      bit of handling to create appropriate ACPI DSDT magic.
>>>> 2) The CXL root port is based on pcie_root_port.c
>>>> 3) Both CXL root port and pcie root port use traditional PCI interrupts, not MSI/MSIX
>>>>      for their signaling.
>>>> 4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
>>>>      actual interrupt on line 23 for my particular configuration
>>>> 5) The ACPI table says it's on line 11.
>>>> 6) x86 code for creating the PRT has an informative comment...
>>>> https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
>>>>    * The main goal is to equaly distribute the interrupts
>>>>    * over the 4 existing ACPI links (works only for i440fx).
>>>>
>>>> So the hack I'm running is below (note the UID thing is a separate bug that stops
>>>> iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
>>>> for that shortly).
>>>>
>>>> There are a bunch of possible approaches to fix this if my identification of
>>>> the issue is correct.
>>>>
>>>> 1) Clean equivalent of this hack that runs on appropriate machines only.
>>>> 2) Use MSI instead. (ioh3420 root port takes this approach I think).
>>>
>>> This turned out to be trivial case of cut and pate.  So MSI support it is (mostly because it doesn't
>>> involve fixing AML generation :)
>>>
>>> Very lightly tested on one config of x86 machine and one of arm64.
>>> I've not thought about this at all yet, so it's a direct copy of the ioh3420 with
>>> appropriate renames for it being in the cxl_rp code.
>>
>> Hi Jonathan, do you have a qemu branch with the latest code? Maybe I'll
>> give it a try. Thanks.
>>
> 
> https://gitlab.com/jic23/qemu/-/tree/cxl-2022-11-17
> 
> should work... (famous last words).

Yup it works great! Got my code tested. Thank you!

> 
> Jonathan
> 
> 
> 
>>>
>>>
>>>   From a5e85d90cb734fc1de81e0d99e443747348e65e7 Mon Sep 17 00:00:00 2001
>>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Date: Thu, 3 Nov 2022 13:10:24 +0000
>>> Subject: [PATCH] hw: pci-bridge: cxl_root_port: Write up MSI
>>>
>>> Done to avoid fixing ACPI rout description of traditional PCI interrupts on q35.
>>>
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>>    hw/pci-bridge/cxl_root_port.c | 63 +++++++++++++++++++++++++++++++++++
>>>    1 file changed, 63 insertions(+)
>>>
>>> diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
>>> index 1a9363a6de..a7475c427e 100644
>>> --- a/hw/pci-bridge/cxl_root_port.c
>>> +++ b/hw/pci-bridge/cxl_root_port.c
>>> @@ -22,6 +22,7 @@
>>>    #include "qemu/range.h"
>>>    #include "hw/pci/pci_bridge.h"
>>>    #include "hw/pci/pcie_port.h"
>>> +#include "hw/pci/msi.h"
>>>    #include "hw/qdev-properties.h"
>>>    #include "hw/sysbus.h"
>>>    #include "qapi/error.h"
>>> @@ -29,6 +30,10 @@
>>>    
>>>    #define CXL_ROOT_PORT_DID 0x7075
>>>    
>>> +#define CXL_RP_MSI_OFFSET               0x60
>>> +#define CXL_RP_MSI_SUPPORTED_FLAGS      PCI_MSI_FLAGS_MASKBIT
>>> +#define CXL_RP_MSI_NR_VECTOR            2
>>> +
>>>    /* Copied from the gen root port which we derive */
>>>    #define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
>>>    #define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
>>> @@ -36,6 +41,7 @@
>>>    #define CXL_ROOT_PORT_DVSEC_OFFSET \
>>>        (GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
>>>    
>>> +
>>>    typedef struct CXLRootPort {
>>>        /*< private >*/
>>>        PCIESlot parent_obj;
>>> @@ -47,6 +53,50 @@ typedef struct CXLRootPort {
>>>    #define TYPE_CXL_ROOT_PORT "cxl-rp"
>>>    DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
>>>    
>>> +/*
>>> + * If two MSI vector are allocated, Advanced Error Interrupt Message Number
>>> + * is 1. otherwise 0.
>>> + * 17.12.5.10 RPERRSTS,  32:27 bit Advanced Error Interrupt Message Number.
>>> + */
>>> +static uint8_t cxl_rp_aer_vector(const PCIDevice *d)
>>> +{
>>> +    switch (msi_nr_vectors_allocated(d)) {
>>> +    case 1:
>>> +        return 0;
>>> +    case 2:
>>> +        return 1;
>>> +    case 4:
>>> +    case 8:
>>> +    case 16:
>>> +    case 32:
>>> +    default:
>>> +        break;
>>> +    }
>>> +    abort();
>>> +    return 0;
>>> +}
>>> +
>>> +static int cxl_rp_interrupts_init(PCIDevice *d, Error **errp)
>>> +{
>>> +    int rc;
>>> +
>>> +    rc = msi_init(d, CXL_RP_MSI_OFFSET, CXL_RP_MSI_NR_VECTOR,
>>> +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
>>> +                  CXL_RP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
>>> +                  errp);
>>> +    if (rc < 0) {
>>> +        assert(rc == -ENOTSUP);
>>> +    }
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +static void cxl_rp_interrupts_uninit(PCIDevice *d)
>>> +{
>>> +    msi_uninit(d);
>>> +}
>>> +
>>> +
>>>    static void latch_registers(CXLRootPort *crp)
>>>    {
>>>        uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
>>> @@ -177,6 +227,15 @@ static void cxl_rp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
>>>        }
>>>    }
>>>    
>>> +static void cxl_rp_aer_vector_update(PCIDevice *d)
>>> +{
>>> +    PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
>>> +
>>> +    if (rpc->aer_vector) {
>>> +        pcie_aer_root_set_vector(d, rpc->aer_vector(d));
>>> +    }
>>> +}
>>> +
>>>    static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
>>>                                    int len)
>>>    {
>>> @@ -203,6 +262,7 @@ static void cxl_rp_write_config(PCIDevice *d, uint32_t address, uint32_t val,
>>>            }
>>>        }
>>>        pci_bridge_write_config(d, address, val, len);
>>> +    cxl_rp_aer_vector_update(d);
>>>        pcie_cap_flr_write_config(d, address, val, len);
>>>        pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
>>>        pcie_aer_write_config(d, address, val, len);
>>> @@ -229,6 +289,9 @@ static void cxl_root_port_class_init(ObjectClass *oc, void *data)
>>>    
>>>        rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
>>>        rpc->acs_offset = GEN_PCIE_ROOT_PORT_ACS_OFFSET;
>>> +    rpc->aer_vector = cxl_rp_aer_vector;
>>> +    rpc->interrupts_init = cxl_rp_interrupts_init;
>>> +    rpc->interrupts_uninit = cxl_rp_interrupts_uninit;
>>>    
>>>        dc->hotpluggable = false;
>>>    }
>