
[v1] cxl/pci: Fix debug message in cxl_probe_regs()

Message ID 20210811194747.44688-1-johnny.li@montage-tech.com
State Accepted
Commit 036a16a39e2fab9bf7279201d04cf7e90993521f
Series [v1] cxl/pci: Fix debug message in cxl_probe_regs()

Commit Message

johnny Aug. 11, 2021, 7:47 p.m. UTC
The indicator strings for the mbox and memdev register blocks are
incorrectly set to "status" in the error message.

Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>
---
 drivers/cxl/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Dan Williams Aug. 11, 2021, 3:38 p.m. UTC | #1
On Wed, Aug 11, 2021 at 12:54 AM Li Qiang (Johnny Li)
<johnny.li@montage-tech.com> wrote:
>
> Indicator string for mbox and memdev register set to status
> incorrectly in error message.
>
> Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>

Looks good to me, applied.

> ---
>  drivers/cxl/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 47315bb2db10..70e80237865c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1032,8 +1032,8 @@ static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
>                     !dev_map->memdev.valid) {
>                         dev_err(dev, "registers not found: %s%s%s\n",
>                                 !dev_map->status.valid ? "status " : "",
> -                               !dev_map->mbox.valid ? "status " : "",
> -                               !dev_map->memdev.valid ? "status " : "");
> +                               !dev_map->mbox.valid ? "mbox " : "",
> +                               !dev_map->memdev.valid ? "memdev " : "");
>                         return -ENXIO;
>                 }
>
> --
> 2.17.1
>
>
Chris Browy Aug. 11, 2021, 4:35 p.m. UTC | #2
Hi Dan,

Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of the CXL 2.0
spec.  How is it initiated at the OS level?  I tried some conventional approaches, such as
using the sysfs remove command on the RC and then setting the RC bridge control secondary
bus reset (using setpci from Ben's pciutils), and can drive the system into the PCIe host
reset flow (LTSSM Recovery for hot reset).  However, the CXL host is supposed to follow the
steps in 9.3, including sending the CXL Power Management Message RESETPREP REQUEST to
the CXL device before all this.  None of this occurs.

I’m testing on pure QEMU emulation as well as the QEMU co-sim to a real DUT.  This
work also serves the needs of the CXLCV test software.

What is the proper procedure to test this from an OS or user perspective?  Are there sysfs
entries or other commands that are necessary for CXL compared to PCIe?

Is there something missing, from the QEMU CXL host bridge hardware point of view,
to automatically generate the VDM?

BTW we want to work through all the reset and power mode testing so hot reset is just the start.

For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the host 
part right).

Best Regards,
Chris
Dan Williams Aug. 12, 2021, 1:08 a.m. UTC | #3
On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com> wrote:
>
> Hi Dan,
>
> Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of CXL 2.0
> spec.  How is initiated at OS-level?

I have no idea what the spec means by "OS orchestrated reset flow".
The Linux PCI core has no idea about special CXL device reset
requirements; it just attempts typical PCI reset methods, starting with
FLR, escalating to a D3/D0 toggle, and finally attempting secondary bus
reset if any errors were reported on the previous attempts.
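
A minimal sketch of that escalation order, written as a userspace mock and
not taken from the PCI core: the struct, the function names, and the mock
callbacks are all hypothetical, and the only behavior modelled is "try each
method in turn and stop at the first one that succeeds".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reset callback: returns 0 on success, nonzero on failure. */
typedef int (*reset_fn)(void *dev);

struct reset_method {
	const char *name;
	reset_fn reset;
};

/* Mock methods for illustration only: FLR and the D3/D0 toggle fail,
 * secondary bus reset succeeds. */
static int mock_flr(void *dev) { (void)dev; return -1; }
static int mock_pm(void *dev)  { (void)dev; return -1; }
static int mock_sbr(void *dev) { (void)dev; return 0; }

/* Ordered from least to most drastic, as described above. */
static const struct reset_method example_methods[] = {
	{ "flr", mock_flr },
	{ "pm",  mock_pm  },
	{ "bus", mock_sbr },
};

/*
 * Try each method in order and stop at the first that succeeds.
 * Returns the index of the method that worked, or -1 if all failed.
 */
static int reset_escalate(void *dev, const struct reset_method *methods,
			  size_t count)
{
	assert(methods || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (methods[i].reset && methods[i].reset(dev) == 0)
			return (int)i;
	}
	return -1;
}
```

With the mock table, reset_escalate(NULL, example_methods, 3) falls through
the first two entries and returns the index of the bus reset entry. A CXL
aware ordering would need an extra entry ahead of the bus reset, which is
the open question in this thread.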

> I tried some conventional approach or using sysfs
> remove command on RC and then setting RC bridge control secondary bus reset (using
> setpci from Ben's pciutils) and can drive the system into PCIe host reset flow (LTSSM
> Recovery for hot reset) but the CXL host is supposed follow steps in 9.3 including to
> send CXL Power Management.Message RESETPREP REQUEST to CXL Device
> before all this.  None of this occurs.
>
> I’m testing on pure QEMU emulation as well the the QEMU co-sim to real DUT.  This
> work also serves the needs for the CXLCV test software.

It's not clear to me how a QEMU behavioral model could send things
like the CXL PM VDM.

> What is the proper procedure to test this from OS or user perspective?  Are there sysfs
> entries or other commands that are necessary for CXL compared to PCIe?

/sys/bus/pci/devices/$device/reset is a method to trigger PCI device
reset, but I do not expect that will ever gain CXL specific knowledge.
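
For reference, a small sketch of driving that attribute from C by writing
"1" to it. The helper name and the sysfs_root parameter are assumptions made
for illustration (the root is normally "/sys"), the write needs root
privileges, and the file only exists for devices that support a reset
method.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write "1" to <sysfs_root>/bus/pci/devices/<bdf>/reset.
 * sysfs_root is a parameter only so the helper can be pointed at a
 * scratch directory; bdf is a PCI address such as "0000:00:00.0".
 * Returns 0 on success, -1 on any failure.
 */
static int pci_sysfs_reset(const char *sysfs_root, const char *bdf)
{
	char path[256];
	ssize_t written;
	int n, fd;

	assert(sysfs_root && bdf);

	n = snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/reset",
		     sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;	/* no such device, or no reset support */

	written = write(fd, "1", 1);
	close(fd);

	return written == 1 ? 0 : -1;
}
```

This only exercises the generic PCI reset path; nothing in it is CXL
specific.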

>
> Is there something missing in QEMU CXL host bridge hardware point of view
> to automatically generate the VDM?

Certainly QEMU is only a behavioral model, and the VDM is a part of a
hardware functional protocol.

> BTW we want to work through all the reset and power mode testing so hot reset is just the start.
>
> For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the host
> part right).

It sounds promising, but it also sounds out of scope for what the
Linux driver can affect.  As far as I can see, the CXL specification
must assume that OS software treats CXL.io as typical PCIe for the
purposes of issuing resets.
Vikram Sethi Aug. 13, 2021, 5:01 p.m. UTC | #4
Hi Dan, 

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> 
> On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> wrote:
> >
> 
> /sys/bus/pci/devices/$device/reset is a method to trigger PCI device reset,
> but I do not expect that will ever gain CXL specific knowledge.
> 
CXL reset may need some thought, especially for devices that don't
expose FLR but do expose CXL reset (while the former does not affect
CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
device, and there is discoverability as to whether or not memory
contents can be cleared as part of CXL reset). We may need a way of
triggering CXL reset from userspace, and if the existing
/sys/bus/pci/devices/$device/reset won't have knowledge of CXL
reset, there still should be a prioritized order in the kernel in
which CXL reset is attempted before more drastic resets like SBR.
IIRC CXL reset can also impact all functions that use CXL.cache/mem,
but not legacy PCIe functions on the device which do not use
CXL.cache/mem (there is discoverability as to which functions are
not impacted by CXL reset).

Thanks,
Vikram
Bjorn Helgaas Aug. 13, 2021, 5:14 p.m. UTC | #5
[+cc Amey (working on PCI resets), linux-pci]

On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> Hi Dan, 
> 
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > wrote:
> > 
> > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > device reset, but I do not expect that will ever gain CXL specific
> > knowledge.
> > 
> CXL reset may need some thought, specially for devices that don't
> expose FLR but do expose CXL reset (while former does not affect
> CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> device and there is discoverability as to whether or not memory
> contents can be cleared as part of CXL reset). We may need a way of
> triggering CXL reset from userspace, and if the existing
> /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> reset, there still should be a prioritized order in the kernel in
> which CXL reset is attempted before more drastic resets like SBR.
> IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> but not legacy PCIe functions on the device which do not use
> CXL.cache/mem (there is discoverability as to which functions are
> not impacted by CXL reset). 
> 
> Thanks,
> Vikram
Dan Williams Aug. 13, 2021, 9:27 p.m. UTC | #6
On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, specially for devices that don't
> > expose FLR but do expose CXL reset (while former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).

What's the Linux use case for supporting CXL reset for a CXL memory
expander? PCI reset is useful for device assignment, and CXL reset
might be useful for similarly assigning an accelerator. CXL.mem, on the
other hand, can be directly assigned at a per-page level without also
needing to assign the device. How could a VM reliably program HDM
decoders when it cannot perceive the host physical address space? I
understand the utility of CXL reset for device bring-up, and test
software that knows what it is doing can write config space directly,
but that software would assume all responsibility.
Amey Narkhede Aug. 14, 2021, 11:16 a.m. UTC | #7
On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, specially for devices that don't
> > expose FLR but do expose CXL reset (while former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).
> >
> > Thanks,
> > Vikram

We can add a new reset method and expose it to userspace via the new
'reset_method' sysfs attribute introduced in this series:
https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

Thanks,
Amey
Dan Williams Aug. 14, 2021, 7:47 p.m. UTC | #8
On Sat, Aug 14, 2021 at 4:16 AM Amey Narkhede <ameynarkhede03@gmail.com> wrote:
>
> On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, specially for devices that don't
> > > expose FLR but do expose CXL reset (while former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> > >
> > > Thanks,
> > > Vikram
>
> We can add new reset method and expose it to userspace via new 'reset_method'
> sysfs attribute introduced in this series
> https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

It's not clear to me that's a suitable place for CXL reset though. CXL
reset wants to coordinate with the device's participation in a
potential interleave-set across multiple devices. So something like
/sys/bus/cxl/devices/memX/reset might be a better location for
coordinated CXL reset if needed. Again though, the primary use case
for userspace triggered reset is device assignment, and there are
better mechanisms to assign CXL.mem resources to a guest.
Vikram Sethi Aug. 17, 2021, 3:03 a.m. UTC | #9
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy
> > > > <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, specially for devices that don't
> > > expose FLR but do expose CXL reset (while former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> 
> What's the Linux use case for supporting CXL reset for a CXL memory
> expander? PCI reset is useful for device assignment, and CXL reset might be
> useful for similarly assigning an accelerator. CXL.mem on the other hand can
> be directly assigned at a per-page level without also needing to assign the
> device. How could a VM reliably program HDM decoders when it cannot
> perceive the host physical address space? I understand the utility of CXL
> reset for device bring-up and test software that knows what it is doing can
> write config space directly, but that software would assume all responsibility.

Agreed that CXL reset will be needed for type 1/2 CXL devices (accelerators),
which will need a sysfs interface for userspace to use CXL reset.

Patch

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 47315bb2db10..70e80237865c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1032,8 +1032,8 @@  static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
 		    !dev_map->memdev.valid) {
 			dev_err(dev, "registers not found: %s%s%s\n",
 				!dev_map->status.valid ? "status " : "",
-				!dev_map->mbox.valid ? "status " : "",
-				!dev_map->memdev.valid ? "status " : "");
+				!dev_map->mbox.valid ? "mbox " : "",
+				!dev_map->memdev.valid ? "memdev " : "");
 			return -ENXIO;
 		}
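
The effect of the fix can be checked outside the kernel with a small
userspace mock. This is only a sketch: struct reg_block, struct
device_reg_map, and format_missing_regs() are hypothetical stand-ins for the
driver's types, and snprintf() replaces dev_err(). Only the format string
and the three ternaries are copied from the patched code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's register map; only the
 * .valid flags matter for this illustration. */
struct reg_block {
	bool valid;
};

struct device_reg_map {
	struct reg_block status;
	struct reg_block mbox;
	struct reg_block memdev;
};

/*
 * Build the message that the patched dev_err() call would print.
 * Returns the number of characters snprintf() would have written.
 */
static int format_missing_regs(const struct device_reg_map *map,
			       char *buf, size_t len)
{
	assert(map && buf);
	return snprintf(buf, len, "registers not found: %s%s%s\n",
			!map->status.valid ? "status " : "",
			!map->mbox.valid ? "mbox " : "",
			!map->memdev.valid ? "memdev " : "");
}
```

With status valid and both mbox and memdev missing, the buffer holds
"registers not found: mbox memdev " followed by a newline. Before the patch
the same condition printed "status status ", which named a register block
that was actually present.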