Message ID | 20210811194747.44688-1-johnny.li@montage-tech.com |
---|---|
State | Accepted |
Commit | 036a16a39e2fab9bf7279201d04cf7e90993521f |
Series | [v1] cxl/pci: Fix debug message in cxl_probe_regs() |
On Wed, Aug 11, 2021 at 12:54 AM Li Qiang (Johnny Li)
<johnny.li@montage-tech.com> wrote:
>
> Indicator string for mbox and memdev register set to status
> incorrectly in error message.
>
> Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>

Looks good to me, applied.

> ---
>  drivers/cxl/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 47315bb2db10..70e80237865c 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1032,8 +1032,8 @@ static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
>  		    !dev_map->memdev.valid) {
>  			dev_err(dev, "registers not found: %s%s%s\n",
>  				!dev_map->status.valid ? "status " : "",
> -				!dev_map->mbox.valid ? "status " : "",
> -				!dev_map->memdev.valid ? "status " : "");
> +				!dev_map->mbox.valid ? "mbox " : "",
> +				!dev_map->memdev.valid ? "memdev " : "");
>  			return -ENXIO;
>  		}
>  
> --
> 2.17.1
>
>
Hi Dan,

Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of the CXL 2.0
spec. How is it initiated at the OS level? I tried the conventional approach of using the
sysfs remove command on the RC and then setting secondary bus reset in the RC bridge
control register (using setpci from Ben's pciutils), and I can drive the system into the PCIe
host reset flow (LTSSM Recovery for hot reset). However, the CXL host is supposed to follow
the steps in 9.3, including sending the CXL Power Management message RESETPREP REQUEST
to the CXL device before all this. None of this occurs.

I'm testing on pure QEMU emulation as well as the QEMU co-sim to a real DUT. This work
also serves the needs of the CXLCV test software.

What is the proper procedure to test this from the OS or user perspective? Are there sysfs
entries or other commands that are necessary for CXL compared to PCIe?

Is there something missing, from the QEMU CXL host bridge hardware point of view, to
automatically generate the VDM?

BTW we want to work through all the reset and power mode testing, so hot reset is just the
start.

For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the
host part right).

Best Regards,
Chris
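The conventional secondary bus reset Chris describes amounts to pulsing the Secondary Bus Reset bit (bit 6, mask 0x40) of the bridge's Bridge Control register at config offset 0x3E. That is the bit a setpci write to the bridge toggles. What follows is a minimal C sketch, assuming root access and a bridge whose config space is reachable through its sysfs `config` file. `sbr_value()`, `cfg_write16()`, `pulse_sbr()`, and the roughly 2 ms hold time are illustrative choices of our own. This is a plain PCIe hot reset: nothing in it causes a CXL RESETPREP message to be sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define BRIDGE_CONTROL_OFF 0x3e   /* Bridge Control register, Type 1 header */
#define BRIDGE_CTL_SBR     0x0040 /* Secondary Bus Reset (bit 6) */

/* Return the Bridge Control value with SBR set or cleared, preserving the
 * other control bits. */
static uint16_t sbr_value(uint16_t bridge_ctl, int set_reset)
{
	return set_reset ? (uint16_t)(bridge_ctl | BRIDGE_CTL_SBR)
			 : (uint16_t)(bridge_ctl & ~BRIDGE_CTL_SBR);
}

/* Config space is little-endian; write the two bytes explicitly. */
static int cfg_write16(int fd, off_t off, uint16_t v)
{
	unsigned char b[2] = { (unsigned char)(v & 0xff), (unsigned char)(v >> 8) };

	return pwrite(fd, b, sizeof(b), off) == (ssize_t)sizeof(b) ? 0 : -EIO;
}

/* Assert and then de-assert SBR on the bridge whose config space is exposed
 * at cfg_path (e.g. /sys/bus/pci/devices/<bridge-BDF>/config). Needs root.
 * Returns 0 on success or a negative errno value. */
static int pulse_sbr(const char *cfg_path)
{
	struct timespec hold = { 0, 2 * 1000 * 1000 }; /* hold reset for ~2 ms */
	unsigned char b[2];
	int rc = -EIO;
	int fd = open(cfg_path, O_RDWR);

	if (fd < 0)
		return -errno;
	if (pread(fd, b, sizeof(b), BRIDGE_CONTROL_OFF) == (ssize_t)sizeof(b)) {
		uint16_t ctl = (uint16_t)(b[0] | (b[1] << 8));

		rc = cfg_write16(fd, BRIDGE_CONTROL_OFF, sbr_value(ctl, 1));
		if (rc == 0) {
			nanosleep(&hold, NULL);
			rc = cfg_write16(fd, BRIDGE_CONTROL_OFF, sbr_value(ctl, 0));
		}
	}
	close(fd);
	return rc;
}
```

For example, you could call `pulse_sbr("/sys/bus/pci/devices/0000:34:00.0/config")`, where the bridge BDF is hypothetical. Software should wait before accessing devices below the bridge afterwards. The PCIe specification generally requires at least 100 ms after a reset before configuration requests are sent.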
On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com> wrote:
>
> Hi Dan,
>
> Wondering if you could shed some light on OS orchestrated reset, Sec 9.3 of the CXL 2.0
> spec. How is it initiated at the OS level?

I have no idea what the spec means by "OS orchestrated reset flow". The Linux PCI core has
no idea about special CXL device reset requirements; it just attempts typical PCI reset
methods, starting with FLR, escalating to a D3hot -> D0 toggle, and finally attempting a
secondary bus reset if any errors were reported on the previous attempts.

> I tried the conventional approach of using the sysfs remove command on the RC and then
> setting secondary bus reset in the RC bridge control register (using setpci from Ben's
> pciutils), and I can drive the system into the PCIe host reset flow (LTSSM Recovery for hot
> reset). However, the CXL host is supposed to follow the steps in 9.3, including sending the
> CXL Power Management message RESETPREP REQUEST to the CXL device before all this.
> None of this occurs.
>
> I'm testing on pure QEMU emulation as well as the QEMU co-sim to a real DUT. This
> work also serves the needs of the CXLCV test software.

It's not clear to me how a QEMU behavioral model could send things like the CXL PM VDM.

> What is the proper procedure to test this from the OS or user perspective? Are there sysfs
> entries or other commands that are necessary for CXL compared to PCIe?

/sys/bus/pci/devices/$device/reset is a method to trigger PCI device reset, but I do not
expect that will ever gain CXL specific knowledge.

>
> Is there something missing, from the QEMU CXL host bridge hardware point of view, to
> automatically generate the VDM?

Certainly QEMU is only a behavioral model, and the VDM is part of a hardware functional
protocol.

> BTW we want to work through all the reset and power mode testing, so hot reset is just the start.
>
> For us the QEMU co-sim is a good approach for pre-silicon verification (if we can get the host
> part right).
It sounds promising, but it also sounds out of scope for what the Linux driver can affect. As far as I can see, the CXL specification must assume that OS software treats CXL.io as typical PCIe for the purposes of issuing resets.
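Dan describes the PCI core's reset handling as an escalation through progressively more disruptive methods. The sketch below models that escalation as he describes it. It is a simplified illustration, not the kernel's `pci_reset_function()` implementation. As we understand it, recent kernels treat `-ENOTTY` as "method not supported" and stop at the first other error. `struct reset_method`, `try_reset_in_order()`, and the stub methods are hypothetical. Only the method names `flr`, `pm`, and `bus` mirror names used by the PCI core's `reset_method` attribute. The sketch has no CXL-aware step, which is the gap the thread goes on to discuss.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one reset method: returns 0 on success, -ENOTTY if
 * the device does not support it, or another negative errno on failure. */
struct reset_method {
	const char *name;
	int (*reset)(void *dev);
};

/* Try each method in order, least disruptive first. Returns the index of the
 * method that succeeded, or -1 if none did. */
static int try_reset_in_order(const struct reset_method *methods, size_t n,
			      void *dev)
{
	assert(methods != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int rc = methods[i].reset(dev);

		if (rc == 0)
			return (int)i; /* this method worked; stop escalating */
		/* rc == -ENOTTY (unsupported) or another -errno: move on to
		 * the next, more disruptive method. */
	}
	return -1; /* every method was unsupported or failed */
}

/* Stubs for a device that lacks FLR and whose PM-based reset fails. */
static int flr_unsupported(void *dev) { (void)dev; return -ENOTTY; }
static int pm_fails(void *dev)        { (void)dev; return -EIO; }
static int sbr_succeeds(void *dev)    { (void)dev; return 0; }

static const struct reset_method escalation[] = {
	{ "flr", flr_unsupported }, /* Function Level Reset */
	{ "pm",  pm_fails },        /* D3hot -> D0 power-state toggle */
	{ "bus", sbr_succeeds },    /* secondary bus reset at the upstream bridge */
};
```

With these stubs, `try_reset_in_order(escalation, 3, NULL)` reaches the bus reset at index 2. Both FLR and CXL.cache/CXL.mem state are outside this model.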
Hi Dan,

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
>
> On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> wrote:
>
> /sys/bus/pci/devices/$device/reset is a method to trigger PCI device reset,
> but I do not expect that will ever gain CXL specific knowledge.
>

CXL reset may need some thought, especially for devices that don't
expose FLR but do expose CXL reset (while the former does not affect
CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
device and there is discoverability as to whether or not memory
contents can be cleared as part of CXL reset). We may need a way of
triggering CXL reset from userspace, and if the existing
/sys/bus/pci/devices/$device/reset won't have knowledge of CXL
reset, there still should be a prioritized order in the kernel in
which CXL reset is attempted before more drastic resets like SBR.
IIRC CXL reset can also impact all functions that use CXL.cache/mem,
but not legacy PCIe functions on the device which do not use
CXL.cache/mem (there is discoverability as to which functions are
not impacted by CXL reset).

Thanks,
Vikram
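Vikram's proposal can be expressed as a selection rule, sketched below. Prefer FLR when CXL.cache/CXL.mem state does not need to be reset, since FLR leaves that state intact. Otherwise fall back to CXL Reset, and use SBR only when neither applies. A CXL Reset affects only the functions that use CXL.cache/CXL.mem. `struct dev_reset_caps`, `pick_reset()`, and `cxl_reset_impact_mask()` are hypothetical names. A real implementation would discover these capabilities from PCIe and CXL DVSEC registers rather than take them as booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the reset capabilities a device advertises. */
struct dev_reset_caps {
	bool flr;       /* PCIe Function Level Reset supported */
	bool cxl_reset; /* CXL Reset supported */
};

enum reset_kind { RESET_FLR, RESET_CXL, RESET_SBR };

/* Pick the least disruptive reset that satisfies the request, in the
 * prioritized order proposed in the thread. */
static enum reset_kind pick_reset(const struct dev_reset_caps *caps,
				  bool reset_cxl_state)
{
	/* FLR leaves CXL.cache/CXL.mem state alone, so it only qualifies when
	 * that state does not need to be reset. */
	if (!reset_cxl_state && caps->flr)
		return RESET_FLR;
	/* CXL Reset is less drastic than SBR, which resets everything below
	 * the bridge. */
	if (caps->cxl_reset)
		return RESET_CXL;
	return RESET_SBR;
}

/* Bit i set => function i is affected by a CXL Reset: functions that use
 * CXL.cache/CXL.mem are, legacy PCIe-only functions are not. */
static unsigned int cxl_reset_impact_mask(const bool *uses_cxl_cache_mem,
					  unsigned int nfuncs)
{
	unsigned int mask = 0;

	assert(nfuncs <= 8); /* sketch assumes a non-ARI device: <= 8 functions */
	for (unsigned int i = 0; i < nfuncs; i++)
		if (uses_cxl_cache_mem[i])
			mask |= 1u << i;
	return mask;
}
```

For example, a device without FLR but with CXL Reset gets `RESET_CXL` rather than SBR, which is the case Vikram highlights.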
[+cc Amey (working on PCI resets), linux-pci]

On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> Hi Dan,
>
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> >
> > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > wrote:
> >
> > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > device reset, but I do not expect that will ever gain CXL specific
> > knowledge.
> >
> CXL reset may need some thought, especially for devices that don't
> expose FLR but do expose CXL reset (while the former does not affect
> CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> device and there is discoverability as to whether or not memory
> contents can be cleared as part of CXL reset). We may need a way of
> triggering CXL reset from userspace, and if the existing
> /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> reset, there still should be a prioritized order in the kernel in
> which CXL reset is attempted before more drastic resets like SBR.
> IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> but not legacy PCIe functions on the device which do not use
> CXL.cache/mem (there is discoverability as to which functions are
> not impacted by CXL reset).
>
> Thanks,
> Vikram
On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, especially for devices that don't
> > expose FLR but do expose CXL reset (while the former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).

What's the Linux use case for supporting CXL reset for a CXL memory expander? PCI reset is
useful for device assignment, and CXL reset might be useful for similarly assigning an
accelerator. CXL.mem on the other hand can be directly assigned at a per-page level without
also needing to assign the device. How could a VM reliably program HDM decoders when it
cannot perceive the host physical address space?
I understand the utility of CXL reset for device bring-up, and test software that knows what it is doing can write config space directly, but that software would assume all responsibility.
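Dan's question about HDM decoders turns on the fact that decoders are programmed in host physical address (HPA) terms. The sketch below is a simplified model of modulo interleave, not the full CXL 2.0 decoder semantics. It assumes power-of-two ways and a window base aligned to granularity × ways. It shows that both the target selection and the offset into each device's capacity are functions of the HPA, which a guest only ever sees as a guest physical address. `struct hdm_decoder` and `hdm_decode()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified HDM decoder: an HPA window interleaved across
 * 'ways' devices in 'gran'-byte chunks. */
struct hdm_decoder {
	uint64_t base;  /* HPA where the window starts (aligned to gran * ways) */
	uint64_t size;  /* window size in bytes */
	unsigned ways;  /* interleave ways: 1, 2, 4, 8 */
	uint64_t gran;  /* interleave granularity in bytes: 256, 512, ... */
};

/* Translate a host physical address into (target position, offset within
 * that device's share). Returns 0 on success, -1 if hpa is outside the
 * window. */
static int hdm_decode(const struct hdm_decoder *d, uint64_t hpa,
		      unsigned *target, uint64_t *dpa)
{
	/* This sketch only models power-of-two ways. */
	assert(d->ways != 0 && (d->ways & (d->ways - 1)) == 0);
	if (hpa < d->base || hpa - d->base >= d->size)
		return -1;

	uint64_t off = hpa - d->base;
	uint64_t chunk = off / d->gran;         /* granule index in the window */

	*target = (unsigned)(chunk % d->ways);  /* round-robin across devices */
	/* Each device stores every ways-th granule back to back. */
	*dpa = (chunk / d->ways) * d->gran + off % d->gran;
	return 0;
}
```

With a 2-way, 256-byte-granularity window, HPA `base + 0x100` lands on target 1 at device offset 0, and `base + 0x200` lands on target 0 at offset 256. Programming this mapping requires knowing `base` in host terms.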
On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, especially for devices that don't
> > expose FLR but do expose CXL reset (while the former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).
> >
> > Thanks,
> > Vikram

We can add a new reset method and expose it to userspace via the new 'reset_method'
sysfs attribute introduced in this series:
https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

Thanks,
Amey
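Amey's suggestion builds on the `reset_method` attribute from the linked series. As we understand its ABI description, the attribute works as follows:
- Reading `/sys/bus/pci/devices/<BDF>/reset_method` lists the enabled reset methods in order.
- Writing a space-separated list of method names (for example `flr bus`) selects which methods are used and in what order.
- Writing `default` restores the default ordering.

Writing `1` to the existing `reset` attribute then triggers the reset. Below is a minimal userspace sketch, assuming a kernel that provides `reset_method` and root privileges. `pci_sysfs_path()`, `sysfs_write()`, and `pci_reset_with_methods()` are our own helper names. No CXL-specific method name is assumed to exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/<attr>". Returns 0, or -1 if it does
 * not fit in buf. */
static int pci_sysfs_path(char *buf, size_t len, const char *bdf,
			  const char *attr)
{
	assert(buf != NULL && len > 0);
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute. Returns 0 or a negative errno. */
static int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	size_t n = strlen(val);
	int rc = 0;

	if (!f)
		return -errno;
	if (fwrite(val, 1, n, f) != n)
		rc = -EIO;
	/* stdio buffers the write, so a rejected store typically shows up
	 * when the buffer is flushed at fclose(). */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;
	return rc;
}

/* Restrict the reset methods (e.g. "flr bus"), then trigger a reset. */
static int pci_reset_with_methods(const char *bdf, const char *methods)
{
	char path[256];
	int rc;

	if (pci_sysfs_path(path, sizeof(path), bdf, "reset_method"))
		return -ENAMETOOLONG;
	rc = sysfs_write(path, methods);
	if (rc)
		return rc;
	if (pci_sysfs_path(path, sizeof(path), bdf, "reset"))
		return -ENAMETOOLONG;
	return sysfs_write(path, "1");
}
```

For example, you could call `pci_reset_with_methods("0000:35:00.0", "flr bus")` with a hypothetical BDF. A CXL-aware method, as Amey proposes, would only become selectable this way once the kernel registers it.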
On Sat, Aug 14, 2021 at 4:16 AM Amey Narkhede <ameynarkhede03@gmail.com> wrote:
>
> On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, especially for devices that don't
> > > expose FLR but do expose CXL reset (while the former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> > >
> > > Thanks,
> > > Vikram
>
> We can add a new reset method and expose it to userspace via the new 'reset_method'
> sysfs attribute introduced in this series
> https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

It's not clear to me that's a suitable place for CXL reset though. CXL reset wants to
coordinate with the device's participation in a potential interleave set across multiple
devices.
So something like /sys/bus/cxl/devices/memX/reset might be a better location for coordinated CXL reset if needed. Again though, the primary use case for userspace triggered reset is device assignment, and there are better mechanisms to assign CXL.mem resources to a guest.
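Dan's point about interleave sets can be illustrated by the check a coordinated `/sys/bus/cxl/devices/memX/reset` might perform before resetting a single memory device. Everything in this sketch is hypothetical; no such attribute or data structures are implied to exist in the driver. The idea is only that resetting one member destroys its share of every active region it contributes capacity to. Such an attribute would therefore have to refuse the reset, or tear those regions down first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping: a region interleaves across several memdevs. */
struct region {
	bool active;           /* mapped and possibly in use by the OS */
	size_t nr_members;
	const int *member_ids; /* memX ids that contribute capacity */
};

/* Count the active regions that include memX. A coordinated reset would
 * only proceed on its own when this returns 0. */
static size_t regions_blocking_reset(const struct region *regions, size_t nr,
				     int memdev_id)
{
	size_t blocking = 0;

	assert(regions != NULL || nr == 0);
	for (size_t r = 0; r < nr; r++) {
		if (!regions[r].active)
			continue; /* an inactive region loses nothing */
		for (size_t m = 0; m < regions[r].nr_members; m++) {
			if (regions[r].member_ids[m] == memdev_id) {
				blocking++;
				break;
			}
		}
	}
	return blocking;
}
```

A device-level PCI reset attribute has no view of these cross-device relationships. That is why Dan suggests the CXL bus as a more natural home for the check.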
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy
> > > > <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > CXL reset may need some thought, especially for devices that don't
> > > expose FLR but do expose CXL reset (while the former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
>
> What's the Linux use case for supporting CXL reset for a CXL memory
> expander? PCI reset is useful for device assignment, and CXL reset might be
> useful for similarly assigning an accelerator. CXL.mem on the other hand can
> be directly assigned at a per-page level without also needing to assign the
> device. How could a VM reliably program HDM decoders when it cannot
> perceive the host physical address space?
> I understand the utility of CXL
> reset for device bring-up, and test software that knows what it is doing can
> write config space directly, but that software would assume all responsibility.

Agreed that CXL reset will be needed for Type 1/Type 2 CXL devices (accelerators), which
will need a sysfs interface for userspace to trigger CXL reset.
Indicator string for mbox and memdev register set to status
incorrectly in error message.

Signed-off-by: Li Qiang (Johnny Li) <johnny.li@montage-tech.com>
---
 drivers/cxl/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 47315bb2db10..70e80237865c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1032,8 +1032,8 @@ static int cxl_probe_regs(struct cxl_mem *cxlm, void __iomem *base,
 		    !dev_map->memdev.valid) {
 			dev_err(dev, "registers not found: %s%s%s\n",
 				!dev_map->status.valid ? "status " : "",
-				!dev_map->mbox.valid ? "status " : "",
-				!dev_map->memdev.valid ? "status " : "");
+				!dev_map->mbox.valid ? "mbox " : "",
+				!dev_map->memdev.valid ? "memdev " : "");
 			return -ENXIO;
 		}
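To see what the fix changes, the message-building logic of the patched `dev_err()` call can be lifted into a standalone function. This is an illustration only: `struct reg_map_valid` and `format_missing_regs()` are stand-ins, not driver code. The format string and conditional expressions match the `+` side of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the three register blocks cxl_probe_regs() checks. */
struct reg_map_valid {
	bool status;
	bool mbox;
	bool memdev;
};

/* Format the same message as the fixed dev_err() call: each missing
 * register block is named once, followed by a space. Returns the
 * snprintf() result. */
static int format_missing_regs(char *buf, size_t len,
			       const struct reg_map_valid *v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "registers not found: %s%s%s\n",
			!v->status ? "status " : "",
			!v->mbox ? "mbox " : "",
			!v->memdev ? "memdev " : "");
}
```

Take the case where the status block is present and the mbox and memdev blocks are missing. With the pre-fix strings, this would have printed "registers not found: status status ", naming the wrong register blocks. The fixed strings print "registers not found: mbox memdev ".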