mbox series

[v6,0/5] PCI: Add Secondary Bus Reset (SBR) support for CXL

Message ID 20240502165851.1948523-1-dave.jiang@intel.com (mailing list archive)
Headers show
Series PCI: Add Secondary Bus Reset (SBR) support for CXL | expand

Message

Dave Jiang May 2, 2024, 4:57 p.m. UTC
Hi Bjorn,
Please consider this series for the next appropriate kernel merge window.
The CXL part has review tags from Dan and Jonathan. It can go through your tree.
The series attempt to add secondary bus reset (SBR) support to CXL. By default,
SBR for CXL is masked. Per CXL specification r3.1 8.1.5.2, the Port Control
Extensions register bit 0 (Unmask SBR) in the host bridge controls the masking
of CXL SBR.  "When 0, SBR bit in Bridge Control register of this Port has no
effect. When 1, the Port shall generate hot reset when SBR in Bridge Control
gets set to 1.  Default value of this bit is 0. When the Port is operating in
PCIe mode or RCD mode, this field has no effect on SBR functionality and Port
shall follow PCIe Base Specification."

v6:
- Split out locking of upstream bridge to a separate patch and add lockdep
  support. (Dan)
- Added Dan's review tags for last 2 patches.

v5:
- Move bridge locking to pci_reset_function(). (Dan)
- Simplify retrieval of cxld. (Dan)
- Add lock to device to prevent racing of disabled cxlmd. (Dan)
- Promote dev_warn() to dev_crit() (Dan)
- Remove LOCKDEP_NOT_UNRELIABLE flag. (Dan)

v4:
- Return u16 for cxl_port_dvsec(). (Lukas)
- Use pci_dev_lock guard() on bridge. (Lukas)
- Clarify commit subject on 4/4. (Jonathan)

v3:
- Move PCI_DVSEC_VENDOR_ID_CXL to PCI_VENDOR_ID_CXL in pci_ids.h (Bjorn)
- Remove of CXL device checking. (Bjorn)
- Rename defines to PCI_DVSEC_CXL_PORT_*. (Bjorn)
- Fix SBR define in commit log. (Bjorn)
- Update comment on dvsec not found. (Dan)
- Check return of dvsec value read for error. (Dan)
- Move cxl_port_dvsec() to an earlier patch. (Dan)
- Add pci_cfg_access_lock() for bridge access. (Dan)
- Change cxl_bus_force method to cxl_bus. (Dan)
- Rename decoder_hw_mismatch() to __cxl_endpoint_decoder_reset_detected(). (Dan)
- Move CXL register access function to core/pci.c. (Dan)
- Add kernel taint to decoder reset warning. (Dan)

v2:
- Use pci_upstream_bridge() instead of dev->bus->self. (Lukas)
- Rename is_cxl_device() to pci_is_cxl(). (Lukas)
- Return -ENOTTY on error. (Lukas)

Patch 1:
Move PCI_DVSEC_VENDOR_ID_CXL to PCI_VENDOR_ID_CXL in pci_ids.h and adjust related
CXL bits.

Patch 2:
Add locking of the upstream bridge to keep configuration of the bridge consistent.

Patch 3:
Add check to PCI bus_reset path for CXL device and return with error if "Unmask
SBR" bit is set to 0. This allows user to realize that SBR is masked for this
CXL device. However, if the user sets the "Unmask SBR" bit via a tool such as
setpci, then the bus_reset will proceed.

Patch4:
Add a new PCI reset method "cxl_bus" in order to allow the user an
intetional way to perform SBR. The code will set the "Unmask SBR" bit to
1 and proceed with bus_reset. The original value of the bit will be restored
after the reset operation.

Patch5:
CXL driver change that provides a ->reset_done() callback. It compares the
hardware decoder settings with the existing software configuration and emit
warning if they differ. The difference indicates that decoders were programmed
before the reset and are now cleared after reset. There may be onlined system
memory backed by CXL memory device that are now violently ripped away from
kernel mapping.

Patch series stemmed from this [1] patch. With comments [2] from Bjorn.

[1]: https://lore.kernel.org/linux-cxl/20240215232307.2793530-1-dave.jiang@intel.com/
[2]: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/

Comments

Bjorn Helgaas May 8, 2024, 6:27 p.m. UTC | #1
On Thu, May 02, 2024 at 09:57:29AM -0700, Dave Jiang wrote:
> Hi Bjorn,
> Please consider this series for the next appropriate kernel merge window.
> The CXL part has review tags from Dan and Jonathan. It can go through your tree.
> The series attempt to add secondary bus reset (SBR) support to CXL. By default,
> SBR for CXL is masked. Per CXL specification r3.1 8.1.5.2, the Port Control
> Extensions register bit 0 (Unmask SBR) in the host bridge controls the masking
> of CXL SBR.  "When 0, SBR bit in Bridge Control register of this Port has no
> effect. When 1, the Port shall generate hot reset when SBR in Bridge Control
> gets set to 1.  Default value of this bit is 0. When the Port is operating in
> PCIe mode or RCD mode, this field has no effect on SBR functionality and Port
> shall follow PCIe Base Specification."

Applied with minor cleanups to pci/cxl for v6.10, thanks!

> v6:
> - Split out locking of upstream bridge to a separate patch and add lockdep
>   support. (Dan)
> - Added Dan's review tags for last 2 patches.
> 
> v5:
> - Move bridge locking to pci_reset_function(). (Dan)
> - Simplify retrieval of cxld. (Dan)
> - Add lock to device to prevent racing of disabled cxlmd. (Dan)
> - Promote dev_warn() to dev_crit() (Dan)
> - Remove LOCKDEP_NOT_UNRELIABLE flag. (Dan)
> 
> v4:
> - Return u16 for cxl_port_dvsec(). (Lukas)
> - Use pci_dev_lock guard() on bridge. (Lukas)
> - Clarify commit subject on 4/4. (Jonathan)
> 
> v3:
> - Move PCI_DVSEC_VENDOR_ID_CXL to PCI_VENDOR_ID_CXL in pci_ids.h (Bjorn)
> - Remove of CXL device checking. (Bjorn)
> - Rename defines to PCI_DVSEC_CXL_PORT_*. (Bjorn)
> - Fix SBR define in commit log. (Bjorn)
> - Update comment on dvsec not found. (Dan)
> - Check return of dvsec value read for error. (Dan)
> - Move cxl_port_dvsec() to an earlier patch. (Dan)
> - Add pci_cfg_access_lock() for bridge access. (Dan)
> - Change cxl_bus_force method to cxl_bus. (Dan)
> - Rename decoder_hw_mismatch() to __cxl_endpoint_decoder_reset_detected(). (Dan)
> - Move CXL register access function to core/pci.c. (Dan)
> - Add kernel taint to decoder reset warning. (Dan)
> 
> v2:
> - Use pci_upstream_bridge() instead of dev->bus->self. (Lukas)
> - Rename is_cxl_device() to pci_is_cxl(). (Lukas)
> - Return -ENOTTY on error. (Lukas)
> 
> Patch 1:
> Move PCI_DVSEC_VENDOR_ID_CXL to PCI_VENDOR_ID_CXL in pci_ids.h and adjust related
> CXL bits.
> 
> Patch 2:
> Add locking of the upstream bridge to keep configuration of the bridge consistent.
> 
> Patch 3:
> Add check to PCI bus_reset path for CXL device and return with error if "Unmask
> SBR" bit is set to 0. This allows user to realize that SBR is masked for this
> CXL device. However, if the user sets the "Unmask SBR" bit via a tool such as
> setpci, then the bus_reset will proceed.
> 
> Patch4:
> Add a new PCI reset method "cxl_bus" in order to allow the user an
> intetional way to perform SBR. The code will set the "Unmask SBR" bit to
> 1 and proceed with bus_reset. The original value of the bit will be restored
> after the reset operation.
> 
> Patch5:
> CXL driver change that provides a ->reset_done() callback. It compares the
> hardware decoder settings with the existing software configuration and emit
> warning if they differ. The difference indicates that decoders were programmed
> before the reset and are now cleared after reset. There may be onlined system
> memory backed by CXL memory device that are now violently ripped away from
> kernel mapping.
> 
> Patch series stemmed from this [1] patch. With comments [2] from Bjorn.
> 
> [1]: https://lore.kernel.org/linux-cxl/20240215232307.2793530-1-dave.jiang@intel.com/
> [2]: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/
>