[v2,00/17] Improve PCI device post-reset readiness polling

Message ID 20200302184429.12880-1-stanspas@amazon.com (mailing list archive)

Stanislav Spassov March 2, 2020, 6:44 p.m. UTC
From: Stanislav Spassov <stanspas@amazon.de>

The first version of this patch series can be found here:
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com

Originally, this patch series aimed only to solve an issue where
pci_dev_wait() can cause system crashes. After a reset, a hung device may
keep responding with CRS completions indefinitely. If CRS Software
Visibility is enabled on the Root Port, attempting to read any register
other than PCI_VENDOR_ID will cause the Root Port to autonomously retry
the request without reporting back to the CPU core. Unless the number of
retries or the amount of time spent retrying is limited by
platform-specific means, this scenario leads to low-level platform
timeouts (such as a TOR Timeout), which easily escalate to a crash.
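
To illustrate the failure mode, here is a minimal userspace sketch (not the
code from this series) of a readiness poll that only ever reads the Vendor ID
and gives up after a bounded time. It relies on the fact that, with CRS
Software Visibility enabled, a CRS completion shows up as Vendor ID 0x0001.
The names fake_dev, fake_read_vendor and wait_ready are hypothetical, and the
sleep is simulated by a counter so the example stays self-contained.

```c
#include <stdint.h>

/* Vendor ID synthesized by the Root Complex for a CRS completion
 * when CRS Software Visibility is enabled. */
#define CRS_VENDOR_ID 0x0001u

/* Hypothetical stand-in for a config read of the Vendor/Device ID dword. */
typedef uint32_t (*read_vendor_fn)(void *ctx);

/* Simulated device: answers with CRS for a number of reads, then
 * returns its real ID. */
struct fake_dev {
	unsigned int crs_reads_left;
	uint32_t id;
};

static uint32_t fake_read_vendor(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->crs_reads_left) {
		d->crs_reads_left--;
		return 0xffff0001u;	/* CRS: vendor 0x0001, rest all ones */
	}
	return d->id;
}

/*
 * Poll the Vendor ID with exponential backoff. Returns the simulated
 * number of milliseconds waited, or -1 if the device is still
 * responding with CRS once timeout_ms has been reached.
 * No other register is touched while the device may still return CRS.
 */
static int wait_ready(read_vendor_fn rd, void *ctx, int timeout_ms)
{
	int delay = 1, waited = 0;

	for (;;) {
		uint32_t l = rd(ctx);

		if ((l & 0xffffu) != CRS_VENDOR_ID)
			return waited;
		if (waited >= timeout_ms)
			return -1;

		/* A real implementation would sleep for 'delay' ms here. */
		waited += delay;
		delay *= 2;
	}
}
```

The important property is that the loop is bounded by software: a device that
never stops returning CRS results in an error code rather than an open-ended
hardware retry.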

The feedback on the first version of this patch series inspired a
deeper dive into the PCI Firmware Spec (_DSM functions 8 and 9),
which revealed several different types of delays that can be overridden
on a per-device basis to avoid waiting for too long on devices that are
known to come back quickly after reset. The kernel already stores such
overrides for some, but not all of the delays.
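
As a rough sketch of what such a per-device override involves (an
illustration, not the code from this series): assuming firmware reports a
readiness duration in microseconds while the kernel stores delays in
milliseconds, the value has to be converted, and an override should only ever
shorten the wait. The helper names below are hypothetical, and rounding up
during conversion is an assumption made here so that a sub-millisecond
remainder is not silently dropped.

```c
#include <stdint.h>

/* Convert microseconds to milliseconds, rounding up (assumed policy). */
static unsigned int us_to_ms_round_up(uint64_t us)
{
	return (unsigned int)((us + 999) / 1000);
}

/*
 * Apply a firmware-provided duration (in microseconds) to a delay that
 * is stored in milliseconds. The firmware value is used only when it is
 * shorter than the current delay, so the override never lengthens it.
 */
static unsigned int optimize_delay(unsigned int current_ms, uint64_t fw_us)
{
	unsigned int fw_ms = us_to_ms_round_up(fw_us);

	return fw_ms < current_ms ? fw_ms : current_ms;
}
```

For example, with a current delay of 100 ms, a reported 1500 us becomes 2 ms,
while a reported 250000 us leaves the 100 ms delay unchanged.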

While adding the infrastructure to allow overriding delays, I discovered
and addressed several inconsistencies between what the PCIe
Base Specification says and what the code does, and came up with more
improvements all around device resets and readiness polling.

This patch series now paves the way for Readiness Time Reporting capability
support, and touches upon (in comments) some changes that would be
required for supporting Readiness Notifications.

Stanislav Spassov (17):
  PCI: Fall back to slot/bus reset if softer methods timeout
  PCI: Remove unused PCI_PM_BUS_WAIT
  PCI: Use pci_bridge_wait_for_secondary_bus after SBR
  PCI: Do not override delay for D0->D3hot transition
  PCI: Fix handling of _DSM 8 (avoiding reset delays)
  PCI: Fix us->ms conversion in pci_acpi_optimize_delay
  PCI: Clean up and document PM/reset delays
  PCI: Add more delay overrides to struct pci_dev
  PCI: Generalize pci_bus_max_d3cold_delay to pci_bus_max_delay
  PCI: Use correct delay in pci_bridge_wait_for_secondary_bus
  PCI: Refactor pci_dev_wait to remove timeout parameter
  PCI: Refactor pci_dev_wait to take pci_init_event
  PCI: Cache CRS Software Visibiliy in struct pci_dev
  PCI: Introduce per-device reset_ready_poll override
  PCI: Refactor polling loop out of pci_dev_wait
  PCI: Add CRS handling to pci_dev_wait()
  PCI: Lower PCIE_RESET_READY_POLL_MS from 1m to 1s

 Documentation/power/pci.rst         |   4 +-
 arch/x86/pci/intel_mid_pci.c        |   2 +-
 drivers/hid/intel-ish-hid/ipc/ipc.c |   2 +-
 drivers/mfd/intel-lpss-pci.c        |   2 +-
 drivers/net/ethernet/marvell/sky2.c |   2 +-
 drivers/pci/iov.c                   |   4 +-
 drivers/pci/pci-acpi.c              | 106 +++++++++----
 drivers/pci/pci-driver.c            |   4 +-
 drivers/pci/pci.c                   | 233 +++++++++++++++++++---------
 drivers/pci/pci.h                   |  81 +++++++++-
 drivers/pci/probe.c                 |  10 +-
 drivers/pci/quirks.c                |   9 +-
 include/linux/pci-acpi.h            |   8 +-
 include/linux/pci.h                 |  45 +++++-
 14 files changed, 388 insertions(+), 124 deletions(-)


base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9