diff mbox series

[v4,3/3] PCI: Add CRS handling to pci_dev_wait()

Message ID 20200307172044.29645-4-stanspas@amazon.com
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series Improve PCI device post-reset readiness polling | expand

Commit Message

Stanislav Spassov March 7, 2020, 5:20 p.m. UTC
From: Stanislav Spassov <stanspas@amazon.de>

The PCI Express specification dictates minimal amounts of time that the
host needs to wait after triggering different kinds of resets before it
is allowed to attempt accessing the device. After this waiting period,
devices are required to be responsive to Configuration Space reads.
However, if a device needs more time to actually complete the reset
operation internally, it may respond to the read with a Completion
Request Retry Status (CRS), and keep doing so on subsequent reads
for as long as necessary. If the device is broken, it may even keep
responding with CRS indefinitely.

The specification also mandates that any Root Port that supports CRS
and has CRS Software Visibility (CRS SV) enabled will synthesize the
special value 0x0001 for the Vendor ID and set any other bits to 1
upon receiving a CRS Completion for a Configuration Read Request that
includes both bytes of the Vendor ID (offset 0).

If CRS SV is disabled or a different register (not Vendor ID) is being
read, the request is retried autonomously by the Root Port.
Platform-specific configuration registers may exist to limit the number
of or time taken by such retries.

If CRS is not supported, or a device is responding with CA/UR
Completions (rather than CRS), the behavior is platform-dependent, but
generally the Root Port synthesizes ~0 to complete the software read.

Previously, pci_dev_wait() avoided taking advantage of CRS. However,
on platforms where no retry limit/timeout can be configured, a device
responding with CRS for too long (e.g. because it is stuck and cannot
complete its reset) may trigger more severe error conditions (e.g. TOR
timeout, 3-strike CPU CATERR), because the Root Port never reports back
to the lower-level component requesting the transaction.

This patch introduces special handling when CRS is available, and
otherwise falls back to the previous behavior of polling COMMAND.

Signed-off-by: Stanislav Spassov <stanspas@amazon.de>
---
 drivers/pci/pci.c | 55 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 8 deletions(-)

Comments

Sinan Kaya March 9, 2020, 3:55 p.m. UTC | #1
On 3/7/2020 12:20 PM, Stanislav Spassov wrote:
> +		rc = pci_dev_poll_until_not_equal(dev, PCI_VENDOR_ID, 0xffff,
> +						  0x0001, reset_type, timeout,
> +						  &waited, &id);
> +		if (rc)
> +			return rc;
> +

If I remember right, this doesn't work for VF sending CRS because VF
always returns 0xffff for VENDOR_ID register.
Raj, Ashok March 9, 2020, 4:19 p.m. UTC | #2
On Mon, Mar 09, 2020 at 11:55:11AM -0400, Sinan Kaya wrote:
> On 3/7/2020 12:20 PM, Stanislav Spassov wrote:
> > +		rc = pci_dev_poll_until_not_equal(dev, PCI_VENDOR_ID, 0xffff,
> > +						  0x0001, reset_type, timeout,
> > +						  &waited, &id);
> > +		if (rc)
> > +			return rc;
> > +
> 
> If I remember right, this doesn't work for VF sending CRS because VF
> always returns 0xffff for VENDOR_ID register.

Is this required by the PCIe spec? i think the only requirement is 
the 1s wait after PF has done the VF enable. See Implementation Note
right above section 2.3.1.1 in the Base spec 5.0. 

If this behavior is different for maybe a specific SRIOV device we should
probably quirk the standard behavior?

The rules are mentioned in so many places, but looking through the 
SRIOV section's doesn't seem to specify special rules for VF's other than
the wait time after VF enable.
Spassov, Stanislav March 9, 2020, 4:38 p.m. UTC | #3
On Mon, 2020-03-09 at 09:19 -0700, Raj, Ashok wrote:
> On Mon, Mar 09, 2020 at 11:55:11AM -0400, Sinan Kaya wrote:
> > On 3/7/2020 12:20 PM, Stanislav Spassov wrote:
> > > +           rc = pci_dev_poll_until_not_equal(dev, PCI_VENDOR_ID, 0xffff,
> > > +                                             0x0001, reset_type, timeout,
> > > +                                             &waited, &id);
> > > +           if (rc)
> > > +                   return rc;
> > > +
> > 
> > If I remember right, this doesn't work for VF sending CRS because VF
> > always returns 0xffff for VENDOR_ID register.
> 
> Is this required by the PCIe spec? i think the only requirement is
> the 1s wait after PF has done the VF enable. See Implementation Note
> right above section 2.3.1.1 in the Base spec 5.0.
> 
> If this behavior is different for maybe a specific SRIOV device we should
> probably quirk the standard behavior?
> 
> The rules are mentioned in so many places, but looking through the
> SRIOV section's doesn't seem to specify special rules for VF's other than
> the wait time after VF enable.

PCI Express Base Specification Revision 5.0 Version 1.0 (May 22, 2019)
on pages 1139 and 1140 within section 9.3.4 PF/VF Configuration Space Header
describes:

"9.3.4.1.1 Vendor ID Register Changes (Offset 00h)
...
This field in all VFs returns FFFFh when read. VI software should return the Vendor ID value from the associated PF as the
Vendor ID value for the VF."

(and similar for the Device ID)

Right after that is an Implemention Note "Legacy PCI Probing Software" that explains:

"Returning FFFFh for Device ID and Vendor ID values allows some legacy software to ignore VFs."

So, whenever a VF is providing an actual response to a vid/did read, it will respond with a Successful Completion and the data
will be 0xFFFF. However, when the VF is not ready yet after a reset, I would expect it to be returning CRS completions just like
any other device (nothing in the spec explicitly confirms or denies this, as far as I can tell). Then, the root port has no idea
if a device that it received a CRS completion from is a PF or VF, so it has to treat them equivalently, and therefore (for CRS SV enabled)
synthesize 0x0001 for the VID.



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Sinan Kaya March 9, 2020, 5:33 p.m. UTC | #4
On 3/9/2020 12:38 PM, Spassov, Stanislav wrote:
> So, whenever a VF is providing an actual response to a vid/did read, it will respond with a Successful Completion and the data
> will be 0xFFFF. However, when the VF is not ready yet after a reset, I would expect it to be returning CRS completions just like
> any other device (nothing in the spec explicitly confirms or denies this, as far as I can tell). Then, the root port has no idea
> if a device that it received a CRS completion from is a PF or VF, so it has to treat them equivalently, and therefore (for CRS SV enabled)
> synthesize 0x0001 for the VID.

Looking closer, I see you brought bad_value to the function parameter.
Yes, this should work as long as device responds with 0x0001. Previous
code used to bail out on ~0x0 immediately.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 44f5d4907db6..a028147f4471 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1073,17 +1073,56 @@  static inline int pci_dev_poll_until_not_equal(struct pci_dev *dev, int where,
 
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
+	int waited = 0;
+	int rc = 0;
+
+
 	/*
 	 * After reset, the device should not silently discard config
 	 * requests, but it may still indicate that it needs more time by
-	 * responding to them with CRS completions.  The Root Port will
-	 * generally synthesize ~0 data to complete the read (except when
-	 * CRS SV is enabled and the read was for the Vendor ID; in that
-	 * case it synthesizes 0x0001 data).
-	 *
-	 * Wait for the device to return a non-CRS completion.  Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * responding to them with CRS completions. For such completions:
+	 * - If CRS SV is enabled on the Root Port, and the read request
+	 *   covers both bytes of the Vendor ID register, the Root Port
+	 *   will synthesize the value 0x0001 (and set any extra requested
+	 *   bytes to 0xff)
+	 * - If CRS SV is not enabled on the Root Port, the Root Port must
+	 *   re-issue the Configuration Request as a new Request.
+	 *   Depending on platform-specific Root Complex configurations,
+	 *   the Root Port may stop retrying after a set number of attempts,
+	 *   or a configured timeout is hit, or continue indefinitely
+	 *   (ultimately resulting in non-PCI-specific platform errors, such as
+	 *   a TOR timeout).
+	 */
+	if (dev->crssv_enabled) {
+		u32 id;
+
+		rc = pci_dev_poll_until_not_equal(dev, PCI_VENDOR_ID, 0xffff,
+						  0x0001, reset_type, timeout,
+						  &waited, &id);
+		if (rc)
+			return rc;
+
+		timeout -= waited;
+
+		/*
+		 * If Vendor/Device ID is valid, the device must be ready.
+		 * Note: SR-IOV VFs return ~0 for reads to Vendor/Device
+		 * ID and will not be recognized as ready by this check.
+		 */
+		if (id != 0x0000ffff && id != 0xffff0000 &&
+		    id != 0x00000000 && id != 0xffffffff)
+			return 0;
+	}
+
+	/*
+	 * Root Ports will generally indicate error scenarios (e.g.
+	 * internal timeouts, or received Completion with CA/UR) by
+	 * synthesizing an 'all bits set' value (~0).
+	 * In case CRS is not supported/enabled, as well as for SR-IOV VFs,
+	 * fall back to polling a different register that cannot validly
+	 * contain ~0. As of PCIe 5.0, bits 11-15 of COMMAND are still RsvdP
+	 * and must return 0 when read.
+	 * XXX: These bits might become meaningful in the future
 	 */
 	return pci_dev_poll_until_not_equal(dev, PCI_COMMAND, ~0, ~0,
 					    reset_type, timeout, NULL,