
[V2,1/1] Intel Sky Lake-E host root ports check.

Message ID 20220327112011.3350-2-shlomop@pliops.com (mailing list archive)
State Superseded
Series Intel Sky Lake-E host root ports check.

Commit Message

Shlomo Pongratz March 27, 2022, 11:20 a.m. UTC
In commit 7b94b53db34f ("PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to the whitelist"),
Andrew Maier added the additional Sky Lake-E root ports
2031, 2032 and 2033 to the already existing 2030 device.

The Intel devices 2030, 2031, 2032 and 2033 are root ports A, B, C and D.
If all of them exist, they occupy slots 0 through 3, in that order.

If, for example, device 2030 is missing, there will be no device on slot 0, but
the other devices can still reside on the slots that correspond to their ports.
For this reason, the test that insisted that the bridge be on slot 0 was modified
to support bridges that are not on slot 0.

Signed-off-by: Shlomo Pongratz <shlomop@pliops.com>
---
 drivers/pci/p2pdma.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)
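
For readers unfamiliar with the devfn encoding that the commit message relies on, the following is a minimal userspace sketch of the check before and after the patch. The three macros reproduce the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC definitions; the helper names old_check_passes() and new_check_passes() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Same encoding as the kernel's macros: devfn packs a 5-bit slot
 * (device) number and a 3-bit function number into one byte.
 */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Check before the patch: only devfn 00.0 is accepted. */
static bool old_check_passes(uint8_t devfn)
{
	return devfn == PCI_DEVFN(0, 0);
}

/*
 * Check after the patch: any slot is accepted here as long as the
 * function is 0; the slot is validated later against the whitelist.
 */
static bool new_check_passes(uint8_t devfn)
{
	return PCI_FUNC(devfn) == 0;
}
```

With only port C present at 0000:X:02.0, the devfn is PCI_DEVFN(2, 0), which the old check rejects and the new check lets through.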

Comments

Jason Gunthorpe March 28, 2022, 11:37 a.m. UTC | #1
On Sun, Mar 27, 2022 at 02:20:11PM +0300, Shlomo Pongratz wrote:
> On commit 7b94b53db34f ("PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to the whitelist")
> Andrew Maier added the Sky Lake-E additional devices
> 2031, 2032 and 2033 root ports to the already existing 2030 device.
> 
> The Intel devices 2030, 2031, 2032 and 2033 which are ports A, B, C and D,
> and if all exist they will occupy slots 0 till 3 in that order.
> 
> Now if for example device 2030 is missing then there will no device on slot 0, but
> other devices can reside on other slots according to there port.
> For this reason the test that insisted that the bridge should be on slot 0 was modified
> to support bridges that are not on slot 0.

This helped our systems here! Thanks

Though to be clear, the BIOS/ACPI modeling seems to be wrong in a way
that prevents Linux from finding the true root port, which is the main
cause of this problem.

2030-2033 are *root ports* not host bridges. So when we are in
pci_host_bridge_dev() the code is not looking at the system's host
bridge device at all, but a root port off the host bridge.

Which explains why the non-zero slot is happening.

So it might be better to add a flag 'IS_ROOT_PORT' instead of 'port'
and then just ignore the slot number entirely for root ports.

Though maybe someone has a better idea how the host bridge stuff is
supposed to work on these Skylake-E systems.

> + * The method above will work in most cases but not for all.
> + * Note that the Intel devices 2030, 2031, 2032 and 2033 are ports A, B, C and D.
> + * Consider on a bus X only port C is connected downstream so in the PCI scan only
> + * device 8086:2032 on 0000:X:02.0 will be found as birdges with no children are ignored

'bridges' misspelled

> + *
>   * This function is equivalent to pci_get_slot(host->bus, 0), however it does
>   * not take the pci_bus_sem lock seeing __host_bridge_whitelist() must not
>   * sleep.
> @@ -350,7 +356,10 @@ static struct pci_dev *pci_host_bridge_dev(struct pci_host_bridge *host)
>  
>  	if (!root)
>  		return NULL;
> -	if (root->devfn != PCI_DEVFN(0, 0))
> +	/* Here just check that the function is 0
> +	 * The slot number will be checked later
> +	 */
> +	if (PCI_FUNC(root->devfn) != 0)
>  		return NULL;
>  
>  	return root;
> @@ -372,6 +381,13 @@ static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>  	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
>  		if (vendor != entry->vendor || device != entry->device)
>  			continue;
> +		/* For devices which are bounded to a specific slot
> +		 * (e.g. Intel Sky Lake-E host root ports) check the port is
> +		 * Identical to the slot number.
> +		 * For other devices continue to inssist on slot 0

"insist" misspelled.

Jason
Shlomo Pongratz March 28, 2022, 2:35 p.m. UTC | #2
Hi Jason,

Thank you for your comments. I'll fix the spelling mistakes.

You suggested removing the port field and ignoring the slot number for root ports,
and I understand the reasoning. But, for safety reasons, if we know that device 2030
will always be found on slot 0 and 2032, for example, will always be found on slot 2,
wouldn't it be prudent to compare the device number vs the port number,
unless you believe that the BIOS/ACPI issue will be fixed?

Shlomo.
Jason Gunthorpe March 28, 2022, 2:38 p.m. UTC | #3
On Mon, Mar 28, 2022 at 05:35:36PM +0300, Shlomo Pongratz wrote:
> Hi Jason,
> 
> Thank you for your comments I'll fix the spelling mistakes.
> 
> You suggested to remove the port field and to ignore the slot number for root ports,
> and I understand the reasoning, but, from safety reasons, if we know that device 2030
> will always be found on slot 0 and 2032 for example will always be found on slot 2
> wouldn't it be prudent to compare the device number vs the port number,
> unless you believe that the BIOS/ACPI issue will be fixed.

I'm not sure that is guaranteed; it seems like a BIOS choice.

IMHO what we know is that if we see those devices then we are on a
Skylake-E already and we don't really need more checks.

Jason
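
The flag-based alternative discussed in this thread could look roughly like the userspace sketch below. It is not code from any posted patch. The IS_ROOT_PORT bit value, the wl_entry structure, the shortened table and the devfn_acceptable() helper are all assumptions made for illustration, and keeping the function-0 check for root ports is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

#define VENDOR_INTEL	0x8086

enum {
	REQ_SAME_HOST_BRIDGE	= 1 << 0,
	IS_ROOT_PORT		= 1 << 1,	/* hypothetical flag */
};

/* Simplified stand-in for the kernel's whitelist entry. */
struct wl_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned int flags;
};

/* Shortened table; only a few of the real entries are modeled. */
static const struct wl_entry whitelist[] = {
	{ VENDOR_INTEL, 0x3c00, REQ_SAME_HOST_BRIDGE },
	{ VENDOR_INTEL, 0x2030, IS_ROOT_PORT },
	{ VENDOR_INTEL, 0x2031, IS_ROOT_PORT },
	{ VENDOR_INTEL, 0x2032, IS_ROOT_PORT },
	{ VENDOR_INTEL, 0x2033, IS_ROOT_PORT },
	{ VENDOR_INTEL, 0x2020, 0 },
	{ 0 }
};

/*
 * Root-port entries are accepted on any slot; every other entry must
 * still sit on slot 0. Function 0 is required in both cases.
 */
static bool devfn_acceptable(uint16_t vendor, uint16_t device, uint8_t devfn)
{
	const struct wl_entry *e;

	for (e = whitelist; e->vendor; e++) {
		if (vendor != e->vendor || device != e->device)
			continue;
		if (PCI_FUNC(devfn) != 0)
			return false;
		if (e->flags & IS_ROOT_PORT)
			return true;
		return PCI_SLOT(devfn) == 0;
	}
	return false;
}
```

Under this model, device 2032 passes whether the BIOS places it on slot 2 or elsewhere, while 2020 is still required to be on slot 0.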

Patch

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 30b1df3c9d2f..ca8585ffbe56 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -305,23 +305,24 @@  static bool cpu_supports_p2pdma(void)
 static const struct pci_p2pdma_whitelist_entry {
 	unsigned short vendor;
 	unsigned short device;
+	unsigned short port;
 	enum {
 		REQ_SAME_HOST_BRIDGE	= 1 << 0,
 	} flags;
 } pci_p2pdma_whitelist[] = {
 	/* Intel Xeon E5/Core i7 */
-	{PCI_VENDOR_ID_INTEL,	0x3c00, REQ_SAME_HOST_BRIDGE},
-	{PCI_VENDOR_ID_INTEL,	0x3c01, REQ_SAME_HOST_BRIDGE},
+	{PCI_VENDOR_ID_INTEL,	0x3c00, 0, REQ_SAME_HOST_BRIDGE},
+	{PCI_VENDOR_ID_INTEL,	0x3c01, 0, REQ_SAME_HOST_BRIDGE},
 	/* Intel Xeon E7 v3/Xeon E5 v3/Core i7 */
-	{PCI_VENDOR_ID_INTEL,	0x2f00, REQ_SAME_HOST_BRIDGE},
-	{PCI_VENDOR_ID_INTEL,	0x2f01, REQ_SAME_HOST_BRIDGE},
+	{PCI_VENDOR_ID_INTEL,	0x2f00, 0, REQ_SAME_HOST_BRIDGE},
+	{PCI_VENDOR_ID_INTEL,	0x2f01, 0, REQ_SAME_HOST_BRIDGE},
 	/* Intel SkyLake-E */
-	{PCI_VENDOR_ID_INTEL,	0x2030, 0},
-	{PCI_VENDOR_ID_INTEL,	0x2031, 0},
-	{PCI_VENDOR_ID_INTEL,	0x2032, 0},
-	{PCI_VENDOR_ID_INTEL,	0x2033, 0},
-	{PCI_VENDOR_ID_INTEL,	0x2020, 0},
-	{PCI_VENDOR_ID_INTEL,	0x09a2, 0},
+	{PCI_VENDOR_ID_INTEL,	0x2030, 0, 0},
+	{PCI_VENDOR_ID_INTEL,	0x2031, 1, 0},
+	{PCI_VENDOR_ID_INTEL,	0x2032, 2, 0},
+	{PCI_VENDOR_ID_INTEL,	0x2033, 3, 0},
+	{PCI_VENDOR_ID_INTEL,	0x2020, 0, 0},
+	{PCI_VENDOR_ID_INTEL,	0x09a2, 0, 0},
 	{}
 };
 
@@ -333,6 +334,11 @@  static const struct pci_p2pdma_whitelist_entry {
  * bus->devices list and that the devfn is 00.0. These assumptions should hold
  * for all the devices in the whitelist above.
  *
+ * The method above will work in most cases but not for all.
+ * Note that the Intel devices 2030, 2031, 2032 and 2033 are ports A, B, C and D.
+ * Consider on a bus X only port C is connected downstream so in the PCI scan only
+ * device 8086:2032 on 0000:X:02.0 will be found as birdges with no children are ignored
+ *
  * This function is equivalent to pci_get_slot(host->bus, 0), however it does
  * not take the pci_bus_sem lock seeing __host_bridge_whitelist() must not
  * sleep.
@@ -350,7 +356,10 @@  static struct pci_dev *pci_host_bridge_dev(struct pci_host_bridge *host)
 
 	if (!root)
 		return NULL;
-	if (root->devfn != PCI_DEVFN(0, 0))
+	/* Here just check that the function is 0
+	 * The slot number will be checked later
+	 */
+	if (PCI_FUNC(root->devfn) != 0)
 		return NULL;
 
 	return root;
@@ -372,6 +381,13 @@  static bool __host_bridge_whitelist(struct pci_host_bridge *host,
 	for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
 		if (vendor != entry->vendor || device != entry->device)
 			continue;
+		/* For devices which are bounded to a specific slot
+		 * (e.g. Intel Sky Lake-E host root ports) check the port is
+		 * Identical to the slot number.
+		 * For other devices continue to inssist on slot 0
+		 */
+		if (PCI_SLOT(root->devfn) != entry->port)
+			return false;
 		if (entry->flags & REQ_SAME_HOST_BRIDGE && !same_host_bridge)
 			return false;
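
The effect of the new port field in the hunks above can be modeled in userspace as follows. This is a simplified sketch rather than the kernel code: the wl_entry structure and the whitelisted() helper are hypothetical names, only the Sky Lake-E entries are included, and the REQ_SAME_HOST_BRIDGE handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)

#define VENDOR_INTEL	0x8086

/* Simplified whitelist entry carrying the expected slot in 'port'. */
struct wl_entry {
	uint16_t vendor;
	uint16_t device;
	uint16_t port;
};

static const struct wl_entry whitelist[] = {
	{ VENDOR_INTEL, 0x2030, 0 },
	{ VENDOR_INTEL, 0x2031, 1 },
	{ VENDOR_INTEL, 0x2032, 2 },
	{ VENDOR_INTEL, 0x2033, 3 },
	{ VENDOR_INTEL, 0x2020, 0 },
	{ VENDOR_INTEL, 0x09a2, 0 },
	{ 0 }
};

/* A device matches only if it sits on the slot named by its entry. */
static bool whitelisted(uint16_t vendor, uint16_t device, uint8_t devfn)
{
	const struct wl_entry *e;

	for (e = whitelist; e->vendor; e++) {
		if (vendor != e->vendor || device != e->device)
			continue;
		return PCI_SLOT(devfn) == e->port;
	}
	return false;
}
```

In this model, 8086:2032 is accepted at 02.0 and rejected at 00.0, which is the behavior the patch description asks for.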