
pci/probe: Enable CRS for Intel Haswell root ports

Message ID 53FFA54D.9000907@gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Rajat Jain Aug. 28, 2014, 9:55 p.m. UTC
The PCIe root port of the Intel Haswell CPU endlessly retries
configuration cycles if an endpoint responds with a CRS
(Configuration Request Retry Status) and the "CRS Software Visibility"
flag is not set at the root port. This results in a CPU hang when the
kernel tries to enumerate a device that responds with CRS.

Please note that this root port behavior (of endless retries) is still
compliant with the PCIe spec, as the spec leaves it up to the
implementation how many retries to do when the "CRS visibility" flag is
not enabled and a CRS is received. (Intel has chosen to retry indefinitely.)

Ref1:
https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf

Ref2:
PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"

Following CPUs are affected:
http://ark.intel.com/products/codename/42174/Haswell#@All

Thus we need to enable the CRS visibility flag for such root ports.
Commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
default") suggests maintaining a whitelist of the systems for which CRS
should be enabled. This patch does the same.

Note: Having looked at the spec and read about CRS, IMHO "CRS
visibility" is a good thing that should always be enabled on the root
ports that support it. Perhaps we should always enable it when
supported and maintain a blacklist of devices on which it should be
disabled (because of known issues).
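For readers unfamiliar with pci_match_id(), the whitelist lookup the
patch performs can be sketched in plain user-space C. This is only an
illustration, not the kernel code: the table below carries just two of
the patch's device IDs, and match_id() mirrors the matching semantics
of the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL 0x8086
#define PCI_ANY_ID          0xffff

struct pci_id { uint16_t vendor, device; };

/* Zero-terminated table, like the crs_whitelist[] in the patch
 * (trimmed to two entries for the sketch). */
static const struct pci_id crs_whitelist[] = {
	{ PCI_VENDOR_ID_INTEL, 0x2f00 },
	{ PCI_VENDOR_ID_INTEL, 0x2f02 },
	{ 0, 0 },
};

/* Return the matching entry or NULL, mirroring pci_match_id(). */
static const struct pci_id *match_id(const struct pci_id *table,
				     uint16_t vendor, uint16_t device)
{
	for (; table->vendor || table->device; table++)
		if ((table->vendor == PCI_ANY_ID || table->vendor == vendor) &&
		    (table->device == PCI_ANY_ID || table->device == device))
			return table;
	return NULL;
}
```

The patch then enables CRSSVE only when the current root port yields a
non-NULL match.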

How I stumbled upon this and tested the fix:

Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)

I have a PCIe endpoint (a PLX 8713 NT bridge) that keeps responding
with CRS for a long time when the kernel tries to enumerate it, to
indicate that the device is not yet ready: it needs some configuration
over I2C in order to complete its reset sequence. This results in a
CPU hang during enumeration.

I used this setup to fix and test this issue. After enabling the CRS
visibility flag at the root port, I see that the CPU moves on as
expected, printing the following (instead of freezing):

pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
ffff0001

Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
---
Hi Bjorn / folks,

I had also sought suggestions on how this patch should be modelled.
Please find a suggested alternative here:

https://lkml.org/lkml/2014/8/1/186

Please let me know your thoughts.

Thanks,

Rajat




 drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Wei Yang Aug. 29, 2014, 4:04 a.m. UTC | #1
On Thu, Aug 28, 2014 at 02:55:25PM -0700, Rajat Jain wrote:
>The PCIe root port of the Intel Haswell CPU, has a behavior to endlessly
>retry the configuration cycles, if an endpoint responds with a CRS
>(Configuration Request Retry Status), and the "CRS Software Visibility"
>flag is not set at the root port. This results in a CPU hang, when the
>kernel tries to enumerate the device that responds with CRS.
>
>Please note that this root port behavior (of endless retries) is still
>compliant with PCIe spec as the spec leaves the behavior open to
>implementation, on how many retries to do if "CRS visibility flag" is
>not enabled and it receives a CRS. (Intel has chosen to retry indefinitely)
>
>Ref1:
>https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf
>
>Ref2:
>PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"
>
>Following CPUs are affected:
>http://ark.intel.com/products/codename/42174/Haswell#@All
>
>Thus we need to enable the CRS visibility flag for such root ports. The
>commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
>default") suggests to maintain a whitelist of the systems for which CRS
>should be enabled. This patch does the same.
>
>Note: Looking at the spec and reading about the CRS, IMHO the "CRS
>visibility" looks like a good thing to me that should always be enabled
>on the root ports that support it. And may be we should always enable
>it if supported and maintain a blacklist of devices on which should be
>disabled (because of known issues).
>
>How I stumbled upon this and tested the fix:
>
>Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)
>
>I have a PCIe endpoint (a PLX 8713 NT bridge) that will keep on responding
>with CRS for a long time when the kernel tries to enumerate the
>endpoint, trying to indicate that the device is not yet ready. This is
>because it needs some configuration over I2C in order to complete its
>reset sequence. This results in a CPU hang during enumeration.
>
>I used this setup to fix and test this issue. After enabling the CRS
>visibility flag at the root port, I see that CPU moves on as expected
>declaring the following (instead of freezing):
>
>pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
>ffff0001
>
>Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
>Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>Signed-off-by: Guenter Roeck <groeck@juniper.net>
>---
>Hi Bjorn / folks,
>
>I had also saught suggestions on how this patch should be modelled.
>Please find a suggestive alternative here:
>
>https://lkml.org/lkml/2014/8/1/186
>
>Please let me know your thoughts.
>
>Thanks,
>
>Rajat
>
>
>
>
> drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
>diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>index e3cf8a2..909ca75 100644
>--- a/drivers/pci/probe.c
>+++ b/drivers/pci/probe.c
>@@ -740,6 +740,32 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
> }
> EXPORT_SYMBOL(pci_add_new_bus);
>
>+static const struct pci_device_id crs_whitelist[] = {
>+	{ PCI_VDEVICE(INTEL, 0x2f00), },
>+	{ PCI_VDEVICE(INTEL, 0x2f01), },
>+	{ PCI_VDEVICE(INTEL, 0x2f02), },
>+	{ PCI_VDEVICE(INTEL, 0x2f03), },
>+	{ PCI_VDEVICE(INTEL, 0x2f04), },
>+	{ PCI_VDEVICE(INTEL, 0x2f05), },
>+	{ PCI_VDEVICE(INTEL, 0x2f06), },
>+	{ PCI_VDEVICE(INTEL, 0x2f07), },
>+	{ PCI_VDEVICE(INTEL, 0x2f08), },
>+	{ PCI_VDEVICE(INTEL, 0x2f09), },
>+	{ PCI_VDEVICE(INTEL, 0x2f0a), },
>+	{ PCI_VDEVICE(INTEL, 0x2f0b), },
>+	{ },
>+};
>+
>+static void pci_enable_crs(struct pci_dev *dev)
>+{
>+	/* Enable CRS Software visibility only for whitelisted systems */
>+	if (pci_is_pcie(dev) &&
>+	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
>+	    pci_match_id(crs_whitelist, dev))
>+		pcie_capability_set_word(dev, PCI_EXP_RTCTL,
>+					 PCI_EXP_RTCTL_CRSSVE);
>+}
>+
> /*
>  * If it's a bridge, configure it and scan the bus behind it.
>  * For CardBus bridges, we don't scan behind as the devices will
>@@ -787,6 +813,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
> 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
> 			      bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>
>+	pci_enable_crs(dev);
>+

Rajat,

I am not familiar with CRS Software Visibility, but I think your code
could fix the problem.

I do have one small suggestion, though. I see that commit ad7edfe04908
("[PCI] Do not enable CRS Software Visibility by default") had
pci_enable_crs() called from pci_scan_bridge(), but I think that may not
be the best place for this fix.

How about an early fixup for these Intel devices? The original code is
called on every bridge, while this fix is only needed on specific
devices, and it sounds like we are not planning to enable this by
default in the short term. If other devices need to be added to the
whitelist in the future, we would keep expanding this list, which would
make probe.c less generic.

Hmm... to me, enabling this in an early fixup happens at a different
stage than doing it in pci_scan_bridge(). I am not sure whether enabling
it at a different stage would affect the behavior. If it does, my
suggestion is not a good one.
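The early-fixup idea above can be illustrated with a small user-space
sketch: a table of (vendor, device, hook) entries and a dispatcher that
runs the hook only on matching devices, which is the effect the kernel's
early PCI fixups achieve. The IDs, names, and the int-based state are
illustrative, not the kernel API.

```c
#include <stdint.h>

struct fixup {
	uint16_t vendor, device;
	void (*hook)(int *crs_enabled);
};

/* Hypothetical hook: stands in for setting CRSSVE on a root port. */
static void enable_crs_hook(int *crs_enabled)
{
	*crs_enabled = 1;
}

/* One entry per affected device, like a per-device fixup list. */
static const struct fixup fixups[] = {
	{ 0x8086, 0x2f02, enable_crs_hook },	/* illustrative IDs */
};

/* Run every fixup whose IDs match the device; return resulting state. */
static int run_fixups(uint16_t vendor, uint16_t device)
{
	int crs_enabled = 0;

	for (unsigned int i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++)
		if (fixups[i].vendor == vendor && fixups[i].device == device)
			fixups[i].hook(&crs_enabled);
	return crs_enabled;
}
```

Unlike the call in pci_scan_bridge(), which runs for every bridge and
filters inside, this dispatch only ever touches listed devices.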

> 	if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
> 	    !is_cardbus && !broken) {
> 		unsigned int cmax;
>-- 
>1.7.9.5
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Aug. 29, 2014, 5:11 p.m. UTC | #2
Hello Wei Yang,

Thanks for your mail and review.

On Thu, Aug 28, 2014 at 9:04 PM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Thu, Aug 28, 2014 at 02:55:25PM -0700, Rajat Jain wrote:
>>The PCIe root port of the Intel Haswell CPU, has a behavior to endlessly
>>retry the configuration cycles, if an endpoint responds with a CRS
>>(Configuration Request Retry Status), and the "CRS Software Visibility"
>>flag is not set at the root port. This results in a CPU hang, when the
>>kernel tries to enumerate the device that responds with CRS.
>>
>>Please note that this root port behavior (of endless retries) is still
>>compliant with PCIe spec as the spec leaves the behavior open to
>>implementation, on how many retries to do if "CRS visibility flag" is
>>not enabled and it receives a CRS. (Intel has chosen to retry indefinitely)
>>
>>Ref1:
>>https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf
>>
>>Ref2:
>>PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"
>>
>>Following CPUs are affected:
>>http://ark.intel.com/products/codename/42174/Haswell#@All
>>
>>Thus we need to enable the CRS visibility flag for such root ports. The
>>commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
>>default") suggests to maintain a whitelist of the systems for which CRS
>>should be enabled. This patch does the same.
>>
>>Note: Looking at the spec and reading about the CRS, IMHO the "CRS
>>visibility" looks like a good thing to me that should always be enabled
>>on the root ports that support it. And may be we should always enable
>>it if supported and maintain a blacklist of devices on which should be
>>disabled (because of known issues).
>>
>>How I stumbled upon this and tested the fix:
>>
>>Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)
>>
>>I have a PCIe endpoint (a PLX 8713 NT bridge) that will keep on responding
>>with CRS for a long time when the kernel tries to enumerate the
>>endpoint, trying to indicate that the device is not yet ready. This is
>>because it needs some configuration over I2C in order to complete its
>>reset sequence. This results in a CPU hang during enumeration.
>>
>>I used this setup to fix and test this issue. After enabling the CRS
>>visibility flag at the root port, I see that CPU moves on as expected
>>declaring the following (instead of freezing):
>>
>>pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
>>ffff0001
>>
>>Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
>>Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>>Signed-off-by: Guenter Roeck <groeck@juniper.net>
>>---
>>Hi Bjorn / folks,
>>
>>I had also saught suggestions on how this patch should be modelled.
>>Please find a suggestive alternative here:
>>
>>https://lkml.org/lkml/2014/8/1/186
>>
>>Please let me know your thoughts.
>>
>>Thanks,
>>
>>Rajat
>>
>>
>>
>>
>> drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>>diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>index e3cf8a2..909ca75 100644
>>--- a/drivers/pci/probe.c
>>+++ b/drivers/pci/probe.c
>>@@ -740,6 +740,32 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>> }
>> EXPORT_SYMBOL(pci_add_new_bus);
>>
>>+static const struct pci_device_id crs_whitelist[] = {
>>+      { PCI_VDEVICE(INTEL, 0x2f00), },
>>+      { PCI_VDEVICE(INTEL, 0x2f01), },
>>+      { PCI_VDEVICE(INTEL, 0x2f02), },
>>+      { PCI_VDEVICE(INTEL, 0x2f03), },
>>+      { PCI_VDEVICE(INTEL, 0x2f04), },
>>+      { PCI_VDEVICE(INTEL, 0x2f05), },
>>+      { PCI_VDEVICE(INTEL, 0x2f06), },
>>+      { PCI_VDEVICE(INTEL, 0x2f07), },
>>+      { PCI_VDEVICE(INTEL, 0x2f08), },
>>+      { PCI_VDEVICE(INTEL, 0x2f09), },
>>+      { PCI_VDEVICE(INTEL, 0x2f0a), },
>>+      { PCI_VDEVICE(INTEL, 0x2f0b), },
>>+      { },
>>+};
>>+
>>+static void pci_enable_crs(struct pci_dev *dev)
>>+{
>>+      /* Enable CRS Software visibility only for whitelisted systems */
>>+      if (pci_is_pcie(dev) &&
>>+          pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
>>+          pci_match_id(crs_whitelist, dev))
>>+              pcie_capability_set_word(dev, PCI_EXP_RTCTL,
>>+                                       PCI_EXP_RTCTL_CRSSVE);
>>+}
>>+
>> /*
>>  * If it's a bridge, configure it and scan the bus behind it.
>>  * For CardBus bridges, we don't scan behind as the devices will
>>@@ -787,6 +813,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
>>       pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
>>                             bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>>
>>+      pci_enable_crs(dev);
>>+
>
> Rajat,
>
> I am not familiar with the CRS Software Visibility, but I think your code
> could fix the problem.
>
> While I have one tiny suggestion. I see the commit
> "ad7edfe04908 [PCI] Do not enable CRS Software Visibility by default" has the
> pci_enable_crs() called in pci_scan_bridge(), while I think this may not be a
> very good place for this fix.
>
> How about have a early fixup for these intel devices? Since the original code
> is called on each bridge, while this fix is just invoked on specific devices.
> And sounds we are not planing to have this enabled by default in a short term.

Yes, my current patch is like that.

However, to the extent I could understand it, CRS seemed like only a
good thing to me, and I could not really see under what conditions it
could create a problem. I was actually seeking opinions on whether we
should always enable it and instead maintain a blacklist of systems on
which it is known to cause problems and should be disabled. I do
understand that building such a blacklist may not be possible, since
the original commit is pretty old. The advantage of a blacklist would
be that other PCI root port bridges with the same (PCIe-compliant)
behavior, which would otherwise hang, get taken care of automatically;
in those cases it is very difficult to debug and trace the problem
back to this CRS issue.

> If we have other devices to be in the white list in the future, we would
> expand this list. This will make the probe.c not that generic.

I agree that we shouldn't be putting things like device lists in a
generic file such as probe.c. However, I think it is important that
the call to enable CRS visibility stays somewhere in the common PCI
root port initialization path (even though the decision of whether to
enable it, or to enable it only for certain devices, can be made in a
platform- or device-specific file). That way, any developer seeing
similar issues and debugging the PCI code can stumble upon CRS and
think "Hmmm... maybe I need this for my device too?". I did not want
to separate this out as just a quirk, because quirks are typically
seen as device anomalies (which this is not), and developers don't
tend to delve into unrelated device quirks to debug a problem they are
seeing on a different device.

To this end, I had posted a different suggested patch here, which
keeps the device lists in platform-specific files:

https://lkml.org/lkml/2014/8/1/186

Please let me know if you think that approach is any better.

>
> Hmm... to me, enable this in the eary fixup is a different stage as you did in
> pci_scan_bridge(). Not sure enable them in a different stage will effect the
> behavior. If this matters, my suggestion is not a good one.


Your suggestions are very welcome, so please keep them coming. I can
try moving the call to pci_enable_crs() around; however, since the
original commit had it in this place, I decided to keep it here. The
other thing I was not clear on is which reset conditions clear the CRS
visibility flag. Would it get cleared by a root port reset (or any PCI
reset)? If so, a quirk might not work. I looked through the PCIe spec
and did not find any hint about when it could be cleared, so I decided
to play safe and leave the call where it originally was.

Thanks,

Rajat


>
>>       if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
>>           !is_cardbus && !broken) {
>>               unsigned int cmax;
>>--
>>1.7.9.5
>>
>
> --
> Richard Yang
> Help you, Help me
>
Wei Yang Aug. 31, 2014, 1:31 p.m. UTC | #3
On Fri, Aug 29, 2014 at 10:11:16AM -0700, Rajat Jain wrote:
>Hello Wei Yang,
>
>Thanks for your mail and review.
>
>On Thu, Aug 28, 2014 at 9:04 PM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Thu, Aug 28, 2014 at 02:55:25PM -0700, Rajat Jain wrote:
>>>The PCIe root port of the Intel Haswell CPU, has a behavior to endlessly
>>>retry the configuration cycles, if an endpoint responds with a CRS
>>>(Configuration Request Retry Status), and the "CRS Software Visibility"
>>>flag is not set at the root port. This results in a CPU hang, when the
>>>kernel tries to enumerate the device that responds with CRS.
>>>
>>>Please note that this root port behavior (of endless retries) is still
>>>compliant with PCIe spec as the spec leaves the behavior open to
>>>implementation, on how many retries to do if "CRS visibility flag" is
>>>not enabled and it receives a CRS. (Intel has chosen to retry indefinitely)
>>>
>>>Ref1:
>>>https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf
>>>
>>>Ref2:
>>>PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"
>>>
>>>Following CPUs are affected:
>>>http://ark.intel.com/products/codename/42174/Haswell#@All
>>>
>>>Thus we need to enable the CRS visibility flag for such root ports. The
>>>commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
>>>default") suggests to maintain a whitelist of the systems for which CRS
>>>should be enabled. This patch does the same.
>>>
>>>Note: Looking at the spec and reading about the CRS, IMHO the "CRS
>>>visibility" looks like a good thing to me that should always be enabled
>>>on the root ports that support it. And may be we should always enable
>>>it if supported and maintain a blacklist of devices on which should be
>>>disabled (because of known issues).
>>>
>>>How I stumbled upon this and tested the fix:
>>>
>>>Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)
>>>
>>>I have a PCIe endpoint (a PLX 8713 NT bridge) that will keep on responding
>>>with CRS for a long time when the kernel tries to enumerate the
>>>endpoint, trying to indicate that the device is not yet ready. This is
>>>because it needs some configuration over I2C in order to complete its
>>>reset sequence. This results in a CPU hang during enumeration.
>>>
>>>I used this setup to fix and test this issue. After enabling the CRS
>>>visibility flag at the root port, I see that CPU moves on as expected
>>>declaring the following (instead of freezing):
>>>
>>>pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
>>>ffff0001
>>>
>>>Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
>>>Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>>>Signed-off-by: Guenter Roeck <groeck@juniper.net>
>>>---
>>>Hi Bjorn / folks,
>>>
>>>I had also saught suggestions on how this patch should be modelled.
>>>Please find a suggestive alternative here:
>>>
>>>https://lkml.org/lkml/2014/8/1/186
>>>
>>>Please let me know your thoughts.
>>>
>>>Thanks,
>>>
>>>Rajat
>>>
>>>
>>>
>>>
>>> drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
>>> 1 file changed, 28 insertions(+)
>>>
>>>diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>index e3cf8a2..909ca75 100644
>>>--- a/drivers/pci/probe.c
>>>+++ b/drivers/pci/probe.c
>>>@@ -740,6 +740,32 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>>> }
>>> EXPORT_SYMBOL(pci_add_new_bus);
>>>
>>>+static const struct pci_device_id crs_whitelist[] = {
>>>+      { PCI_VDEVICE(INTEL, 0x2f00), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f01), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f02), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f03), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f04), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f05), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f06), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f07), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f08), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f09), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f0a), },
>>>+      { PCI_VDEVICE(INTEL, 0x2f0b), },
>>>+      { },
>>>+};
>>>+
>>>+static void pci_enable_crs(struct pci_dev *dev)
>>>+{
>>>+      /* Enable CRS Software visibility only for whitelisted systems */
>>>+      if (pci_is_pcie(dev) &&
>>>+          pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
>>>+          pci_match_id(crs_whitelist, dev))
>>>+              pcie_capability_set_word(dev, PCI_EXP_RTCTL,
>>>+                                       PCI_EXP_RTCTL_CRSSVE);
>>>+}
>>>+
>>> /*
>>>  * If it's a bridge, configure it and scan the bus behind it.
>>>  * For CardBus bridges, we don't scan behind as the devices will
>>>@@ -787,6 +813,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
>>>       pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
>>>                             bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>>>
>>>+      pci_enable_crs(dev);
>>>+
>>
>> Rajat,
>>
>> I am not familiar with the CRS Software Visibility, but I think your code
>> could fix the problem.
>>
>> While I have one tiny suggestion. I see the commit
>> "ad7edfe04908 [PCI] Do not enable CRS Software Visibility by default" has the
>> pci_enable_crs() called in pci_scan_bridge(), while I think this may not be a
>> very good place for this fix.
>>
>> How about have a early fixup for these intel devices? Since the original code
>> is called on each bridge, while this fix is just invoked on specific devices.
>> And sounds we are not planing to have this enabled by default in a short term.
>
>Yes, my current patch is like that.
>
>However to the extent I could understand, the CRS seemed only a good
>thing to me and I could not really visualize in what conditions could
>it create a problem. I was actually seeking opinion on if we should
>enable it always and instead maintain a blacklist of systems on which
>it is known to cause problems and should be disabled. But I do
>understand that obtaining such a blacklist may not be possible since
>the original commit is pretty old. The good thing with a black list
>woulds that there may be other PCI root port bridges with the same
>(PCIe compliant) behavior and will hang.. Such cases will be
>automatically taken care since it is very difficult in those cases to
>debug and trace the problem to this CRS issue.
>
>> If we have other devices to be in the white list in the future, we would
>> expand this list. This will make the probe.c not that generic.
>
>I agree that we shouldn't really be putting things like device lists
>in  a generic file such as probe.c. However, I think it is important
>that we make the call to enable CRS visibility somewhere in the common
>PCI root port initialization path (although the decision whether or
>not to enable it, or enable it for certain devices only can be taken
>from a platform / device specific file). That is so that any developer
>seeing similar issues and trying to debug the PCI code should be able
>to stumble upon CRS and think "Hmmm ... maybe I need this for my
>device also?". I did not want to separate this out as just a quirk,
>because quirks are typically seen as a device anomaly (which in this
>case it is not). And developers wouldn't' tend to delve into unrelated
>device quirks, to debug a problem they are seeing on a different
>device.
>
>To this end, I had a different suggestive patch posted here, that
>maintains device lists in platform specific files:
>
>https://lkml.org/lkml/2014/8/1/186
>
>Please let me know if you think this is any better?
>

Hmm... I see; in that one you make pcibios_enable_crs() a weak function
and implement it in the x86 arch code.

I see your concern about keeping this in generic code instead of
putting it in a quirk. Given that, this is a good solution; I can't
find a better one.

Let's see whether others have a better idea. :-)
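The weak-function pattern referred to above can be sketched as follows:
generic code provides a default definition marked weak, and an arch
file (x86 in the alternative patch) supplies a strong definition that
the linker picks instead. The names and the fake device type here are
illustrative, and the kernel spells the attribute `__weak`; only the
generic default is shown, since the override lives in a separate
translation unit.

```c
#include <stdbool.h>

struct fake_dev {
	unsigned short vendor, device;
};

/* Weak default used when no arch provides its own implementation.
 * An arch file would define a non-weak pcibios_enable_crs() that
 * consults its own device list; at link time that strong definition
 * replaces this one. */
__attribute__((weak)) bool pcibios_enable_crs(struct fake_dev *dev)
{
	(void)dev;
	return false;	/* generic default: do not enable CRS visibility */
}
```

This keeps the call site in generic code while letting each platform
decide the policy, which is the property Rajat argues for above.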

>>
>> Hmm... to me, enable this in the eary fixup is a different stage as you did in
>> pci_scan_bridge(). Not sure enable them in a different stage will effect the
>> behavior. If this matters, my suggestion is not a good one.
>
>
>Your suggestions are very welcome, thus please keep them coming. I can
>try moving the call to "enable_crs" around - however since the
>original commit had it in this place, I had decided to keep it here.
>The other thing that I was not clear was under which reset conditions
>would the CRS visibility flag get cleared? Would it get cleared if we
>do a root port reset (or any PCI resets)? If so a quirk might not
>work. I looked though the PCIe specs and did not find any hint on when
>could this be cleared. So I decided to play safe and leave it in the
>place it originally was.
>
>Thanks,
>
>Rajat
>
>
>>
>>>       if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
>>>           !is_cardbus && !broken) {
>>>               unsigned int cmax;
>>>--
>>>1.7.9.5
>>>
>>
>> --
>> Richard Yang
>> Help you, Help me
>>
Bjorn Helgaas Sept. 2, 2014, 4:14 a.m. UTC | #4
[+cc Linus (author of ad7edfe04908), Matthew (author of 07ff9220908c
from full history), Yinghai (author of 2f5d8e4ff947), Richard]

On Thu, Aug 28, 2014 at 3:55 PM, Rajat Jain <rajatxjain@gmail.com> wrote:
> The PCIe root port of the Intel Haswell CPU, has a behavior to endlessly
> retry the configuration cycles, if an endpoint responds with a CRS
> (Configuration Request Retry Status), and the "CRS Software Visibility"
> flag is not set at the root port. This results in a CPU hang, when the
> kernel tries to enumerate the device that responds with CRS.

Help me understand this.  I thought a config read of Vendor ID would see:

  - 0xffff if no device is present, or
  - 0x0001 if the device is present but not ready yet, and the OS
    should retry the read.  This only happens when Software Visibility
    is enabled.
  - the normal Vendor ID if the device is present and ready.  If
    Software Visibility is disabled (as it is in Linux today) and the
    endpoint responds with CRS, the Root Complex may retry the config
    read invisibly to software.

I suppose these retries are causing your hang, but since the device is
actually present, I would think the CPU would see a read that took a
very long time to complete, but that did eventually complete.  Is this
causing a CPU timeout?  Is the endpoint broken and returning CRS
endlessly?
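The three outcomes above can be captured in a small sketch that replays
a recorded sequence of Vendor/Device ID config-read results the way an
enumeration loop would see them. The constants and helper are
illustrative; in the kernel, pci_bus_read_dev_vendor_id() plays this
role and sleeps between retries. 0xffff0001 matches the value in the
commit message's log line.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRS_SENTINEL 0xffff0001u  /* device present but not ready: retry */
#define NO_DEVICE    0xffffffffu  /* read completed all-ones: nothing there */

/* Walk a recorded sequence of 32-bit Vendor/Device ID reads and decide
 * whether a ready device was found before the retries ran out. */
static bool device_ready(const uint32_t *reads, int n)
{
	for (int i = 0; i < n; i++) {
		if (reads[i] == NO_DEVICE)
			return false;	/* no device behind this function */
		if (reads[i] != CRS_SENTINEL)
			return true;	/* a valid ID: device is ready */
		/* real code waits (e.g. 20 ms) before the next read */
	}
	return false;	/* still CRS after n tries: give up with a warning */
}
```

With CRSSVE set, the root port completes the read with the sentinel and
software gets to run this loop; with it clear, a Haswell root port
retries the cycle in hardware and the loop never gets a chance to run.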

> Please note that this root port behavior (of endless retries) is still
> compliant with PCIe spec as the spec leaves the behavior open to
> implementation, on how many retries to do if "CRS visibility flag" is
> not enabled and it receives a CRS. (Intel has chosen to retry indefinitely)
>
> Ref1:
> https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf
>
> Ref2:
> PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"
>
> Following CPUs are affected:
> http://ark.intel.com/products/codename/42174/Haswell#@All
>
> Thus we need to enable the CRS visibility flag for such root ports. The
> commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
> default") suggests to maintain a whitelist of the systems for which CRS
> should be enabled. This patch does the same.

I'm not a fan of adding a whitelist for devices that work correctly.
I don't think that's a maintainable solution.  Since we haven't had
many systems yet that care about CRS, some kind of "enable CRS on
machines newer than X" might work.

From Linus' email (https://lkml.org/lkml/2007/12/27/92), the only
detailed data I found was from Ciaran McCreesh
(https://lkml.org/lkml/2007/10/29/500).  That case is interesting
because RootCap claims CRSVisible is not supported, but RootCtl claims
it is enabled:

  00:04.0 PCI bridge: ATI Technologies Inc Unknown device 7934
(prog-if 00 [Normal decode])
      Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
      Capabilities: [58] Express (v1) Root Port (Slot+), MSI 00
          RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna-
CRSVisible+
          RootCap: CRSVisible-
  02:00.0 Ethernet controller: Unknown device 0001:8168 (rev 01)

The Linux code removed by ad7edfe04908 doesn't bother to check the
RootCap bit; it simply tries to set the RootCtl enable.  Per spec,
that would be safe because the RootCtl enable bit should be hardwired
to 0 if the Root Port doesn't implement the capability, but I can
imagine a hardware bug where CRSVisible is partly implemented and the
designer assumes it won't be used because he doesn't advertise that
it's supported.

I think you should add that RootCap test to your patch.  It would be
very interesting to test that on some of these machines where we found
problems.

> Note: Looking at the spec and reading about the CRS, IMHO the "CRS
> visibility" looks like a good thing to me that should always be enabled
> on the root ports that support it. And may be we should always enable
> it if supported and maintain a blacklist of devices on which should be
> disabled (because of known issues).
>
> How I stumbled upon this and tested the fix:
>
> Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)
>
> I have a PCIe endpoint (a PLX 8713 NT bridge) that will keep on responding
> with CRS for a long time when the kernel tries to enumerate the
> endpoint, trying to indicate that the device is not yet ready. This is
> because it needs some configuration over I2C in order to complete its
> reset sequence. This results in a CPU hang during enumeration.

Is the I2C configuration something the OS needs to do?  If so, I
suppose we will always get CRS after reset, and we'll always have to
wait for a timeout, then do that I2C configuration, then somehow try
to re-enumerate the device?  That seems like a broken usage model.  I
guess maybe the I2C configurator could initiate a rescan to sort of
patch things up, but it still seems wrong that we have the timeout and
generate a warning.

> I used this setup to fix and test this issue. After enabling the CRS
> visibility flag at the root port, I see that CPU moves on as expected
> declaring the following (instead of freezing):
>
> pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
> ffff0001

This message is from pci_bus_check_dev() (added by 2f5d8e4ff947).
pci_bus_read_dev_vendor_id() already has an internal timeout mechanism
with a message that makes sense.  I don't understand why we need
another one with a different, unintelligible, message in
pci_bus_check_dev().

> Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Guenter Roeck <groeck@juniper.net>
> ---
> Hi Bjorn / folks,
>
> I had also saught suggestions on how this patch should be modelled.
> Please find a suggestive alternative here:
>
> https://lkml.org/lkml/2014/8/1/186
>
> Please let me know your thoughts.
>
> Thanks,
>
> Rajat
>
>
>
>
>  drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..909ca75 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -740,6 +740,32 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>  }
>  EXPORT_SYMBOL(pci_add_new_bus);
>
> +static const struct pci_device_id crs_whitelist[] = {
> +       { PCI_VDEVICE(INTEL, 0x2f00), },
> +       { PCI_VDEVICE(INTEL, 0x2f01), },
> +       { PCI_VDEVICE(INTEL, 0x2f02), },
> +       { PCI_VDEVICE(INTEL, 0x2f03), },
> +       { PCI_VDEVICE(INTEL, 0x2f04), },
> +       { PCI_VDEVICE(INTEL, 0x2f05), },
> +       { PCI_VDEVICE(INTEL, 0x2f06), },
> +       { PCI_VDEVICE(INTEL, 0x2f07), },
> +       { PCI_VDEVICE(INTEL, 0x2f08), },
> +       { PCI_VDEVICE(INTEL, 0x2f09), },
> +       { PCI_VDEVICE(INTEL, 0x2f0a), },
> +       { PCI_VDEVICE(INTEL, 0x2f0b), },
> +       { },
> +};
> +
> +static void pci_enable_crs(struct pci_dev *dev)
> +{
> +       /* Enable CRS Software visibility only for whitelisted systems */
> +       if (pci_is_pcie(dev) &&
> +           pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
> +           pci_match_id(crs_whitelist, dev))
> +               pcie_capability_set_word(dev, PCI_EXP_RTCTL,
> +                                        PCI_EXP_RTCTL_CRSSVE);
> +}
> +
>  /*
>   * If it's a bridge, configure it and scan the bus behind it.
>   * For CardBus bridges, we don't scan behind as the devices will
> @@ -787,6 +813,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
>         pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
>                               bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>
> +       pci_enable_crs(dev);
> +
>         if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
>             !is_cardbus && !broken) {
>                 unsigned int cmax;
> --
> 1.7.9.5
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Sept. 2, 2014, 6:39 p.m. UTC | #5
Hi,

On Mon, Sep 1, 2014 at 9:14 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Linus (author of ad7edfe04908), Matthew (author of 07ff9220908c
> from full history), Yinghai (author of 2f5d8e4ff947), Richard]
>
> On Thu, Aug 28, 2014 at 3:55 PM, Rajat Jain <rajatxjain@gmail.com> wrote:
>> The PCIe root port of the Intel Haswell CPU endlessly retries
>> configuration cycles if an endpoint responds with CRS
>> (Configuration Request Retry Status) and the "CRS Software Visibility"
>> flag is not set at the root port. This results in a CPU hang when the
>> kernel tries to enumerate a device that responds with CRS.
>
> Help me understand this.  I thought a config read of Vendor ID would see:
>
>   - 0xffff if no device is present, or
>   - 0x0001 if the device is present but not ready yet, and the OS
> should retry the read.  This only happens when Software Visibility is
> enabled.
>   - Normal Vendor ID if device is present and ready.  If Software
> Visibility is disabled (as it is in Linux today) and the endpoint
> responds with CRS, the Root Complex may retry the config read
> invisibly to software.

Yes, this is absolutely correct. However, note that if software
visibility is disabled, the root complex may retry the config reads -
but the PCIe spec leaves it open on whether or not it should retry or
for how long it should retry. In this current case of Intel CPU, I
confirmed with Intel that their CPU has chosen to retry indefinitely.
In other words, the Intel root complex will keep on retrying
(invisible to software) until the endpoint responds with a status
other than CRS.

>
> I suppose these retries are causing your hang, but since the device is
> actually present, I would think the CPU would see a read that took a
> very long time to complete, but that did eventually complete.

No, the read never completes as the root complex keeps on retrying
since it keeps getting CRS.

>  Is this
> causing a CPU timeout?  Is the endpoint broken and returning CRS
> endlessly?

Er, yes the endpoint would keep on returning CRS for a veryyyy long
time (until it is configured using I2C). However, the endpoint is not
really broken, because (in our case) the I2C configuration it expects
is to be done from user space (using i2c-dev). Since this
enumeration (and this hang) happens at the kernel boot up time, we
never really boot up and the userspace never gets a chance to do that
I2C configuration. Thus we are stuck in an endless loop of CRS.

I see reasons to enable CRS (without regard to my specific HW)
today because even in the case of a broken endpoint (that returns CRS
indefinitely), the system should be able to ignore that device and
move on. In other words, a broken endpoint should not cause the entire
PCI hierarchy to fail. IMHO, the primary reason for introducing CRS in
the PCIe spec was probably to arm the OS so that it can deal with such
broken devices?


>
>> Please note that this root port behavior (of endless retries) is still
>> compliant with the PCIe spec, as the spec leaves it to the implementation
>> how many retries to do if the "CRS visibility flag" is not enabled and a
>> CRS is received. (Intel has chosen to retry indefinitely.)
>>
>> Ref1:
>> https://www.pcisig.com/specifications/pciexpress/ECN_CRS_Software_Visibility_No27.pdf
>>
>> Ref2:
>> PCIe spec V3.0, pg119, pg127 for "Configuration Request Retry Status"
>>
>> Following CPUs are affected:
>> http://ark.intel.com/products/codename/42174/Haswell#@All
>>
>> Thus we need to enable the CRS visibility flag for such root ports. The
>> commit ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by
>> default") suggests to maintain a whitelist of the systems for which CRS
>> should be enabled. This patch does the same.
>
> I'm not a fan of adding a whitelist for devices that work correctly.
> I don't think that's a maintainable solution.  Since we haven't had
> many systems yet that care about CRS, some kind of "enable CRS on
> machines newer than X" might work.

I agree. I am willing to do whatever is suggested by the community.
The only thing I was not sure about is whether it is possible to get
the list of blacklisted devices, and whether those devices would be
available for testing.

>
> From Linus' email (https://lkml.org/lkml/2007/12/27/92), the only
> detailed data I found was from Ciaran McCreesh
> (https://lkml.org/lkml/2007/10/29/500).  That case is interesting
> because RootCap claims CRSVisible is not supported, but RootCtl claims
> it is enabled:
>
>   00:04.0 PCI bridge: ATI Technologies Inc Unknown device 7934
> (prog-if 00 [Normal decode])
>       Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
>       Capabilities: [58] Express (v1) Root Port (Slot+), MSI 00
>           RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna-
> CRSVisible+
>           RootCap: CRSVisible-
>   02:00.0 Ethernet controller: Unknown device 0001:8168 (rev 01)
>
> The Linux code removed by ad7edfe04908 doesn't bother to check the
> RootCap bit; it simply tries to set the RootCtl enable.  Per spec,
> that would be safe because the RootCtl enable bit should be hardwired
> to 0 if the Root Port doesn't implement the capability, but I can
> imagine a hardware bug where CRSVisible is partly implemented and the
> designer assumes it won't be used because he doesn't advertise that
> it's supported.
>
> I think you should add that RootCap test to your patch.

Yes, I agree it is a good idea, I will add that.

>  It would be
> very interesting to test that on some of these machines where we found
> problems.
>
>> Note: Looking at the spec and reading about the CRS, IMHO the "CRS
>> visibility" looks like a good thing to me that should always be enabled
>> on the root ports that support it. And maybe we should always enable
>> it if supported and maintain a blacklist of devices on which it should
>> be disabled (because of known issues).
>>
>> How I stumbled upon this and tested the fix:
>>
>> Root port: PCI bridge: Intel Corporation Device 2f02 (rev 01)
>>
>> I have a PCIe endpoint (a PLX 8713 NT bridge) that will keep on responding
>> with CRS for a long time when the kernel tries to enumerate the
>> endpoint, trying to indicate that the device is not yet ready. This is
>> because it needs some configuration over I2C in order to complete its
>> reset sequence. This results in a CPU hang during enumeration.
>
> Is the I2C configuration something the OS needs to do?  If so, I
> suppose we will always get CRS after reset, and we'll always have to
> wait for a timeout, then do that I2C configuration, then somehow try
> to re-enumerate the device?

The current problem is that we do not have drivers for the I2C
interface of such PCIe devices in the kernel. I have such a driver
almost ready and am planning to send it out to the community. But until
that happens, we are configuring the device from user space using
i2c-dev, and then initiating hot-plug manually. However, with this
patch I was hoping to solve the CRS issue in general, which other
people might see if they have a broken endpoint (or one that takes
really long to initialize itself).

>  That seems like a broken usage model.  I
> guess maybe the I2C configurator could initiate a rescan to sort of
> patch things up,

[A side comment: One question w.r.t. this particular HW I have at my
disposal (which requires I2C configuration before it can be enumerated
over PCI): is there a way to ensure that the I2C subsystem and drivers
get initialized BEFORE the PCI enumeration happens? If so, we can use
that to ensure the I2C config happens before PCI enumeration, and the
endpoint will never respond with CRS. To the best of my understanding,
we cannot make any assumptions about (and cannot force) any ordering
between different subsystems such as I2C and PCI.]

Thanks,

Rajat

> but it still seems wrong that we have the timeout and
> generate a warning.
>
>> I used this setup to fix and test this issue. After enabling the CRS
>> visibility flag at the root port, I see that CPU moves on as expected
>> declaring the following (instead of freezing):
>>
>> pci 0000:30:00.0 id reading try 50 times with interval 20 ms to get
>> ffff0001
>
> This message is from pci_bus_check_dev() (added by 2f5d8e4ff947).
> pci_bus_read_dev_vendor_id() already has an internal timeout mechanism
> with a message that makes sense.  I don't understand why we need
> another one with a different, unintelligible, message in
> pci_bus_check_dev().
>
>> Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
>> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>> Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> ---
>> Hi Bjorn / folks,
>>
>> I had also sought suggestions on how this patch should be modelled.
>> Please find an alternative approach here:
>>
>> https://lkml.org/lkml/2014/8/1/186
>>
>> Please let me know your thoughts.
>>
>> Thanks,
>>
>> Rajat
>>
>>
>>
>>
>>  drivers/pci/probe.c |   28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..909ca75 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -740,6 +740,32 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
>>  }
>>  EXPORT_SYMBOL(pci_add_new_bus);
>>
>> +static const struct pci_device_id crs_whitelist[] = {
>> +       { PCI_VDEVICE(INTEL, 0x2f00), },
>> +       { PCI_VDEVICE(INTEL, 0x2f01), },
>> +       { PCI_VDEVICE(INTEL, 0x2f02), },
>> +       { PCI_VDEVICE(INTEL, 0x2f03), },
>> +       { PCI_VDEVICE(INTEL, 0x2f04), },
>> +       { PCI_VDEVICE(INTEL, 0x2f05), },
>> +       { PCI_VDEVICE(INTEL, 0x2f06), },
>> +       { PCI_VDEVICE(INTEL, 0x2f07), },
>> +       { PCI_VDEVICE(INTEL, 0x2f08), },
>> +       { PCI_VDEVICE(INTEL, 0x2f09), },
>> +       { PCI_VDEVICE(INTEL, 0x2f0a), },
>> +       { PCI_VDEVICE(INTEL, 0x2f0b), },
>> +       { },
>> +};
>> +
>> +static void pci_enable_crs(struct pci_dev *dev)
>> +{
>> +       /* Enable CRS Software visibility only for whitelisted systems */
>> +       if (pci_is_pcie(dev) &&
>> +           pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
>> +           pci_match_id(crs_whitelist, dev))
>> +               pcie_capability_set_word(dev, PCI_EXP_RTCTL,
>> +                                        PCI_EXP_RTCTL_CRSSVE);
>> +}
>> +
>>  /*
>>   * If it's a bridge, configure it and scan the bus behind it.
>>   * For CardBus bridges, we don't scan behind as the devices will
>> @@ -787,6 +813,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
>>         pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
>>                               bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>>
>> +       pci_enable_crs(dev);
>> +
>>         if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
>>             !is_cardbus && !broken) {
>>                 unsigned int cmax;
>> --
>> 1.7.9.5
>>
Linus Torvalds Sept. 2, 2014, 7:30 p.m. UTC | #6
On Mon, Sep 1, 2014 at 9:14 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> I'm not a fan of adding a whitelist for devices that work correctly.
> I don't think that's a maintainable solution.  Since we haven't had
> many systems yet that care about CRS, some kind of "enable CRS on
> machines newer than X" might work.

I'd suggest trying to just enable CRS unconditionally (ie revert my
old commit ad7edfe04908). We've done other changes in this area since,
and in particular, it's entirely possible that the original problem we
had doesn't even trigger any more.

In particular, looking at one of the old reports, I don't see that
"Device %04x:%02x:%02x.%d not responding" warning that we *should*
have triggered in pci_scan_device(). So I wonder if we had some case
that read the vendor ID without that whole loop, and that _that_ was
our fundamental problem.

It would be great if somebody could test with the old hardware that
triggered the problem originally, but those reports are from 2007, so
it might be hard to find anything relevant today. But trying to just
go back to unconditionally enabling CRS (say, in 3.18-rc1) and seeing
if anybody hollers, might just be the right approach.

            Linus
Bjorn Helgaas Sept. 6, 2014, 9:21 p.m. UTC | #7
[+cc Josh]

On Tue, Sep 2, 2014 at 1:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Sep 1, 2014 at 9:14 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> I'm not a fan of adding a whitelist for devices that work correctly.
>> I don't think that's a maintainable solution.  Since we haven't had
>> many systems yet that care about CRS, some kind of "enable CRS on
>> machines newer than X" might work.
>
> I'd suggest trying to just enable CRS unconditionally (ie revert my
> old commit ad7edfe04908). We've done other changes in this area since,
> and in particular, it's entirely possible that the original problem we
> had doesn't even trigger any more.
>
> In particular, looking at one of the old reports, I don't see that
> "Device %04x:%02x:%02x.%d not responding" warning that we *should*
> have triggered in pci_scan_device(). So I wonder if we had some case
> that read the vendor ID without that whole loop, and that _that_ was
> our fundamental problem.
>
> It would be great if somebody could test with the old hardware that
> triggered the problem originally, but those reports are from 2007, so
> it might be hard to find anything relevant today. But trying to just
> go back to unconditionally enabling CRS (say, in 3.18-rc1) and seeing
> if anybody hollers, might just be the right approach.

Josh actually does still have the hardware, so he *might* be able to test it.

My original theory was that the device didn't advertise CRS SV
capability, and something went wrong when we enabled it anyway.  But
lspci incorrectly decoded CRS SV from the RootCap, so we don't know
whether it was advertised, and I didn't have an explanation for what
could be going wrong if we enabled it.

But as you point out, we don't see the "not responding" message.  We
only print that if we read 0xffff0001 (device/vendor ID).  But lspci
claims the vendor:device ID is 0001:8168.  So my new theory is:

  - Root Port issues Configuration Request to read four bytes at 0x0.
  - Device generates Completion with Config Request Retry Status,
returning data of 10ec:8168.
  - Root Port (with SV disabled) retries the Config Request until it
succeeds, then returns 0x816810ec.  In this configuration, these
boards work correctly today.
  - Root Port (with SV enabled) replaces Vendor ID with 0001 but fails
to replace Device ID with ffff, so it returns 0x81680001.  This is a
hardware defect per PCIe r3.0, sec 2.3.2.
  - Linux sees 0x81680001, but we're looking for 0xffff0001, so we
think everything's fine and we don't retry or warn about it.

So I think we should try two patches:

  1) Enable CRS SV if support is advertised, i.e., Rajat's current patch
  2) Change pci_bus_read_dev_vendor_id() to test only the two bytes of
the vendor ID and ignore the device ID

I suspect that if Josh tested patch 1) alone, he would see the same
problem he originally reported, and that if he tested both together,
it would work correctly.  Obviously, if it works out this way, I would
apply them in the reverse order to avoid a bisection problem.

Bjorn
Linus Torvalds Sept. 6, 2014, 11:28 p.m. UTC | #8
On Sat, Sep 6, 2014 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> But as you point out, we don't see the "not responding" message.  We
> only print that if we read 0xffff0001 (device/vendor ID).  But lspci
> claims the vendor:device ID is 0001:8168.  So my new theory is:

[snip]

Sounds like a very reasonable theory and quite possible. And if Josh
actually still has the hardware, it would be lovely to hear the
results. Josh?

                 Linus

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..909ca75 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -740,6 +740,32 @@  struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 }
 EXPORT_SYMBOL(pci_add_new_bus);
 
+static const struct pci_device_id crs_whitelist[] = {
+	{ PCI_VDEVICE(INTEL, 0x2f00), },
+	{ PCI_VDEVICE(INTEL, 0x2f01), },
+	{ PCI_VDEVICE(INTEL, 0x2f02), },
+	{ PCI_VDEVICE(INTEL, 0x2f03), },
+	{ PCI_VDEVICE(INTEL, 0x2f04), },
+	{ PCI_VDEVICE(INTEL, 0x2f05), },
+	{ PCI_VDEVICE(INTEL, 0x2f06), },
+	{ PCI_VDEVICE(INTEL, 0x2f07), },
+	{ PCI_VDEVICE(INTEL, 0x2f08), },
+	{ PCI_VDEVICE(INTEL, 0x2f09), },
+	{ PCI_VDEVICE(INTEL, 0x2f0a), },
+	{ PCI_VDEVICE(INTEL, 0x2f0b), },
+	{ },
+};
+
+static void pci_enable_crs(struct pci_dev *dev)
+{
+	/* Enable CRS Software visibility only for whitelisted systems */
+	if (pci_is_pcie(dev) &&
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+	    pci_match_id(crs_whitelist, dev))
+		pcie_capability_set_word(dev, PCI_EXP_RTCTL,
+					 PCI_EXP_RTCTL_CRSSVE);
+}
+
 /*
  * If it's a bridge, configure it and scan the bus behind it.
  * For CardBus bridges, we don't scan behind as the devices will
@@ -787,6 +813,8 @@  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
 			      bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
 
+	pci_enable_crs(dev);
+
 	if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
 	    !is_cardbus && !broken) {
 		unsigned int cmax;