Message ID | 1501917313-9812-2-git-send-email-dingtianhong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> From: Casey Leedom <leedom@chelsio.com>
>
> The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
> the Relaxed Ordering (RO) attribute should not be used for Transaction Layer
> Packets (TLP) targeted towards these affected root complexes. The current list
> of affected parts includes the root complexes of some Intel Xeon processors,
> which suffer from a flow control credit issue that results in performance
> problems. On these affected parts RO can still be used for peer-2-peer
> traffic. AMD A1100 ARM ("SEATTLE") Root complexes don't obey PCIe 3.0
> ordering rules, and hence could lead to data-corruption.

This needs to include a link to the Intel spec
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
sec 3.9.1).

It should also include a pointer to the AMD erratum, if available, or
at least some reference to how we know it doesn't obey the rules.

Ashok, thanks for chiming in. Now that you have, I have a few more
questions for you:

  - Is the above doc the one you mentioned as being now public?

  - Is this considered a hardware erratum?

  - If so, is there a pointer to that as well?

  - If this is not considered an erratum, can you provide any guidance
    about how an OS should determine when it should use RO?

Relying on a list of device IDs in an optimization manual is OK for an
erratum, but if it's *not* an erratum, it seems like a hole in the specs
because as far as I know there's no generic way for the OS to discover
whether to use RO.

Bjorn
| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 4:22 PM
|
| This needs to include a link to the Intel spec
| (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
| sec 3.9.1).

In the commit message or as a comment? Regardless, I agree. It's always
nice to be able to go back and see what the official documentation says.
However, that said, links on the internet are ... fragile as time goes by,
so we might want to simply quote section 3.9.1 in the commit message since
it's relatively short:

  3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
        and Toward MMIO Regions (P2P)

  In order to maximize performance for PCIe devices in the processors
  listed in Table 3-6 below, the software should determine whether the
  accesses are toward coherent memory (system memory) or toward MMIO
  regions (P2P access to other devices). If the access is toward MMIO
  region, then software can command HW to set the RO bit in the TLP
  header, as this would allow hardware to achieve maximum throughput for
  these types of accesses. For accesses toward coherent memory, software
  can command HW to clear the RO bit in the TLP header (no RO), as this
  would allow hardware to achieve maximum throughput for these types of
  accesses.

  Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
  PCIe Performance

  Processor                              CPU RP Device IDs

  Intel Xeon processors based on         6F01H-6F0EH
  Broadwell microarchitecture

  Intel Xeon processors based on         2F01H-2F0EH
  Haswell microarchitecture

| It should also include a pointer to the AMD erratum, if available, or
| at least some reference to how we know it doesn't obey the rules.

Getting an ACK from AMD seems like a forlorn cause at this point. My
contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to my
messages almost a year ago, saying that all of AMD's energies were being
redirected towards upcoming x86 products (likely Ryzen as we now know). As
far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
SoC.

On the specific issue, I can certainly write up something even more
extensive than I wrote up for the comment in drivers/pci/quirks.c. Please
review the comment I wrote up and tell me if you'd like something even more
detailed -- I'm usually accused of writing comments which are too long, so
this would be a new one on me ... :-)

| Ashok, thanks for chiming in. Now that you have, I have a few more
| questions for you:

I can answer a few of these:

| - Is the above doc the one you mentioned as being now public?

Yes. Ashok worked with me to the extent he was allowed prior to the
publication of the public technical note, but he couldn't say much.
(Believe it or not, it is possible to say less than the quoted section
above.) When the note was published, Patrick Cramer sent me the note about
it and pointed me at section 3.9.1.

| - Is this considered a hardware erratum?

I certainly consider it a Hardware Bug. And I'm really hoping that Ashok
will be able to find a "Chicken Bit" which allows the broken feature to be
turned off. Remember, the Relaxed Ordering Attribute on a Transaction Layer
Packet is simply a HINT. It is perfectly reasonable for a compliant
implementation to simply ignore the Relaxed Ordering Attribute on an
incoming TLP Request. The sole responsibility of a compliant implementation
is to return the exact same Relaxed Ordering and No Snoop Attributes in any
TLP Response. (The rules for the ID-Based Ordering Attribute are more
complex.)

Earlier Intel Root Complexes did exactly this: they ignored the Relaxed
Ordering Attribute and there was no performance difference for
using/not-using it. It's pretty obvious that an attempt was made to
implement optimizations surrounding the use of Relaxed Ordering and they
didn't work.

| - If so, is there a pointer to that as well?

Intel is historically tight-lipped about admitting any bugs/errata in their
products. I'm guessing that the above quoted Section 3.9.1 is likely to be
all we ever get. The language above regarding TLPs targeting Coherent
Shared Memory is basically as much of an admission that they got it wrong
as we're going to get. But heck, maybe we'll get lucky ... Especially with
regard to the hoped-for "Chicken Bit" ...

| - If this is not considered an erratum, can you provide any guidance
|   about how an OS should determine when it should use RO?

Software? We don't need no stinking software!

Sorry, I couldn't resist.

| Relying on a list of device IDs in an optimization manual is OK for an
| erratum, but if it's *not* an erratum, it seems like a hole in the specs
| because as far as I know there's no generic way for the OS to discover
| whether to use RO.

Well, here's to hoping that Ashok and/or Patrick are able to offer more
detailed information ...

Casey
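As a reading aid, section 3.9.1 boils down to a per-TLP decision keyed on the Root Port's device ID and on where the DMA is headed. The sketch below assumes a Root Port counts as "listed" exactly when its vendor ID is Intel's (0x8086) and its device ID falls in 0x6F01-0x6F0E or 0x2F01-0x2F0E (Table 3-6). The names `intel_rp_listed()`, `should_set_ro()` and `enum dma_target` are hypothetical illustrations, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INTEL_VENDOR_ID 0x8086u

/* Hypothetical classification of where a DMA Write is headed. */
enum dma_target {
	TARGET_COHERENT_MEMORY, /* host system memory */
	TARGET_PEER_MMIO        /* P2P access to another device's MMIO */
};

/* Inclusive Root Port device ID ranges from Table 3-6. */
static const struct {
	uint16_t first, last;
} intel_rp_ranges[] = {
	{ 0x6F01, 0x6F0E }, /* Xeon, Broadwell microarchitecture */
	{ 0x2F01, 0x2F0E }, /* Xeon, Haswell microarchitecture */
};

/* True if the Root Port is one of the parts listed in Table 3-6. */
static bool intel_rp_listed(uint16_t vendor, uint16_t device)
{
	if (vendor != INTEL_VENDOR_ID)
		return false;
	for (size_t i = 0; i < sizeof(intel_rp_ranges) / sizeof(intel_rp_ranges[0]); i++)
		if (device >= intel_rp_ranges[i].first &&
		    device <= intel_rp_ranges[i].last)
			return true;
	return false;
}

/*
 * Section 3.9.1 policy: on listed parts, set RO only for accesses toward
 * MMIO (P2P) and clear it for accesses toward coherent memory. On any other
 * Root Port this sketch keeps RO, assuming the device may otherwise use it.
 */
static bool should_set_ro(uint16_t rp_vendor, uint16_t rp_device,
			  enum dma_target target)
{
	assert(target == TARGET_COHERENT_MEMORY || target == TARGET_PEER_MMIO);
	if (intel_rp_listed(rp_vendor, rp_device))
		return target == TARGET_PEER_MMIO;
	return true;
}
```

For example, behind a Root Port with device ID 0x6F05, `should_set_ro()` returns false for a write to system memory and true for a P2P write, which is exactly the split the optimization manual describes.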
On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Tuesday, August 8, 2017 4:22 PM
> |
> | This needs to include a link to the Intel spec
> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> | sec 3.9.1).
>
> In the commit message or as a comment? Regardless, I agree. It's always
> nice to be able to go back and see what the official documentation says.
> However, that said, links on the internet are ... fragile as time goes by,
> so we might want to simply quote section 3.9.1 in the commit message since
> it's relatively short:
>
>   3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>         and Toward MMIO Regions (P2P)
>
>   In order to maximize performance for PCIe devices in the processors
>   listed in Table 3-6 below, the software should determine whether the
>   accesses are toward coherent memory (system memory) or toward MMIO
>   regions (P2P access to other devices). If the access is toward MMIO
>   region, then software can command HW to set the RO bit in the TLP
>   header, as this would allow hardware to achieve maximum throughput for
>   these types of accesses. For accesses toward coherent memory, software
>   can command HW to clear the RO bit in the TLP header (no RO), as this
>   would allow hardware to achieve maximum throughput for these types of
>   accesses.
>
>   Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>   PCIe Performance
>
>   Processor                              CPU RP Device IDs
>
>   Intel Xeon processors based on         6F01H-6F0EH
>   Broadwell microarchitecture
>
>   Intel Xeon processors based on         2F01H-2F0EH
>   Haswell microarchitecture

Agreed, links are prone to being broken. I would include in the
changelog the complete title and order number, along with the link as
a footnote. Wouldn't hurt to quote the section too, since it's short.
> | It should also include a pointer to the AMD erratum, if available, or
> | at least some reference to how we know it doesn't obey the rules.
>
> Getting an ACK from AMD seems like a forlorn cause at this point. My
> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to my
> messages almost a year ago, saying that all of AMD's energies were being
> redirected towards upcoming x86 products (likely Ryzen as we now know). As
> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
> SoC.
>
> On the specific issue, I can certainly write up something even more
> extensive than I wrote up for the comment in drivers/pci/quirks.c. Please
> review the comment I wrote up and tell me if you'd like something even more
> detailed -- I'm usually accused of writing comments which are too long, so
> this would be a new one on me ... :-)

If you have any bug reports with info about how you debugged it and
concluded that Seattle is broken, you could include a link (probably in
the changelog). But if there isn't anything, there isn't anything.

I might reorganize those patches as:

  1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
     sets it, and the current patch [2/4] that uses it.

  2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
     details.

  3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD details.
On 2017/8/9 11:02, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
>> | From: Bjorn Helgaas <helgaas@kernel.org>
>> | Sent: Tuesday, August 8, 2017 4:22 PM
>> ...
>
> Agreed, links are prone to being broken. I would include in the
> changelog the complete title and order number, along with the link as
> a footnote. Wouldn't hurt to quote the section too, since it's short.
>

OK

>> | It should also include a pointer to the AMD erratum, if available, or
>> | at least some reference to how we know it doesn't obey the rules.
>> ...
>
> If you have any bug reports with info about how you debugged it and
> concluded that Seattle is broken, you could include a link (probably
> in the changelog). But if there isn't anything, there isn't anything.
>
> I might reorganize those patches as:
>
> 1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
>    sets it, and the current patch [2/4] that uses it.
>
> 2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
>    details.
>
> 3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
>    details.
>

OK, I could reorganize it, but I still need Casey to give me the link for
Seattle; otherwise I could remove the AMD part and wait until someone
shows it.

Thanks
Ding
Hi Bjorn

On Tue, Aug 08, 2017 at 06:22:00PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> > From: Casey Leedom <leedom@chelsio.com>
> >
> > Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> > data-corruption.
>
> This needs to include a link to the Intel spec
> (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> sec 3.9.1).
>
> It should also include a pointer to the AMD erratum, if available, or
> at least some reference to how we know it doesn't obey the rules.
>
> Ashok, thanks for chiming in. Now that you have, I have a few more
> questions for you:
>
> - Is the above doc the one you mentioned as being now public?

Yes.

>
> - Is this considered a hardware erratum?

I would think so. I have tried to pursue the publication in that direction,
but it morphed into the optimization guide instead. Once it got into some
open doc I stopped pushing, but I will continue to get this into an
erratum. I do agree that's the right placeholder for this.

>
> - If so, is there a pointer to that as well?
>
> - If this is not considered an erratum, can you provide any guidance
>   about how an OS should determine when it should use RO?

The optimization guide states that it only applies to transactions
targeting system memory. For peer-2-peer, RO is allowed and has perf
upside.

As Casey pointed out in an earlier thread, we chose the heavy hammer
approach because there are some that can lead to data-corruption as opposed
to perf degradation.

This looks ugly, but maybe we can have 2 flags: one that indicates it's a
strict no-no, and one that says no to system memory only. That way the
driver can determine when the device would turn the hint on in the TLP.

>
> Relying on a list of device IDs in an optimization manual is OK for an
> erratum, but if it's *not* an erratum, it seems like a hole in the

Good point. For this specific case it's really an erratum, but for some
reason they made the decision to use this doc vs. the generic errata
data-sheet that would have been the preferred way to document it.

> specs because as far as I know there's no generic way for the OS to
> discover whether to use RO.
>

Cheers,
Ashok
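Ashok's two-flag idea can be sketched as a small decision function. Both flag names and their bit values below are hypothetical (the patch under review defines only PCI_DEV_FLAGS_NO_RELAXED_ORDERING): one flag forbids RO outright (the data-corruption case), the other forbids it only for TLPs targeting system memory (the Intel performance case), so peer-2-peer traffic keeps the RO upside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits on a Root Port / Root Complex. */
#define RP_NO_RO           (1u << 0) /* never set RO: ordering is broken */
#define RP_NO_RO_TO_SYSMEM (1u << 1) /* RO only hurts toward system memory */

/* Returns true if a TLP headed through this Root Port may carry RO. */
static bool ro_permitted(uint32_t rp_flags, bool target_is_system_memory)
{
	/* Only the two flags above are defined in this sketch. */
	assert((rp_flags & ~(RP_NO_RO | RP_NO_RO_TO_SYSMEM)) == 0);

	if (rp_flags & RP_NO_RO)
		return false;	/* strict no-no, e.g. data corruption */
	if ((rp_flags & RP_NO_RO_TO_SYSMEM) && target_is_system_memory)
		return false;	/* performance problem only */
	return true;		/* e.g. peer-to-peer MMIO */
}
```

With a single flag, as in the patch, a driver has to treat both kinds of part like `RP_NO_RO` and gives up RO for P2P traffic even where the optimization guide says it helps.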
| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, August 9, 2017 5:17 AM
|
| On 2017/8/9 11:02, Bjorn Helgaas wrote:
| >
| > On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
| > >
| >> | From: Bjorn Helgaas <helgaas@kernel.org>
| >> | Sent: Tuesday, August 8, 2017 4:22 PM
| >> | ...
| >> | It should also include a pointer to the AMD erratum, if available, or
| >> | at least some reference to how we know it doesn't obey the rules.
| >>
| >> Getting an ACK from AMD seems like a forlorn cause at this point. My
| >> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to my
| >> messages almost a year ago, saying that all of AMD's energies were being
| >> redirected towards upcoming x86 products (likely Ryzen as we now know).
| >> As far as I can tell AMD has walked away from their A1100 (AKA
| >> "Seattle") ARM SoC.
| >>
| >> On the specific issue, I can certainly write up something even more
| >> extensive than I wrote up for the comment in drivers/pci/quirks.c.
| >> Please review the comment I wrote up and tell me if you'd like
| >> something even more detailed -- I'm usually accused of writing comments
| >> which are too long, so this would be a new one on me ... :-)
| >
| > If you have any bug reports with info about how you debugged it and
| > concluded that Seattle is broken, you could include a link (probably
| > in the changelog). But if there isn't anything, there isn't anything.
| ...
| OK, I could reorganize it, but I still need Casey to give me the link for
| Seattle; otherwise I could remove the AMD part and wait until someone
| shows it. Thanks

There are no links and I was never given an internal bug number at AMD. As
I said, they stopped responding to my notes about a year ago, saying that
they were moving the focus of all their people and no longer had resources
to pursue the issue. Hopefully for them, Ryzen doesn't have the same Data
Corruption problem ...
As for how we diagnosed it: with our Ingress Packet delivery, we have the
Ingress Packet Data delivered (DMA Write) into Free List Buffers, and then
a small message (DMA Write) to a "Response Queue" indicating delivery of
the Ingress Packet Data into the Free List Buffers. The Transaction Layer
Packets which convey the Ingress Packet Data all have the Relaxed Ordering
Attribute set, while the following TLP carrying the Ingress Data delivery
notification into the Response Queue does not have the Relaxed Ordering
Attribute set.

The rules for processing TLPs with and without the Relaxed Ordering
Attribute set are covered in Section 2.4.1 of the PCIe 3.0 specification
(Revision 3.0, November 10, 2010). Table 2-34 "Ordering Rules Summary"
covers the cases where one TLP may "pass" (be processed earlier than) a
preceding TLP. In the case we're talking about, we have a sequence of one
or more Posted DMA Write TLPs with the Relaxed Ordering Attribute set and
a following Posted DMA Write TLP without the Relaxed Ordering Attribute
set. Thus we need to look at the Row A, Column 2 cell of Table 2-34,
governing when a Posted Request may "pass" a preceding Posted Request. In
that cell we have:

  a) No
  b) Y/N

with the explanatory text:

  A2a  A Posted Request must not pass another Posted Request unless A2b
       applies.

  A2b  A Posted Request with RO[23] Set is permitted to pass another
       Posted Request[24]. A Posted Request with IDO Set is permitted to
       pass another Posted Request if the two Requester IDs are different.

  [23] In this section, "RO" is an abbreviation for the Relaxed Ordering
       Attribute field.

  [24] Some usages are enabled by not implementing this passing (see the
       No RO-enabled PR-PR Passing bit in Section 7.8.15).

In our case, we were getting notifications of Ingress Packet Delivery in
our Response Queues, but not all of the Ingress Packet Data Posted DMA
Write TLPs had been processed yet by the Root Complex. As a result, we
were picking up old stale memory data before those lagging Ingress Packet
Data TLPs could be processed. In other words, the notification TLP (RO
clear) passed earlier data TLPs, which A2a forbids. This is a clear
violation of the PCIe 3.0 TLP processing rules outlined above.

Does that help?

Casey
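Casey's argument applies cell A2 of Table 2-34 to a concrete sequence. The sketch below checks an observed processing order of Posted Writes against A2a/A2b only: a later-issued Posted Request may be processed before an earlier one only if the later one has RO set. It deliberately ignores IDO and every other row and column of the table; `struct posted_write` and `order_is_legal()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A Posted Write TLP, tagged with its position in issue order. */
struct posted_write {
	size_t issue_seq; /* 0 = first issued */
	bool ro;          /* Relaxed Ordering attribute set? */
};

/*
 * observed[] lists the writes in the order the Root Complex processed them.
 * Returns true if that order obeys A2a/A2b (IDO not modelled): whenever a
 * write is processed before an earlier-issued write, the passing write must
 * have RO set.
 */
static bool order_is_legal(const struct posted_write *observed, size_t n)
{
	assert(observed != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			const struct posted_write *first = &observed[i];
			const struct posted_write *second = &observed[j];

			/* 'first' passed 'second' if it was issued later. */
			if (first->issue_seq > second->issue_seq && !first->ro)
				return false;
		}
	}
	return true;
}
```

Modelling the Chelsio case as two data writes (RO set) followed by a notification (RO clear): processing the data writes out of order among themselves is legal, but processing the notification before the second data write is not, which is the stale-data symptom described above.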
| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 8:58 AM
| ...
| As Casey pointed out in an earlier thread, we chose the heavy hammer
| approach because there are some that can lead to data-corruption as opposed
| to perf degradation.

Careful. As far as I'm aware, there is no Data Corruption problem
whatsoever with Intel Root Ports and processing of Transaction Layer Packets
with and without the Relaxed Ordering Attribute set.

The only issue which we've discovered with relatively recent Intel Root
Port implementations and the use of the Relaxed Ordering Attribute is a
performance issue. To the best of our ability to analyze the PCIe traces,
it appeared that the Intel Root Complex delayed returning Link Flow Control
Credits, resulting in lowered performance (total bandwidth). When we used
Relaxed Ordering for Ingress Packet Data delivery on a 100Gb/s Ethernet
link with 1500-byte MTU, we were pegged at ~75Gb/s. Once we disabled
Relaxed Ordering, we were able to deliver Ingress Packet Data to Host
Memory at the full link rate.

Casey
On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Wednesday, August 9, 2017 8:58 AM
> | ...
> | As Casey pointed out in an earlier thread, we chose the heavy hammer
> | approach because there are some that can lead to data-corruption as opposed
> | to perf degradation.
>
> Careful. As far as I'm aware, there is no Data Corruption problem
> whatsoever with Intel Root Ports and processing of Transaction Layer Packets
> with and without the Relaxed Ordering Attribute set.

That's right, no data-corruption on Intel parts :-). It was with another
vendor. Only a performance issue with Intel root-ports in the parts
identified by the optimization guide.

Cheers,
Ashok
| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 11:00 AM
|
| On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
| > | From: Raj, Ashok <ashok.raj@intel.com>
| > | Sent: Wednesday, August 9, 2017 8:58 AM
| > | ...
| > | As Casey pointed out in an earlier thread, we chose the heavy hammer
| > | approach because there are some that can lead to data-corruption as
| > | opposed to perf degradation.
| >
| > Careful. As far as I'm aware, there is no Data Corruption problem
| > whatsoever with Intel Root Ports and processing of Transaction Layer
| > Packets with and without the Relaxed Ordering Attribute set.
|
| That's right, no data-corruption on Intel parts :-). It was with another
| vendor. Only a performance issue with Intel root-ports in the parts
| identified by the optimization guide.

Yes, I didn't want you to get into any trouble over that possible reading
of what you wrote.

Any progress on the "Chicken Bit" investigation? Being able to disable the
non-optimal Relaxed Ordering "optimization" would be the best PCI Quirk of
all ...

Casey
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..5c9e125 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,94 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 	 * the direct_complete optimization.
 	 */
 	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
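The thirty-one DECLARE_PCI_FIXUP_CLASS_EARLY() lines amount to a table of (vendor, device-ID range) entries that all set one flag bit, which drivers are then expected to consult before setting RO. The userspace model below sketches that data flow under stated assumptions: `struct model_dev`, `apply_ro_quirks()` and `ro_allowed_upstream()` are hypothetical, matching is on vendor and device ID only (the class arguments are ignored), and the walk toward the Root Port is one plausible way a consumer might use the flag, not code from this series. 0x8086 and 0x1022 are the PCI vendor IDs for Intel and AMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_INTEL 0x8086u
#define VENDOR_AMD   0x1022u
/* Same bit as PCI_DEV_FLAGS_NO_RELAXED_ORDERING in the patch. */
#define NO_RELAXED_ORDERING (1u << 12)

/* Hypothetical stand-in for struct pci_dev. */
struct model_dev {
	uint16_t vendor, device;
	uint32_t dev_flags;
	struct model_dev *upstream; /* NULL above the Root Port */
};

/* Device IDs covered by the quirks in the diff, as inclusive ranges. */
static const struct {
	uint16_t vendor, first, last;
} ro_quirks[] = {
	{ VENDOR_INTEL, 0x6f01, 0x6f0e }, /* Broadwell-based Xeon */
	{ VENDOR_INTEL, 0x2f01, 0x2f0e }, /* Haswell-based Xeon */
	{ VENDOR_AMD,   0x1a00, 0x1a02 }, /* A1100 "Seattle" */
};

/* Does what quirk_relaxedordering_disable() does, for matching devices. */
static void apply_ro_quirks(struct model_dev *dev)
{
	for (size_t i = 0; i < sizeof(ro_quirks) / sizeof(ro_quirks[0]); i++)
		if (dev->vendor == ro_quirks[i].vendor &&
		    dev->device >= ro_quirks[i].first &&
		    dev->device <= ro_quirks[i].last)
			dev->dev_flags |= NO_RELAXED_ORDERING;
}

/* Driver-side check: may @dev set RO on TLPs it sends upstream? */
static bool ro_allowed_upstream(const struct model_dev *dev)
{
	assert(dev != NULL);
	/* Refuse RO if any device between @dev and the top carries the flag. */
	for (const struct model_dev *p = dev->upstream; p != NULL; p = p->upstream)
		if (p->dev_flags & NO_RELAXED_ORDERING)
			return false;
	return true;
}
```

For example, an endpoint whose upstream Root Port has Intel device ID 0x6f05 ends up with `ro_allowed_upstream()` returning false, while the same endpoint below a port with ID 0x6f0f (outside Table 3-6) may still use RO.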