diff mbox

[1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

Message ID 758d0e431c732fe133e7b0e660bde5fc1beccdba.1493678834.git.leedom@chelsio.com (mailing list archive)
State New, archived
Headers show

Commit Message

Casey Leedom May 1, 2017, 11:13 p.m. UTC
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged.  Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
---
 drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

Comments

Ding Tianhong May 2, 2017, 6:49 a.m. UTC | #1
hi, Casey:

On 2017/5/2 7:13, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* Don't use Relaxed Ordering for TLPs directed at this device */
> +	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };

What about add a new general func to check the RO for several drivers to use them ?

just like:

#define pci_dev_support_relaxed_ordering(struct pci_dev *root) \
	(!(root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))

Thanks
Ding

>  
>  enum pci_irq_reroute_variant {
>
Alexander Duyck May 2, 2017, 4:39 p.m. UTC | #2
On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

So this is a good first step though might I suggest one other change.

We may want to add logic to the core PCIe code that clears the "Enable
Relaxed Ordering" bit in the device control register for all devices
hanging off of this root complex. Assuming the devices conform to the
PCIe spec doing that should disable relaxed ordering in a device
agnostic way that then enables us at a driver level to just enable the
feature always without having to perform any checks for your flag. We
could probably do that as a part of probing the PCIe interfaces
hanging off of these devices.

> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>                               quirk_tw686x_class);
>
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>         /* Get VPD from function 0 VPD */
>         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };
>
>  enum pci_irq_reroute_variant {
> --
> 1.9.1
>
Ashok Raj May 2, 2017, 4:44 p.m. UTC | #3
Hi Casey


On Mon, May 01, 2017 at 04:13:50PM -0700, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +

You might want to add the RP ID's for both HSX/BDX. Tne entire range 
is 2F01H-2F0EH & 6F01H-6F0EH.

Cheers,
Ashok
Ashok Raj May 2, 2017, 4:53 p.m. UTC | #4
On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> > Ordering Attribute should not be used on Transaction Layer Packets destined
> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> 
> So this is a good first step though might I suggest one other change.
> 
> We may want to add logic to the core PCIe code that clears the "Enable
> Relaxed Ordering" bit in the device control register for all devices
> hanging off of this root complex. Assuming the devices conform to the
> PCIe spec doing that should disable relaxed ordering in a device
> agnostic way that then enables us at a driver level to just enable the
> feature always without having to perform any checks for your flag. We
> could probably do that as a part of probing the PCIe interfaces
> hanging off of these devices.

I suppose you don't want to turn off RO completely on the device. When
traffic is targetted to mmio for peer to peer reasons RO has performance
upside. The specific issue with these root ports indicate limitation using 
RO for traffic targetting coherent memory.

> 
> > ---
> >  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h  |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f754453..4ae78b3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> >                               quirk_tw686x_class);
> >
> >  /*
> > + * Some devices have problems with Transaction Layer Packets with the Relaxed
> > + * Ordering Attribute set.  Such devices should mark themselves and other
> > + * Device Drivers should check before sending TLPs with RO set.
> > + */
> > +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> > +{
> > +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> > +}
> > +
> > +/*
> > + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> > + * cause performance problems with Upstream Transaction Layer Packets with
> > + * Relaxed Ordering set.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> > + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> > + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> > + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> > + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> > + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> > + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> > + * Ordering for Upstream TLPs.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> >   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
> >   * values for the Attribute as were supplied in the header of the
> >   * corresponding Request, except as explicitly allowed when IDO is used."
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index eb3da1a..6764f66 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -178,6 +178,8 @@ enum pci_dev_flags {
> >         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> >         /* Get VPD from function 0 VPD */
> >         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> > +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 1.9.1
> >
Alexander Duyck May 2, 2017, 6:10 p.m. UTC | #5
On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>>
>> So this is a good first step though might I suggest one other change.
>>
>> We may want to add logic to the core PCIe code that clears the "Enable
>> Relaxed Ordering" bit in the device control register for all devices
>> hanging off of this root complex. Assuming the devices conform to the
>> PCIe spec doing that should disable relaxed ordering in a device
>> agnostic way that then enables us at a driver level to just enable the
>> feature always without having to perform any checks for your flag. We
>> could probably do that as a part of probing the PCIe interfaces
>> hanging off of these devices.
>
> I suppose you don't want to turn off RO completely on the device. When
> traffic is targetted to mmio for peer to peer reasons RO has performance
> upside. The specific issue with these root ports indicate limitation using
> RO for traffic targetting coherent memory.

Actually my main concern here is virtualization. If I take the PCIe
function and direct assign it I have no way of seeing the root complex
flag as it is now virtualized away. In the meantime the guest now has
the ability to enable the function and sees nothing that says you
can't enable relaxed ordering which in turn ends up potentially
causing data corruption on the system. I want relaxed ordering
disabled before I even consider assigning it to the guest on the
systems where this would be an issue.

I prefer to err on the side of caution with this. Enabling Relaxed
Ordering is technically a performance enhancement, so we function but
not as well as we would like, while having it enabled when there are
issues can lead to data corruption. I would weigh the risk of data
corruption the thing to be avoided and of much higher priority than
enabling improved performance. As such I say we should default the
relaxed ordering attribute to off in general and look at
"white-listing" it in for various architectures and/or chipsets that
support/need it rather than having it enabled by default and trying to
switch it off after the fact when we find some new issue.

So for example, in the case of x86 it seems like there are multiple
root complexes that have issues, and the gains for enabling it with
standard DMA to host memory are small. As such we may want to default
it to off via the architecture specific PCIe code and then look at
having "white-list" cases where we enable it for things like
peer-to-peer accesses. In the case of SPARC we could look at
defaulting it to on, and only "black-list" any cases where there might
be issues since SPARC relies on this in a significant way for
performance. In the case of ARM and other architectures it is a bit of
a toss-up. I would say we could just default it to on for now and
"black-list" anything that doesn't work, or we could go the other way
like I suggested for x86. It all depends on what the ARM community
would want to agree on for this. I would say unless it makes a
significant difference like it does in the case of SPARC we are
probably better off just defaulting it to off.

- Alex
Casey Leedom May 3, 2017, 4:30 a.m. UTC | #6
| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Tuesday, May 2, 2017 11:10 AM
| ...
| So for example, in the case of x86 it seems like there are multiple
| root complexes that have issues, and the gains for enabling it with
| standard DMA to host memory are small. As such we may want to default
| it to off via the architecture specific PCIe code and then look at
| having "white-list" cases where we enable it for things like
| peer-to-peer accesses. In the case of SPARC we could look at
| defaulting it to on, and only "black-list" any cases where there might
| be issues since SPARC relies on this in a significant way for
| performance. In the case of ARM and other architectures it is a bit of
| a toss-up. I would say we could just default it to on for now and
| "black-list" anything that doesn't work, or we could go the other way
| like I suggested for x86. It all depends on what the ARM community
| would want to agree on for this. I would say unless it makes a
| significant difference like it does in the case of SPARC we are
| probably better off just defaulting it to off.

  Sorry, I forgot to respond to this earlier when someone was rushing me out
to a customer dinner.

  I'm unaware of any other x86 Root Complex Port that has a problem with
Relaxed Ordering other than the performance issue with the current Intel
implementation.  Ashok tells me that Intel is in the final stages of putting
together a technical notice on this issue but I don't know when that will
come out.  Hopefully that will shed much more light on the cause and actual
use of Relaxed Ordering when directed to Coherent Memory on current and past
Intel platforms.  (Note that the performance bug seems to limit us to
~75-85Gb/s DMA Write speed to Coherent Host Memory.)

  The only other Device that I currently know of which has issues with
Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
to Data Corruption.

  But I don't know anything about other x86 Root Complex Ports having issues
with this -- we've been using it ever since commit ef306b50b from December
2010.

  Also, I'm not aware of any performance data which has been gathered on the
use of Relaxed Ordering when directed at Host Memory.  From your note, it
sounds like it's important on SPARC architectures.  But it could conceivably
be important on any architecture or Root Complex/Memory Controller
implementation.  We use it to direct Ingress Packet Data to Free List
Buffers, followed by a TLP without Relaxed Ordering directed at a Host
Memory Message Queue.  The idea here is to give the Root Complex options on
which DMA Memory Write TLPs to process in order to optimize data placement
in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
even more important.

  Basically, I'd hate to come up with a solution where we write off the use
of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
suggesting that, but there are a number of x86 Root Complex implementations
out there -- and some like the new AMD Ryzen have yet to be tested -- as
well as other architectures.

  And, as Ashok and I have both nothed, any solution we come up with needs
to cope with a heterogeneous situation where, on the same PCIe Fabric, it
may be necessesary/desireable to support Relaxed Ordering TLPs directed at
some End Nodes but not others.

Casey
Alexander Duyck May 3, 2017, 4:02 p.m. UTC | #7
On Tue, May 2, 2017 at 9:30 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Tuesday, May 2, 2017 11:10 AM
> | ...
> | So for example, in the case of x86 it seems like there are multiple
> | root complexes that have issues, and the gains for enabling it with
> | standard DMA to host memory are small. As such we may want to default
> | it to off via the architecture specific PCIe code and then look at
> | having "white-list" cases where we enable it for things like
> | peer-to-peer accesses. In the case of SPARC we could look at
> | defaulting it to on, and only "black-list" any cases where there might
> | be issues since SPARC relies on this in a significant way for
> | performance. In the case of ARM and other architectures it is a bit of
> | a toss-up. I would say we could just default it to on for now and
> | "black-list" anything that doesn't work, or we could go the other way
> | like I suggested for x86. It all depends on what the ARM community
> | would want to agree on for this. I would say unless it makes a
> | significant difference like it does in the case of SPARC we are
> | probably better off just defaulting it to off.
>
>   Sorry, I forgot to respond to this earlier when someone was rushing me out
> to a customer dinner.
>
>   I'm unaware of any other x86 Root Complex Port that has a problem with
> Relaxed Ordering other than the performance issue with the current Intel
> implementation.  Ashok tells me that Intel is in the final stages of putting
> together a technical notice on this issue but I don't know when that will
> come out.  Hopefully that will shed much more light on the cause and actual
> use of Relaxed Ordering when directed to Coherent Memory on current and past
> Intel platforms.  (Note that the performance bug seems to limit us to
> ~75-85Gb/s DMA Write speed to Coherent Host Memory.)

So my concern isn't so much about existing issues as it is about where
is the advantage in enabling it. We have had support in the Intel
hardware for enabling relaxed ordering for about 10 years. In all that
time I have yet to see an x86 platform that sees any real benefit from
enabling it for standard DMA. That is why my preference would be to
leave it disabled by default on x86 and we white list it in at some
point when hardware shows that there is a benefit to be had for
enabling it.

>   The only other Device that I currently know of which has issues with
> Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
> AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
> to Data Corruption.
>
>   But I don't know anything about other x86 Root Complex Ports having issues
> with this -- we've been using it ever since commit ef306b50b from December
> 2010.

So the question I would have for you then, is what benefits have you
seen from enabling it on x86? In our case we haven't seen any for
transactions that go through the root complex. If you are seeing
benefits would I be correct in assuming it is for your peer-to-peer
case or were there some x86 platforms that showed gains?

>   Also, I'm not aware of any performance data which has been gathered on the
> use of Relaxed Ordering when directed at Host Memory.  From your note, it
> sounds like it's important on SPARC architectures.  But it could conceivably
> be important on any architecture or Root Complex/Memory Controller
> implementation.  We use it to direct Ingress Packet Data to Free List
> Buffers, followed by a TLP without Relaxed Ordering directed at a Host
> Memory Message Queue.  The idea here is to give the Root Complex options on
> which DMA Memory Write TLPs to process in order to optimize data placement
> in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
> even more important.

The operative term here is "may be". That is my concern. We are
leaving this enabled by default and there are known hardware that have
issues that can be pretty serious. I'm not saying we have to disable
it and keep it disabled, but I would like to see us intelligently
enable this feature so that it is enabled on the platforms that show
benefit and disabled on the ones that don't.

My biggest concern with all this is introducing regressions as drivers
like igb and ixgbe are used on a wide range of platforms beyond even
what is covered by x86 and I would prefer not to suddenly have a
deluge of bugs to sort out triggered by us enabling relaxed ordering
on platforms that historically have not had it enabled on them.

>   Basically, I'd hate to come up with a solution where we write off the use
> of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
> suggesting that, but there are a number of x86 Root Complex implementations
> out there -- and some like the new AMD Ryzen have yet to be tested -- as
> well as other architectures.

I'm not saying that we have to write off the use of Relaxed Ordering,
what I am saying is that we should probably be more judicious about
how we go about enabling it. If a platform and/or architecture has no
benefit to enabling it what is the point in adding the possible risk?

Hopefully the AMD Ryzen platform has already been tested and doesn't
need a quirk to disable relaxed ordering. Really it shouldn't fall on
the likes of us to be testing for those kind of things.

>   And, as Ashok and I have both nothed, any solution we come up with needs
> to cope with a heterogeneous situation where, on the same PCIe Fabric, it
> may be necessesary/desireable to support Relaxed Ordering TLPs directed at
> some End Nodes but not others.
>
> Casey

It sounds like we are more or less in agreement. My only concern is
really what we default this to. On x86 I would say we could probably
default this to disabled for existing platforms since my understanding
is that relaxed ordering doesn't provide much benefit on what is out
there right now when performing DMA through the root complex. As far
as peer-to-peer I would say we should probably look at enabling the
ability to have Relaxed Ordering enabled for some channels but not
others. In those cases the hardware needs to be smart enough to allow
for you to indicate you want it disabled by default for most of your
DMA channels, and then enabled for the select channels that are
handling the peer-to-peer traffic.

- Alex
Casey Leedom May 4, 2017, 9:01 p.m. UTC | #8
| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, May 3, 2017 9:02 AM
| ...
| It sounds like we are more or less in agreement. My only concern is
| really what we default this to. On x86 I would say we could probably
| default this to disabled for existing platforms since my understanding
| is that relaxed ordering doesn't provide much benefit on what is out
| there right now when performing DMA through the root complex. As far
| as peer-to-peer I would say we should probably look at enabling the
| ability to have Relaxed Ordering enabled for some channels but not
| others. In those cases the hardware needs to be smart enough to allow
| for you to indicate you want it disabled by default for most of your
| DMA channels, and then enabled for the select channels that are
| handling the peer-to-peer traffic.

  Yes, I think that we are mostly in agreement.  I had just wanted to make
sure that whatever scheme was developed would allow for simultaneously
supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
Ordering for others within the same system.  I.e. not simply
enabling/disabling/etc.  based solely on System Platform Architecture.

  By the way, I've started our QA folks off looking at what things look like
in Linux Virtual Machines under different Hypervisors to see what
information they may provide to the VM in the way of what Root Complex Port
is being used, etc.  So far they've got Windows HyperV done and there
there's no PCIe Fabric exposed in any way: just the attached device.  I'll
have to see what pci_find_pcie_root_port() returns in that environment.
Maybe NULL?

  With your reservations (which I also share), I think that it probably
makes sense to have a per-architecture definition of the "Can I Use Relaxed
Ordering With TLPs Directed At This End Point" predicate, with the default
being "No" for any architecture which doesn't implement the predicate.  And
if the specified (struct pci_dev *) End Node is NULL, it ought to return
False for that as well.  I can't see any reason to pass in the Source End
Node but I may be missing something.

  At this point, this is pretty far outside my level of expertise.  I'm
happy to give it a go, but I'd be even happier if someone with a lot more
experience in the PCIe Infrastructure were to want to carry the ball
forward.  I'm not super familiar with the Linux Kernel "Rules Of
Engagement", so let me know what my next step should be.  Thanks.

Casey
Alexander Duyck May 5, 2017, 2:04 p.m. UTC | #9
On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Sent: Wednesday, May 3, 2017 9:02 AM
> | ...
> | It sounds like we are more or less in agreement. My only concern is
> | really what we default this to. On x86 I would say we could probably
> | default this to disabled for existing platforms since my understanding
> | is that relaxed ordering doesn't provide much benefit on what is out
> | there right now when performing DMA through the root complex. As far
> | as peer-to-peer I would say we should probably look at enabling the
> | ability to have Relaxed Ordering enabled for some channels but not
> | others. In those cases the hardware needs to be smart enough to allow
> | for you to indicate you want it disabled by default for most of your
> | DMA channels, and then enabled for the select channels that are
> | handling the peer-to-peer traffic.
>
>   Yes, I think that we are mostly in agreement.  I had just wanted to make
> sure that whatever scheme was developed would allow for simultaneously
> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
> Ordering for others within the same system.  I.e. not simply
> enabling/disabling/etc.  based solely on System Platform Architecture.
>
>   By the way, I've started our QA folks off looking at what things look like
> in Linux Virtual Machines under different Hypervisors to see what
> information they may provide to the VM in the way of what Root Complex Port
> is being used, etc.  So far they've got Windows HyperV done and there
> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
> have to see what pci_find_pcie_root_port() returns in that environment.
> Maybe NULL?

I believe NULL is one of the options. It all depends on what qemu is
emulating. Most likely you won't find a pcie root port on KVM because
the default is to emulate an older system that only supports PCI.

>   With your reservations (which I also share), I think that it probably
> makes sense to have a per-architecture definition of the "Can I Use Relaxed
> Ordering With TLPs Directed At This End Point" predicate, with the default
> being "No" for any architecture which doesn't implement the predicate.  And
> if the specified (struct pci_dev *) End Node is NULL, it ought to return
> False for that as well.  I can't see any reason to pass in the Source End
> Node but I may be missing something.
>
>   At this point, this is pretty far outside my level of expertise.  I'm
> happy to give it a go, but I'd be even happier if someone with a lot more
> experience in the PCIe Infrastructure were to want to carry the ball
> forward.  I'm not super familiar with the Linux Kernel "Rules Of
> Engagement", so let me know what my next step should be.  Thanks.
>
> Casey

For now we can probably keep this on the linux-pci mailing list. Going
that route is the most straight forward for now since step one is
probably just making sure we are setting the relaxed ordering bit in
the setups that make sense. I would say we could probably keep it
simple. We just need to enable relaxed ordering by default for SPARC
architectures, on most others we can probably default it to off.

I believe this all had started as Ding Tianhong was hoping to enable
this for the ARM architecture. That is the only one I can think of
where it might be difficult to figure out which way to default as we
were attempting to follow the same code that was enabled for SPARC and
that is what started this tug-of-war about how this should be done.
What we might do is take care of this in two phases. The first one
enables the infrastructure generically but leaves it defaulted to off
for everyone but SPARC. Then we can go through and start enabling it
for other platforms such as some of those on ARM in the platforms that
Ding Tianhong was working with.

- Alex
Ding Tianhong May 6, 2017, 3:08 a.m. UTC | #10
On 2017/5/5 22:04, Alexander Duyck wrote:
> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Sent: Wednesday, May 3, 2017 9:02 AM
>> | ...
>> | It sounds like we are more or less in agreement. My only concern is
>> | really what we default this to. On x86 I would say we could probably
>> | default this to disabled for existing platforms since my understanding
>> | is that relaxed ordering doesn't provide much benefit on what is out
>> | there right now when performing DMA through the root complex. As far
>> | as peer-to-peer I would say we should probably look at enabling the
>> | ability to have Relaxed Ordering enabled for some channels but not
>> | others. In those cases the hardware needs to be smart enough to allow
>> | for you to indicate you want it disabled by default for most of your
>> | DMA channels, and then enabled for the select channels that are
>> | handling the peer-to-peer traffic.
>>
>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>> sure that whatever scheme was developed would allow for simultaneously
>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>> Ordering for others within the same system.  I.e. not simply
>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>
>>   By the way, I've started our QA folks off looking at what things look like
>> in Linux Virtual Machines under different Hypervisors to see what
>> information they may provide to the VM in the way of what Root Complex Port
>> is being used, etc.  So far they've got Windows HyperV done and there
>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>> have to see what pci_find_pcie_root_port() returns in that environment.
>> Maybe NULL?
> 
> I believe NULL is one of the options. It all depends on what qemu is
> emulating. Most likely you won't find a pcie root port on KVM because
> the default is to emulate an older system that only supports PCI.
> 
>>   With your reservations (which I also share), I think that it probably
>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>> Ordering With TLPs Directed At This End Point" predicate, with the default
>> being "No" for any architecture which doesn't implement the predicate.  And
>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>> False for that as well.  I can't see any reason to pass in the Source End
>> Node but I may be missing something.
>>
>>   At this point, this is pretty far outside my level of expertise.  I'm
>> happy to give it a go, but I'd be even happier if someone with a lot more
>> experience in the PCIe Infrastructure were to want to carry the ball
>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>> Engagement", so let me know what my next step should be.  Thanks.
>>
>> Casey
> 
> For now we can probably keep this on the linux-pci mailing list. Going
> that route is the most straight forward for now since step one is
> probably just making sure we are setting the relaxed ordering bit in
> the setups that make sense. I would say we could probably keep it
> simple. We just need to enable relaxed ordering by default for SPARC
> architectures, on most others we can probably default it to off.
> 

Casey, Alexander:

Thanks for the wonderful discussion, it is more clearly that what to do next,
I agree that enable relaxed ordering by default only for SPARC and ARM64
is more safe for all the other platform, as no one want to break anything.

> I believe this all had started as Ding Tianhong was hoping to enable
> this for the ARM architecture. That is the only one I can think of
> where it might be difficult to figure out which way to default as we
> were attempting to follow the same code that was enabled for SPARC and
> that is what started this tug-of-war about how this should be done.
> What we might do is take care of this in two phases. The first one
> enables the infrastructure generically but leaves it defaulted to off
> for everyone but SPARC. Then we can go through and start enabling it
> for other platforms such as some of those on ARM in the platforms that
> Ding Tianhong was working with.
> 

According the suggestion, I could only think of this code:

@@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
                              quirk_tw686x_class);

+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+ if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
+     dev->vendor != PCI_VENDOR_ID_SUN)
+         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
+                       quirk_relaxedordering_disable);
+
 /*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the


What do you think of it?

Thanks
Ding


> - Alex
> 
> .
>
Alexander Duyck May 6, 2017, 6:07 p.m. UTC | #11
On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/5 22:04, Alexander Duyck wrote:
>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>> | ...
>>> | It sounds like we are more or less in agreement. My only concern is
>>> | really what we default this to. On x86 I would say we could probably
>>> | default this to disabled for existing platforms since my understanding
>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>> | there right now when performing DMA through the root complex. As far
>>> | as peer-to-peer I would say we should probably look at enabling the
>>> | ability to have Relaxed Ordering enabled for some channels but not
>>> | others. In those cases the hardware needs to be smart enough to allow
>>> | for you to indicate you want it disabled by default for most of your
>>> | DMA channels, and then enabled for the select channels that are
>>> | handling the peer-to-peer traffic.
>>>
>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>> sure that whatever scheme was developed would allow for simultaneously
>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>> Ordering for others within the same system.  I.e. not simply
>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>
>>>   By the way, I've started our QA folks off looking at what things look like
>>> in Linux Virtual Machines under different Hypervisors to see what
>>> information they may provide to the VM in the way of what Root Complex Port
>>> is being used, etc.  So far they've got Windows HyperV done and there
>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>> Maybe NULL?
>>
>> I believe NULL is one of the options. It all depends on what qemu is
>> emulating. Most likely you won't find a pcie root port on KVM because
>> the default is to emulate an older system that only supports PCI.
>>
>>>   With your reservations (which I also share), I think that it probably
>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>> being "No" for any architecture which doesn't implement the predicate.  And
>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>> False for that as well.  I can't see any reason to pass in the Source End
>>> Node but I may be missing something.
>>>
>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>> experience in the PCIe Infrastructure were to want to carry the ball
>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>> Engagement", so let me know what my next step should be.  Thanks.
>>>
>>> Casey
>>
>> For now we can probably keep this on the linux-pci mailing list. Going
>> that route is the most straight forward for now since step one is
>> probably just making sure we are setting the relaxed ordering bit in
>> the setups that make sense. I would say we could probably keep it
>> simple. We just need to enable relaxed ordering by default for SPARC
>> architectures, on most others we can probably default it to off.
>>
>
> Casey, Alexander:
>
> Thanks for the wonderful discussion, it is more clearly that what to do next,
> I agree that enable relaxed ordering by default only for SPARC and ARM64
> is more safe for all the other platform, as no one want to break anything.
>
>> I believe this all had started as Ding Tianhong was hoping to enable
>> this for the ARM architecture. That is the only one I can think of
>> where it might be difficult to figure out which way to default as we
>> were attempting to follow the same code that was enabled for SPARC and
>> that is what started this tug-of-war about how this should be done.
>> What we might do is take care of this in two phases. The first one
>> enables the infrastructure generically but leaves it defaulted to off
>> for everyone but SPARC. Then we can go through and start enabling it
>> for other platforms such as some of those on ARM in the platforms that
>> Ding Tianhong was working with.
>>
>
> According the suggestion, I could only think of this code:
>
> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>                               quirk_tw686x_class);
>
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
> +     dev->vendor != PCI_VENDOR_ID_SUN)
> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
> +                       quirk_relaxedordering_disable);
> +
>  /*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>
>
> What do you think of it?
>
> Thanks
> Ding
>

This is a bit simplistic but it is a start.

The other bit I was getting at is that we need to update the core PCIe
code so that when we configure devices and the root complex reports no
support for relaxed ordering it should be clearing the relaxed
ordering bits in the PCIe configuration registers on the upstream
facing devices.

The last bit we need in all this is a way to allow for setups where
peer-to-peer wants to perform relaxed ordering but for writes to the
host we have to not use relaxed ordering. For that we need to enable a
special case and that isn't handled right now in any of the solutions
we have coded up so far.

- Alex
Ding Tianhong May 8, 2017, 2:33 p.m. UTC | #12
On 2017/5/7 2:07, Alexander Duyck wrote:
> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>>
>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>> | ...
>>>> | It sounds like we are more or less in agreement. My only concern is
>>>> | really what we default this to. On x86 I would say we could probably
>>>> | default this to disabled for existing platforms since my understanding
>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>> | there right now when performing DMA through the root complex. As far
>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>> | for you to indicate you want it disabled by default for most of your
>>>> | DMA channels, and then enabled for the select channels that are
>>>> | handling the peer-to-peer traffic.
>>>>
>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>> sure that whatever scheme was developed would allow for simultaneously
>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>> Ordering for others within the same system.  I.e. not simply
>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>
>>>>   By the way, I've started our QA folks off looking at what things look like
>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>> information they may provide to the VM in the way of what Root Complex Port
>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>> Maybe NULL?
>>>
>>> I believe NULL is one of the options. It all depends on what qemu is
>>> emulating. Most likely you won't find a pcie root port on KVM because
>>> the default is to emulate an older system that only supports PCI.
>>>
>>>>   With your reservations (which I also share), I think that it probably
>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>> Node but I may be missing something.
>>>>
>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>
>>>> Casey
>>>
>>> For now we can probably keep this on the linux-pci mailing list. Going
>>> that route is the most straight forward for now since step one is
>>> probably just making sure we are setting the relaxed ordering bit in
>>> the setups that make sense. I would say we could probably keep it
>>> simple. We just need to enable relaxed ordering by default for SPARC
>>> architectures, on most others we can probably default it to off.
>>>
>>
>> Casey, Alexander:
>>
>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>> is more safe for all the other platform, as no one want to break anything.
>>
>>> I believe this all had started as Ding Tianhong was hoping to enable
>>> this for the ARM architecture. That is the only one I can think of
>>> where it might be difficult to figure out which way to default as we
>>> were attempting to follow the same code that was enabled for SPARC and
>>> that is what started this tug-of-war about how this should be done.
>>> What we might do is take care of this in two phases. The first one
>>> enables the infrastructure generically but leaves it defaulted to off
>>> for everyone but SPARC. Then we can go through and start enabling it
>>> for other platforms such as some of those on ARM in the platforms that
>>> Ding Tianhong was working with.
>>>
>>
>> According the suggestion, I could only think of this code:
>>
>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>                               quirk_tw686x_class);
>>
>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>> +{
>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>> +                       quirk_relaxedordering_disable);
>> +
>>  /*
>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>   * values for the Attribute as were supplied in the header of the
>>
>>
>> What do you think of it?
>>
>> Thanks
>> Ding
>>
> 
> This is a bit simplistic but it is a start.
> 
> The other bit I was getting at is that we need to update the core PCIe
> code so that when we configure devices and the root complex reports no
> support for relaxed ordering it should be clearing the relaxed
> ordering bits in the PCIe configuration registers on the upstream
> facing devices.

How about this:
rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
that the pcie dev did not support RO mode.

> 
> The last bit we need in all this is a way to allow for setups where
> peer-to-peer wants to perform relaxed ordering but for writes to the
> host we have to not use relaxed ordering. For that we need to enable a
> special case and that isn't handled right now in any of the solutions
> we have coded up so far.
> 

Sorry I am not clear of this way, can you explain more about this or give me
a special case, thanks a lot.

Ding

> - Alex
> 
> .
>
Alexander Duyck May 8, 2017, 3:22 p.m. UTC | #13
On Mon, May 8, 2017 at 7:33 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/7 2:07, Alexander Duyck wrote:
>> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>>
>>>
>>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>>> | ...
>>>>> | It sounds like we are more or less in agreement. My only concern is
>>>>> | really what we default this to. On x86 I would say we could probably
>>>>> | default this to disabled for existing platforms since my understanding
>>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>>> | there right now when performing DMA through the root complex. As far
>>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>>> | for you to indicate you want it disabled by default for most of your
>>>>> | DMA channels, and then enabled for the select channels that are
>>>>> | handling the peer-to-peer traffic.
>>>>>
>>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>>> sure that whatever scheme was developed would allow for simultaneously
>>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>>> Ordering for others within the same system.  I.e. not simply
>>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>>
>>>>>   By the way, I've started our QA folks off looking at what things look like
>>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>>> information they may provide to the VM in the way of what Root Complex Port
>>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>>> Maybe NULL?
>>>>
>>>> I believe NULL is one of the options. It all depends on what qemu is
>>>> emulating. Most likely you won't find a pcie root port on KVM because
>>>> the default is to emulate an older system that only supports PCI.
>>>>
>>>>>   With your reservations (which I also share), I think that it probably
>>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>>> Node but I may be missing something.
>>>>>
>>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>>
>>>>> Casey
>>>>
>>>> For now we can probably keep this on the linux-pci mailing list. Going
>>>> that route is the most straight forward for now since step one is
>>>> probably just making sure we are setting the relaxed ordering bit in
>>>> the setups that make sense. I would say we could probably keep it
>>>> simple. We just need to enable relaxed ordering by default for SPARC
>>>> architectures, on most others we can probably default it to off.
>>>>
>>>
>>> Casey, Alexander:
>>>
>>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>>> is more safe for all the other platform, as no one want to break anything.
>>>
>>>> I believe this all had started as Ding Tianhong was hoping to enable
>>>> this for the ARM architecture. That is the only one I can think of
>>>> where it might be difficult to figure out which way to default as we
>>>> were attempting to follow the same code that was enabled for SPARC and
>>>> that is what started this tug-of-war about how this should be done.
>>>> What we might do is take care of this in two phases. The first one
>>>> enables the infrastructure generically but leaves it defaulted to off
>>>> for everyone but SPARC. Then we can go through and start enabling it
>>>> for other platforms such as some of those on ARM in the platforms that
>>>> Ding Tianhong was working with.
>>>>
>>>
>>> According the suggestion, I could only think of this code:
>>>
>>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>>                               quirk_tw686x_class);
>>>
>>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>>> +{
>>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>>> +}
>>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>>> +                       quirk_relaxedordering_disable);
>>> +
>>>  /*
>>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>>   * values for the Attribute as were supplied in the header of the
>>>
>>>
>>> What do you think of it?
>>>
>>> Thanks
>>> Ding
>>>
>>
>> This is a bit simplistic but it is a start.
>>
>> The other bit I was getting at is that we need to update the core PCIe
>> code so that when we configure devices and the root complex reports no
>> support for relaxed ordering it should be clearing the relaxed
>> ordering bits in the PCIe configuration registers on the upstream
>> facing devices.
>
> How about this:
> rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
> when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
> that the pcie dev did not support RO mode.

The problem is we need to have something that can be communicated
through a VM. Your change doesn't work in that regard. That was why I
suggested just updating the code so that we when we initialized PCIe
devices what we do is either set or clear the relaxed ordering bit in
the PCIe device control register. That way when we direct assign an
interface it could know just based on the bits int the PCIe
configuration if it could use relaxed ordering or not.

At that point the driver code itself becomes very simple since you
could just enable the relaxed ordering by default in the igb/ixgbe
driver and if the bit is set or cleared in the PCIe configuration then
we are either sending with relaxed ordering requests or not and don't
have to try and locate the root complex.

>>
>> The last bit we need in all this is a way to allow for setups where
>> peer-to-peer wants to perform relaxed ordering but for writes to the
>> host we have to not use relaxed ordering. For that we need to enable a
>> special case and that isn't handled right now in any of the solutions
>> we have coded up so far.
>>
>
> Sorry I am not clear of this way, can you explain more about this or give me
> a special case, thanks a lot.

So from the sound of it Casey has a special use case where he doesn't
want to send relaxed ordering frames to the root complex, but instead
would like to send them to another PCIe device. To do that he needs to
have a way to enable the relaxed ordering bit in the PCIe
configuration but then not send any to the root complex. Odds are that
is something he might be able to just implement in the driver, but is
something that may become a more general case in the future. I don't
see our change here impacting it as long as we keep the solution
generic and mostly confined to when we instantiate the devices as the
driver could likely make the decision to change the behavior later.

- Alex
Casey Leedom May 9, 2017, 12:48 a.m. UTC | #14
| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 6, 2017 11:07 AM
| 
| | From: Ding Tianhong <dingtianhong@huawei.com>
| | Date: Fri, May 5, 2017 at 8:08 PM
| | 
| | According the suggestion, I could only think of this code:
| | ..
| 
| This is a bit simplistic but it is a start.

  Yes, something tells me that this is going to be more complicated than any
of us like ...

| The other bit I was getting at is that we need to update the core PCIe
| code so that when we configure devices and the root complex reports no
| support for relaxed ordering it should be clearing the relaxed
| ordering bits in the PCIe configuration registers on the upstream
| facing devices.

  Of course, this can be written to by the Driver at any time ... and is in
the case of the cxgb4 Driver ...

  After a lot of rummaging around, it does look like KVM prohibits writes to
the PCIe Capability Device Control register in drivers/xen/xen-pciback/
conf_space_capability.c and conf_space.c simply because writes aren't
allowed unless "permissive" is set.  So it ~looks~ like a driver running in
a Virtual Machine can't turn Enable Relaxed Ordering back on ...

| The last bit we need in all this is a way to allow for setups where
| peer-to-peer wants to perform relaxed ordering but for writes to the
| host we have to not use relaxed ordering. For that we need to enable a
| special case and that isn't handled right now in any of the solutions
| we have coded up so far.

  Yes, we do need this.


| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 8, 2017 08:22 AM
|
| The problem is we need to have something that can be communicated
| through a VM. Your change doesn't work in that regard. That was why I
| suggested just updating the code so that we when we initialized PCIe
| devices what we do is either set or clear the relaxed ordering bit in
| the PCIe device control register. That way when we direct assign an
| interface it could know just based on the bits int the PCIe
| configuration if it could use relaxed ordering or not.
| 
| At that point the driver code itself becomes very simple since you
| could just enable the relaxed ordering by default in the igb/ixgbe
| driver and if the bit is set or cleared in the PCIe configuration then
| we are either sending with relaxed ordering requests or not and don't
| have to try and locate the root complex.
| 
| So from the sound of it Casey has a special use case where he doesn't
| want to send relaxed ordering frames to the root complex, but instead
| would like to send them to another PCIe device. To do that he needs to
| have a way to enable the relaxed ordering bit in the PCIe
| configuration but then not send any to the root complex. Odds are that
| is something he might be able to just implement in the driver, but is
| something that may become a more general case in the future. I don't
| see our change here impacting it as long as we keep the solution
| generic and mostly confined to when we instantiate the devices as the
| driver could likely make the decision to change the behavior later.

  It's not just me.  Intel has said that while RO directed at the Root
Complex Host Coherent Memory has a performance bug (not Data Corruption),
it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
interested in hearing what the bug is if we get that much detail.  The very
same TLPs directed to the Root Complex Port without Relaxed Ordering set get
good performance.  So this is essentially a bug in the hardware that was
~trying~ to implement a performance win.)

  Meanwhile, I currently only know of a single PCIe End Point which causes
catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
clear that product is even alive anymore since I haven't been able to get
any responses from them for several months.

  What I'm saying is: let's try to architect a solution which doesn't throw
the baby out with the bath water ...

  I think that if a Device's Root Complex Port has problems with Relaxed
Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
Control[Enable Relaxed Ordering] when we assign a device to a Virtual
Machine since the Device Driver can no longer query the Relaxed Ordering
Support of the Root Complex Port.  The only down side of this would be if we
assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
transfers.  But that seems like a hard application to support in any case
since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
match the actual values.

  For Devices running in the base OS/Hypervisor, their Drivers can query the
Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
simple flag within the (struct pci_dev *)->dev_flags would serve for that
along with a per-Architecture/Platform mechanism for setting it ...

Casey
Ding Tianhong May 11, 2017, 1:15 a.m. UTC | #15
On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong <dingtianhong@huawei.com>
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple flag within the (struct pci_dev *)->dev_flags would serve for that
> along with a per-Architecture/Platform mechanism for setting it ...
> 

Hi Casey:

Will you continue to work on this solution or send a new version patches?

Thanks
Ding
> Casey
> 
> .
>
Casey Leedom May 16, 2017, 6:38 p.m. UTC | #16
| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, May 10, 2017 6:15 PM
|
| Hi Casey:
|
| Will you continue to work on this solution or send a new version patches?

I won't be able to work on this any time soon given several other urgent
issues.  If you've got a desire to pick this up, I'd be happy to help code
review your efforts.

Casey
Alexander Duyck May 17, 2017, 2:49 p.m. UTC | #17
On Tue, May 16, 2017 at 11:38 AM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Wednesday, May 10, 2017 6:15 PM
> |
> | Hi Casey:
> |
> | Will you continue to work on this solution or send a new version patches?
>
> I won't be able to work on this any time soon given several other urgent
> issues.  If you've got a desire to pick this up, I'd be happy to help code
> review your efforts.
>
> Casey

Thanks for the heads up. I'll see if I can find somebody in my team
that might be able to take on the task though I can't make any
promises either as we have quite a bit going on.

- Alex
diff mbox

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f754453..4ae78b3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3979,6 +3979,44 @@  static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb3da1a..6764f66 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {