Message ID | 1499955692-26556-3-git-send-email-dingtianhong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 7/13/2017 10:21 AM, Ding Tianhong wrote: > static void pci_configure_relaxed_ordering(struct pci_dev *dev) > +{ > + /* We should not alter the relaxed ordering bit for the VF */ > + if (dev->is_virtfn) > + return; > + > + /* If the releaxed ordering enable bit is not set, do nothing. */ > + if (!pcie_relaxed_ordering_supported(dev)) > + return; > + > + if (pci_dev_should_disable_relaxed_ordering(dev)) { > + pcie_clear_relaxed_ordering(dev); > + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); > + } > +} I couldn't find anywhere where you actually enable the relaxed ordering like the subject suggests.
On 2017/7/14 5:09, Sinan Kaya wrote: > On 7/13/2017 10:21 AM, Ding Tianhong wrote: >> static void pci_configure_relaxed_ordering(struct pci_dev *dev) >> +{ >> + /* We should not alter the relaxed ordering bit for the VF */ >> + if (dev->is_virtfn) >> + return; >> + >> + /* If the releaxed ordering enable bit is not set, do nothing. */ >> + if (!pcie_relaxed_ordering_supported(dev)) >> + return; >> + >> + if (pci_dev_should_disable_relaxed_ordering(dev)) { >> + pcie_clear_relaxed_ordering(dev); >> + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); >> + } >> +} > > I couldn't find anywhere where you actually enable the relaxed ordering > like the subject suggests. > There is no code to enable the PCIe Relaxed Ordering bit in the configuration space, it is only be enable by default according to the PCIe Standard Specification, what we do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root Complex. Thanks Ding
On 7/13/2017 9:26 PM, Ding Tianhong wrote: > There is no code to enable the PCIe Relaxed Ordering bit in the configuration space, > it is only be enable by default according to the PCIe Standard Specification, what we > do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit > to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root > Complex. Maybe, you should change the patch commit as "Disable PCIe Relaxed Ordering if not supported"...
Hi Sinan, Bjorn: On 2017/7/14 21:54, Sinan Kaya wrote: > On 7/13/2017 9:26 PM, Ding Tianhong wrote: >> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space, >> it is only be enable by default according to the PCIe Standard Specification, what we >> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit >> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root >> Complex. > > Maybe, you should change the patch commit as > "Disable PCIe Relaxed Ordering if not supported"... I agree that to use the new commit title as your suggested, thanks. :) @Bjorn do you want me to spawn a new patchset with the new commit title and the Reviewed-by from Casey on the patch 3, or maybe you could pick this up and modify it own ? thanks. Ding >
On Sat, 22 Jul 2017 12:19:38 +0800 Ding Tianhong <dingtianhong@huawei.com> wrote: > Hi Sinan, Bjorn: > > On 2017/7/14 21:54, Sinan Kaya wrote: > > On 7/13/2017 9:26 PM, Ding Tianhong wrote: > >> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space, > >> it is only be enable by default according to the PCIe Standard Specification, what we > >> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit > >> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root > >> Complex. > > > > Maybe, you should change the patch commit as > > "Disable PCIe Relaxed Ordering if not supported"... > > I agree that to use the new commit title as your suggested, thanks. :) > > @Bjorn do you want me to spawn a new patchset with the new commit title > and the Reviewed-by from Casey on the patch 3, or maybe you could pick this > up and modify it own ? thanks. Hi Ding, Bjorn is currently on holiday so it might be a good idea to respin the series with any updates so nothing is lost. Thanks, Alex
By the way Ding, two issues: 1. Did we ever get any acknowledgement from either Intel or AMD on this patch? I know that we can't ensure that, but it sure would be nice since the PCI Quirks that we're putting in affect their products. 2. I just realized that there's still a small hole in the patch with respect to PCIe SR-IOV Virtual Functions. When the PCI Quirk notices a problematic PCIe Device and marks it to note that it's not "happy" receiving Transaction Layer Packets with the Relaxed Ordering Attribute set, it's my understanding that your current patch iterates down the PCIe Fabric and turns off the PCIe Capability Device Control[Relaxed Ordering Enable]. But this scan may miss any SR-IOV VFs because they probably won't be instantiated at that time. And, at a later time, when they are instantiated, they could well have their Relaxed Ordering Enable set. I think that the patch will need to be extended to modify drivers/pci.c/iov.c:sriov_enable() to explicitly turn off Relaxed Ordering Enable if the Root Complex is marked for no RO TLPs. Casey
| From: Alexander Duyck <alexander.duyck@gmail.com> | Sent: Wednesday, July 26, 2017 11:44 AM | | On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote: | | | | I think that the patch will need to be extended to modify | | drivers/pci.c/iov.c:sriov_enable() to explicitly turn off | | Relaxed Ordering Enable if the Root Complex is marked | for no RO TLPs. | | I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings? Ah yes, you're right. This is covered in section 3.5.4 of the Single Root I/O Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007), governing the PCIe Capability Device Control register. It states that the VF version of that register shall follow the setting of the corresponding PF. So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same way we did for the cxgb4 driver, but that's not critical since the Relaxed Ordering Enable supersedes the internal chip's desire to use the Relaxed Ordering Attribute. Ding, send me a note if you'd like me to work that up for you. | Also I thought most of the VF configuration space is read only. Yes, but not all of it. And when a VF is exported to a Virtual Machine, then the Hypervisor captures and interprets all accesses to the VF's PCIe Configuration Space from the VM. Thanks again for reminding me of the subtle aspect of the SR_IOV specification that I forgot. Casey
On 2017/7/27 3:05, Casey Leedom wrote: > | From: Alexander Duyck <alexander.duyck@gmail.com> > | Sent: Wednesday, July 26, 2017 11:44 AM > | > | On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote: > | | > | | I think that the patch will need to be extended to modify > | | drivers/pci.c/iov.c:sriov_enable() to explicitly turn off > | | Relaxed Ordering Enable if the Root Complex is marked > | for no RO TLPs. > | > | I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings? > > Ah yes, you're right. This is covered in section 3.5.4 of the Single Root I/O > Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007), > governing the PCIe Capability Device Control register. It states that the VF > version of that register shall follow the setting of the corresponding PF. > > So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same > way we did for the cxgb4 driver, but that's not critical since the Relaxed > Ordering Enable supersedes the internal chip's desire to use the Relaxed > Ordering Attribute. > > Ding, send me a note if you'd like me to work that up for you. > Ok, you could send the change log and I could put it in the v8 version together, will you base on the patch 3/3 or build a independence patch? Ding > | Also I thought most of the VF configuration space is read only. > > Yes, but not all of it. And when a VF is exported to a Virtual Machine, > then the Hypervisor captures and interprets all accesses to the VF's > PCIe Configuration Space from the VM. > > Thanks again for reminding me of the subtle aspect of the SR_IOV > specification that I forgot. > > Casey > . >
On 2017/7/27 2:26, Casey Leedom wrote: > By the way Ding, two issues: > > 1. Did we ever get any acknowledgement from either Intel or AMD > on this patch? I know that we can't ensure that, but it sure would > be nice since the PCI Quirks that we're putting in affect their > products. > Still no Intel and AMD guys has ack this, this is what I am worried about, should I ping some man again ? Thanks Ding > > Casey > . >
| From: Ding Tianhong <dingtianhong@huawei.com> | Sent: Wednesday, July 26, 2017 6:01 PM | | On 2017/7/27 3:05, Casey Leedom wrote: | > | > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up | > for you. | | Ok, you could send the change log and I could put it in the v8 version | together, will you base on the patch 3/3 or build a independence patch? Which ever you'd prefer. It would basically mirror the same exact code that you've got for cxgb4. I.e. testing the setting of the VF's PCIe Capability Device Control[Relaxed Ordering Enable], setting a new flag in adpater->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq(). But since the VF's PF will already have disabled the PF's Relaxed Ordering Enable, the VF will also have it's Relaxed Ordering Enable disabled and any effort by the internal chip to send TLPs with the Relaxed Ordering Attribute will be gated by the PCIe logic. So it's not critical that this be in the first patch. Your call. Let me know if you'd like me to send that to you. | From: Ding Tianhong <dingtianhong@huawei.com> | Sent: Wednesday, July 26, 2017 6:08 PM | | On 2017/7/27 2:26, Casey Leedom wrote: | > | > 1. Did we ever get any acknowledgement from either Intel or AMD | > on this patch? I know that we can't ensure that, but it sure would | > be nice since the PCI Quirks that we're putting in affect their | > products. | | Still no Intel and AMD guys has ack this, this is what I am worried about, | should I ping some man again ? By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note yesterday with a link to the official Intel performance tuning documentation which covers this issue: https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf In section 3.9.1 we have: 3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory and Toward MMIO Regions (P2P) In order to maximize performance for PCIe devices in the processors listed in Table 3-6 below, the soft- ware should determine whether the accesses are toward coherent memory (system memory) or toward MMIO regions (P2P access to other devices). If the access is toward MMIO region, then software can command HW to set the RO bit in the TLP header, as this would allow hardware to achieve maximum throughput for these types of accesses. For accesses toward coherent memory, software can command HW to clear the RO bit in the TLP header (no RO), as this would allow hardware to achieve maximum throughput for these types of accesses. Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing PCIe Performance Processor CPU RP Device IDs Intel Xeon processors based on 6F01H-6F0EH Broadwell microarchitecture Intel Xeon processors based on 2F01H-2F0EH Haswell microarchitecture Unfortunately that's a pretty thin section. But it does expand the set of Intel Root Complexes for which our Linux PCI Quirk will need to cover. So you should add those to the next (and hopefully final) spin of your patch. And, it also verifies the need to handle the use of Relaxed Ordering more subtlely than simply turning it off since the NVMe peer-to-peer example I keep bringing up would fall into the "need to use Relaxed Ordering" case ... It would have been nice to know why this is happening and if any future processor would fix this. After all, Relaxed Ordering, is just supposed to be a hint. At worst, a receiving device could just ignore the attribute entirely. Obviously someone made an effort to implement it but ... it didn't go the way they wanted. And, it also would have been nice to know if there was any hidden register in these Intel Root Complexes which can completely turn off the effort to pay attention to the Relaxed Ordering Attribute. We've spend an enormous amount of effort on this issue here on the Linux PCI email list struggling mightily to come up with a way to determine when it's safe/recommended/not-recommended/unsafe to use Relaxed Ordering when directing TLPs towards the Root Complex. And some architectures require RO for decent performance so we can't just "turn it off" unilatterally. Casey
On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote: > > > On 2017/7/27 2:26, Casey Leedom wrote: >> By the way Ding, two issues: >> >> 1. Did we ever get any acknowledgement from either Intel or AMD >> on this patch? I know that we can't ensure that, but it sure would >> be nice since the PCI Quirks that we're putting in affect their >> products. >> > > Still no Intel and AMD guys has ack this, this is what I am worried about, should I > ping some man again ? > > Thanks > Ding I probably wouldn't worry about it too much. If anything all this patch is doing is disabling relaxed ordering on the platforms we know have issues based on what Casey originally had. If nothing else we can follow up once the patches are in the kernel and if somebody has an issue then. You can include my acked-by, but it is mostly related to how this interacts with NICs, and not so much about the PCI chipsets themselves. Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Hi Casey > | Still no Intel and AMD guys has ack this, this is what I am worried about, > | should I ping some man again ? I can ack the patch set for Intel specific changes. Now that the doc is made public :-). Can you/Ding resend the patch series, i do have the most recent v7, some of the commit message wasn't easy to ready. Seems like this patch has gotten bigger than originally intended, but seems to be for the overall good :-). Sorry for staying silent up until now. Cheers, Ashok
On 2017/7/28 1:44, Casey Leedom wrote: > | From: Ding Tianhong <dingtianhong@huawei.com> > | Sent: Wednesday, July 26, 2017 6:01 PM > | > | On 2017/7/27 3:05, Casey Leedom wrote: > | > > | > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up > | > for you. > | > | Ok, you could send the change log and I could put it in the v8 version > | together, will you base on the patch 3/3 or build a independence patch? > > Which ever you'd prefer. It would basically mirror the same exact code that > you've got for cxgb4. I.e. testing the setting of the VF's PCIe Capability > Device Control[Relaxed Ordering Enable], setting a new flag in > adpater->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq(). > But since the VF's PF will already have disabled the PF's Relaxed Ordering > Enable, the VF will also have it's Relaxed Ordering Enable disabled and any > effort by the internal chip to send TLPs with the Relaxed Ordering Attribute > will be gated by the PCIe logic. So it's not critical that this be in the > first patch. Your call. Let me know if you'd like me to send that to you. > Good, please Send it to me, I will put it together and send the v8 this week, I think Bjorn will be back next week .:) > > | From: Ding Tianhong <dingtianhong@huawei.com> > | Sent: Wednesday, July 26, 2017 6:08 PM > | > | On 2017/7/27 2:26, Casey Leedom wrote: > | > > | > 1. Did we ever get any acknowledgement from either Intel or AMD > | > on this patch? I know that we can't ensure that, but it sure would > | > be nice since the PCI Quirks that we're putting in affect their > | > products. > | > | Still no Intel and AMD guys has ack this, this is what I am worried about, > | should I ping some man again ? > > By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note > yesterday with a link to the official Intel performance tuning documentation > which covers this issue: > > https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf > > In section 3.9.1 we have: > > 3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory > and Toward MMIO Regions (P2P) > > In order to maximize performance for PCIe devices in the processors > listed in Table 3-6 below, the soft- ware should determine whether the > accesses are toward coherent memory (system memory) or toward MMIO > regions (P2P access to other devices). If the access is toward MMIO > region, then software can command HW to set the RO bit in the TLP > header, as this would allow hardware to achieve maximum throughput for > these types of accesses. For accesses toward coherent memory, software > can command HW to clear the RO bit in the TLP header (no RO), as this > would allow hardware to achieve maximum throughput for these types of > accesses. > > Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing > PCIe Performance > > Processor CPU RP Device IDs > > Intel Xeon processors based on 6F01H-6F0EH > Broadwell microarchitecture > > Intel Xeon processors based on 2F01H-2F0EH > Haswell microarchitecture > > Unfortunately that's a pretty thin section. But it does expand the set of > Intel Root Complexes for which our Linux PCI Quirk will need to cover. So > you should add those to the next (and hopefully final) spin of your patch. > And, it also verifies the need to handle the use of Relaxed Ordering more > subtlely than simply turning it off since the NVMe peer-to-peer example I > keep bringing up would fall into the "need to use Relaxed Ordering" case ... > > It would have been nice to know why this is happening and if any future > processor would fix this. After all, Relaxed Ordering, is just supposed to > be a hint. At worst, a receiving device could just ignore the attribute > entirely. Obviously someone made an effort to implement it but ... it > didn't go the way they wanted. > > And, it also would have been nice to know if there was any hidden register > in these Intel Root Complexes which can completely turn off the effort to > pay attention to the Relaxed Ordering Attribute. We've spend an enormous > amount of effort on this issue here on the Linux PCI email list struggling > mightily to come up with a way to determine when it's > safe/recommended/not-recommended/unsafe to use Relaxed Ordering when > directing TLPs towards the Root Complex. And some architectures require RO > for decent performance so we can't just "turn it off" unilatterally. > I am glad to hear that more person were focus on this problem, It would be great if they could enter our discussion and give us more suggestion. :) Thanks Ding > Casey > > . >
On 2017/7/28 2:42, Raj, Ashok wrote: > Hi Casey > >> | Still no Intel and AMD guys has ack this, this is what I am worried about, >> | should I ping some man again ? > > > I can ack the patch set for Intel specific changes. Now that the doc is made > public :-). > Good, Thanks. :) > Can you/Ding resend the patch series, i do have the most recent v7, some > of the commit message wasn't easy to ready. Seems like this patch has > gotten bigger than originally intended, but seems to be for the overall > good :-). > OK, I will send v8 patch set and which will update the patch title and add Casey's new modification for his vf driver, thanks. Ding > Sorry for staying silent up until now. > > Cheers, > Ashok > > . >
On 2017/7/28 1:49, Alexander Duyck wrote: > On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote: >> >> >> On 2017/7/27 2:26, Casey Leedom wrote: >>> By the way Ding, two issues: >>> >>> 1. Did we ever get any acknowledgement from either Intel or AMD >>> on this patch? I know that we can't ensure that, but it sure would >>> be nice since the PCI Quirks that we're putting in affect their >>> products. >>> >> >> Still no Intel and AMD guys has ack this, this is what I am worried about, should I >> ping some man again ? >> >> Thanks >> Ding > > > I probably wouldn't worry about it too much. If anything all this > patch is doing is disabling relaxed ordering on the platforms we know > have issues based on what Casey originally had. If nothing else we can > follow up once the patches are in the kernel and if somebody has an > issue then. > > You can include my acked-by, but it is mostly related to how this > interacts with NICs, and not so much about the PCI chipsets > themselves. > > Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> > Thanks, Alex. :) > . >
Hi Ding patch looks good, except would reword the patch description for clarity here is my crack at it, feel free to use. On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote: > The PCIe Device Control Register use the bit 4 to indicate that > whether the device is permitted to enable relaxed ordering or not. > But relaxed ordering is not safe for some platform which could only > use strong write ordering, so devices are allowed (but not required) > to enable relaxed ordering bit by default. > > If a PCIe device didn't enable the relaxed ordering attribute default, > we should not do anything in the PCIe configuration, otherwise we > should check if any of the devices above us do not support relaxed > ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on > the result if we get a return that indicate that the relaxed ordering > is not supported we should update our device to disable relaxed ordering > in configuration space. If the device above us doesn't exist or isn't > the PCIe device, we shouldn't do anything and skip updating relaxed ordering > because we are probably running in a guest machine. When bit4 is set in the PCIe Device Control register, it indicates whether the device is permitted to use relaxed ordering. On some platforms using relaxed ordering can have performance issues or due to erratum can cause data-corruption. In such cases devices must avoid using relaxed ordering. This patch checks if there is any node in the hierarchy that indicates that using relaxed ordering is not safe. In such cases the patch turns off the relaxed ordering by clearing the eapability for this device. > > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> > --- > drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++ > drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++ > include/linux/pci.h | 2 ++ > 3 files changed, 68 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index d88edf5..7a6b32f 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) > EXPORT_SYMBOL(pcie_set_mps); > > /** > + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit > + * @dev: PCI device to query > + * > + * If possible clear relaxed ordering > + */ > +int pcie_clear_relaxed_ordering(struct pci_dev *dev) > +{ > + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, > + PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); > + > +/** > + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support > + * @dev: PCI device to query > + * > + * Returns true if the device support relaxed ordering attribute. > + */ > +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) > +{ > + u16 v; > + > + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); > + > + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); > + > +/** > * pcie_get_minimum_link - determine minimum link settings of a PCI device > * @dev: PCI device to query > * @speed: storage for minimum speed > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index c31310d..48df012 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev) > PCI_EXP_DEVCTL_EXT_TAG); > } > > +/** > + * pci_dev_should_disable_relaxed_ordering - check if the PCI device > + * should disable the relaxed ordering attribute. > + * @dev: PCI device > + * > + * Return true if any of the PCI devices above us do not support > + * relaxed ordering. > + */ > +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) > +{ > + while (dev) { > + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) > + return true; > + > + dev = dev->bus->self; > + } > + > + return false; > +} > + > +static void pci_configure_relaxed_ordering(struct pci_dev *dev) > +{ > + /* We should not alter the relaxed ordering bit for the VF */ > + if (dev->is_virtfn) > + return; > + > + /* If the releaxed ordering enable bit is not set, do nothing. */ > + if (!pcie_relaxed_ordering_supported(dev)) > + return; > + > + if (pci_dev_should_disable_relaxed_ordering(dev)) { > + pcie_clear_relaxed_ordering(dev); > + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); > + } > +} > + > static void pci_configure_device(struct pci_dev *dev) > { > struct hotplug_params hpp; > @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) > > pci_configure_mps(dev); > pci_configure_extended_tags(dev); > + pci_configure_relaxed_ordering(dev); > > memset(&hpp, 0, sizeof(hpp)); > ret = pci_get_hp_params(dev, &hpp); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 412ec1c..3aa23a2 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, > void pci_pme_wakeup_bus(struct pci_bus *bus); > void pci_d3cold_enable(struct pci_dev *dev); > void pci_d3cold_disable(struct pci_dev *dev); > +int pcie_clear_relaxed_ordering(struct pci_dev *dev); > +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); > > /* PCI Virtual Channel */ > int pci_save_vc_state(struct pci_dev *dev); > -- > 1.8.3.1 > >
On 2017/8/3 17:13, Raj, Ashok wrote: > Hi Ding > > patch looks good, except would reword the patch description for clarity > > here is my crack at it, feel free to use. > > On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote: >> The PCIe Device Control Register use the bit 4 to indicate that >> whether the device is permitted to enable relaxed ordering or not. >> But relaxed ordering is not safe for some platform which could only >> use strong write ordering, so devices are allowed (but not required) >> to enable relaxed ordering bit by default. >> >> If a PCIe device didn't enable the relaxed ordering attribute default, >> we should not do anything in the PCIe configuration, otherwise we >> should check if any of the devices above us do not support relaxed >> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on >> the result if we get a return that indicate that the relaxed ordering >> is not supported we should update our device to disable relaxed ordering >> in configuration space. If the device above us doesn't exist or isn't >> the PCIe device, we shouldn't do anything and skip updating relaxed ordering >> because we are probably running in a guest machine. > > When bit4 is set in the PCIe Device Control register, it indicates > whether the device is permitted to use relaxed ordering. > On some platforms using relaxed ordering can have performance issues or > due to erratum can cause data-corruption. In such cases devices must avoid > using relaxed ordering. > > This patch checks if there is any node in the hierarchy that indicates that > using relaxed ordering is not safe. In such cases the patch turns off the > relaxed ordering by clearing the eapability for this device. > Good, thanks for the commit, I will send v8 and update the patch description. Ding >> >> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> >> --- >> drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++ >> drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++ >> include/linux/pci.h | 2 ++ >> 3 files changed, 68 insertions(+) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index d88edf5..7a6b32f 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) >> EXPORT_SYMBOL(pcie_set_mps); >> >> /** >> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit >> + * @dev: PCI device to query >> + * >> + * If possible clear relaxed ordering >> + */ >> +int pcie_clear_relaxed_ordering(struct pci_dev *dev) >> +{ >> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, >> + PCI_EXP_DEVCTL_RELAX_EN); >> +} >> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); >> + >> +/** >> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support >> + * @dev: PCI device to query >> + * >> + * Returns true if the device support relaxed ordering attribute. >> + */ >> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) >> +{ >> + u16 v; >> + >> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); >> + >> + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); >> +} >> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); >> + >> +/** >> * pcie_get_minimum_link - determine minimum link settings of a PCI device >> * @dev: PCI device to query >> * @speed: storage for minimum speed >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index c31310d..48df012 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev) >> PCI_EXP_DEVCTL_EXT_TAG); >> } >> >> +/** >> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device >> + * should disable the relaxed ordering attribute. >> + * @dev: PCI device >> + * >> + * Return true if any of the PCI devices above us do not support >> + * relaxed ordering. >> + */ >> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) >> +{ >> + while (dev) { >> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) >> + return true; >> + >> + dev = dev->bus->self; >> + } >> + >> + return false; >> +} >> + >> +static void pci_configure_relaxed_ordering(struct pci_dev *dev) >> +{ >> + /* We should not alter the relaxed ordering bit for the VF */ >> + if (dev->is_virtfn) >> + return; >> + >> + /* If the releaxed ordering enable bit is not set, do nothing. */ >> + if (!pcie_relaxed_ordering_supported(dev)) >> + return; >> + >> + if (pci_dev_should_disable_relaxed_ordering(dev)) { >> + pcie_clear_relaxed_ordering(dev); >> + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); >> + } >> +} >> + >> static void pci_configure_device(struct pci_dev *dev) >> { >> struct hotplug_params hpp; >> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) >> >> pci_configure_mps(dev); >> pci_configure_extended_tags(dev); >> + pci_configure_relaxed_ordering(dev); >> >> memset(&hpp, 0, sizeof(hpp)); >> ret = pci_get_hp_params(dev, &hpp); >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 412ec1c..3aa23a2 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, >> void pci_pme_wakeup_bus(struct pci_bus *bus); >> void pci_d3cold_enable(struct pci_dev *dev); >> void pci_d3cold_disable(struct pci_dev *dev); >> +int pcie_clear_relaxed_ordering(struct pci_dev *dev); >> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); >> >> /* PCI Virtual Channel */ >> int pci_save_vc_state(struct pci_dev *dev); >> -- >> 1.8.3.1 >> >> > > . >
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d88edf5..7a6b32f 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) EXPORT_SYMBOL(pcie_set_mps); /** + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit + * @dev: PCI device to query + * + * If possible clear relaxed ordering + */ +int pcie_clear_relaxed_ordering(struct pci_dev *dev) +{ + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); + +/** + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support + * @dev: PCI device to query + * + * Returns true if the device support relaxed ordering attribute. + */ +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) +{ + u16 v; + + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); + + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); + +/** * pcie_get_minimum_link - determine minimum link settings of a PCI device * @dev: PCI device to query * @speed: storage for minimum speed diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index c31310d..48df012 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev) PCI_EXP_DEVCTL_EXT_TAG); } +/** + * pci_dev_should_disable_relaxed_ordering - check if the PCI device + * should disable the relaxed ordering attribute. + * @dev: PCI device + * + * Return true if any of the PCI devices above us do not support + * relaxed ordering. + */ +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) +{ + while (dev) { + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) + return true; + + dev = dev->bus->self; + } + + return false; +} + +static void pci_configure_relaxed_ordering(struct pci_dev *dev) +{ + /* We should not alter the relaxed ordering bit for the VF */ + if (dev->is_virtfn) + return; + + /* If the releaxed ordering enable bit is not set, do nothing. */ + if (!pcie_relaxed_ordering_supported(dev)) + return; + + if (pci_dev_should_disable_relaxed_ordering(dev)) { + pcie_clear_relaxed_ordering(dev); + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); + } +} + static void pci_configure_device(struct pci_dev *dev) { struct hotplug_params hpp; @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) pci_configure_mps(dev); pci_configure_extended_tags(dev); + pci_configure_relaxed_ordering(dev); memset(&hpp, 0, sizeof(hpp)); ret = pci_get_hp_params(dev, &hpp); diff --git a/include/linux/pci.h b/include/linux/pci.h index 412ec1c..3aa23a2 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, void pci_pme_wakeup_bus(struct pci_bus *bus); void pci_d3cold_enable(struct pci_dev *dev); void pci_d3cold_disable(struct pci_dev *dev); +int pcie_clear_relaxed_ordering(struct pci_dev *dev); +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); /* PCI Virtual Channel */ int pci_save_vc_state(struct pci_dev *dev);
The PCIe Device Control Register use the bit 4 to indicate that whether the device is permitted to enable relaxed ordering or not. But relaxed ordering is not safe for some platform which could only use strong write ordering, so devices are allowed (but not required) to enable relaxed ordering bit by default. If a PCIe device didn't enable the relaxed ordering attribute default, we should not do anything in the PCIe configuration, otherwise we should check if any of the devices above us do not support relaxed ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on the result if we get a return that indicate that the relaxed ordering is not supported we should update our device to disable relaxed ordering in configuration space. If the device above us doesn't exist or isn't the PCIe device, we shouldn't do anything and skip updating relaxed ordering because we are probably running in a guest machine. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> --- drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++ drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++ include/linux/pci.h | 2 ++ 3 files changed, 68 insertions(+)