Message ID | 20161121165352.GM25762@bhelgaas-glaptop.roam.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote: > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote: > > Hi Johannes, > > > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote: > > > The Read Completion Boundary (RCB) bit must only be set on a device or > > > endpoint if it is set on the root complex. > > > > I propose the following slightly modified patch. The interesting > > difference is that your patch only touches the _HPX "OR" mask, so it > > refrains from *setting* RCB in some cases, but it never actually > > *clears* it. The only time we clear RCB is when the _HPX "AND" mask > > has RCB == 0. > > > > My intent below is that we completely ignore the _HPX RCB bits, and we > > set an Endpoint's RCB if and only if the Root Port's RCB is set. > > > > I made an ugly ASCII table to think about the cases: > > > > Root EP _HPX _HPX Final Endpoint RCB state > > Port (init) AND OR (curr) (yours) (mine) > > 0) 0 0 0 0 0 0 0 > > 1) 0 0 0 1 1 0 0 > > 2) 0 0 1 0 0 0 0 > > 3) 0 0 1 1 1 0 0 > > 4) 0 1 0 0 0 0 0 > > 5) 0 1 0 1 1 0 0 > > 6) 0 1 1 0 1 1 0 > > 7) 0 1 1 1 1 1 0 > > 8) 1 0 0 0 0 0 1 > > 9) 1 0 0 1 1 1 1 > > A) 1 0 1 0 0 0 1 > > B) 1 0 1 1 1 1 1 > > C) 1 1 0 0 0 0 1 > > D) 1 1 0 1 1 1 1 > > E) 1 1 1 0 1 1 1 > > F) 1 1 1 1 1 1 1 > > > > Cases 0-7 should all result in the Endpoint RCB being zero because the > > Root Port RCB is zero. Case 1 is the bug you're fixing. Cases 3 & 5 > > are similar hypothetical bugs your patch also fixes. > > > > Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't > > tell us to clear it, are hypothetical firmware bugs that your patch > > wouldn't fix. > > > > In cases 8, A, and C, we currently leave the Endpoint RCB cleared, > > either because firmware left it clear and _HPX didn't tell us to set > > it (8 and A), or because firmware set it but _HPX told us to clear it > > (C). > > > > One could argue that 8, A, and C should stay as they currently are, as > > a way for _HPX to work around hardware bugs, e.g., a Root Port that > > advertises a 128-byte RCB but doesn't actually support it. I didn't > > bother with that and set the Endpoint's RCB to 128 in all cases when > > the Root Port claims to support it. > > > > It'd be great if you could test this and comment. > > > > If you get a chance, collect the /proc/iomem contents, too. That's > > not for this bug; it's because I'm curious about the > > > > ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST > > > > problem in your dmesg log. > > Oops, I goofed and forgot to clear RCB by default. > Here's the fixed one. Yep, my contact already noticed. I have heard rumors that the first two patches worked on RHEL and the 3rd one didn't (but that's just rumors) so I try to persuade our field engineer to spend another day testing the patches. But please be aware this is a bit cumbersome as I don't have access to the machine and our field engineer only has remote access as well. Byte, Johannes
On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote: > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote: > > Hi Johannes, [...] > > > > If you get a chance, collect the /proc/iomem contents, too. That's > > not for this bug; it's because I'm curious about the > > > > ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST > > > > problem in your dmesg log. It's pasted at the end (not sure whether attaching works). > > Oops, I goofed and forgot to clear RCB by default. > Here's the fixed one. So this version works :-). Do you plan to merge it for 4.9-rc7 or is it more 4.10 material? Byte, Johannes /proc/iomem: 00000000-00000fff : reserved 00001000-0009bfff : System RAM 0009c000-0009ffff : reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000fffff : PCI Bus 0000:00 000c0000-000c7fff : Video ROM 000c8000-000c8fff : Adapter ROM 000c9000-000c9fff : Adapter ROM 000ca000-000cafff : Adapter ROM 000cb000-000cbfff : Adapter ROM 000cc000-000d3bff : Adapter ROM 000e0000-000fffff : reserved 000f0000-000fffff : System ROM 00100000-b6a32fff : System RAM 01000000-015e8742 : Kernel code 015e8743-01d549ff : Kernel data 01f57000-021cefff : Kernel bss b6a33000-b6a41fff : reserved b6a42000-b7ffefff : System RAM b7fff000-b83fefff : reserved b83ff000-ba3fefff : ACPI Non-volatile Storage ba3ff000-ba7a7fff : ACPI Tables ba7a8000-ba7fffff : System RAM ba800000-bb7fffff : RAM buffer bb800000-cfffffff : reserved c0000000-cfffffff : PCI MMCONFIG 0000 [bus 00-ff] d0004000-d0004fff : dmar3 d0007000-d3ffffff : PCI Bus 0000:00 d0100000-d02fffff : PCI Bus 0000:11 d0300000-d04fffff : PCI Bus 0000:17 d20fd000-d20fdfff : 0000:00:05.4 d20fe000-d20fe3ff : 0000:00:1a.0 d20fe000-d20fe3ff : ehci_hcd d20ff000-d20ff3ff : 0000:00:1d.0 d20ff000-d20ff3ff : ehci_hcd d2100000-d22fffff : PCI Bus 0000:01 d2100000-d213ffff : 0000:01:00.0 d2140000-d217ffff : 0000:01:00.1 d2180000-d21bffff : 0000:01:00.2 d21f0000-d21f3fff : 0000:01:00.0 d21f0000-d21f3fff : igb d21f4000-d21f7fff : 0000:01:00.1 d21f4000-d21f7fff : igb d21f8000-d21fbfff : 0000:01:00.2 d21f8000-d21fbfff : igb d21fc000-d21fffff : 0000:01:00.3 d21fc000-d21fffff : igb d2200000-d223ffff : 0000:01:00.0 d2200000-d223ffff : igb d2240000-d227ffff : 0000:01:00.1 d2240000-d227ffff : igb d2280000-d22bffff : 0000:01:00.2 d2280000-d22bffff : igb d22c0000-d22fffff : 0000:01:00.3 d22c0000-d22fffff : igb d2300000-d24fffff : PCI Bus 0000:07 d23f0000-d23fffff : 0000:07:00.0 d23f0000-d23fffff : megasas: LSI d2400000-d24fffff : 0000:07:00.0 d2500000-d25fffff : PCI Bus 0000:1d d25fe000-d25fffff : 0000:1d:00.0 d25fe000-d25fffff : xhci-hcd d2600000-d2ffffff : PCI Bus 0000:18 d2600000-d2ffffff : PCI Bus 0000:19 d2600000-d26fffff : PCI Bus 0000:1c d2700000-d2ffffff : PCI Bus 0000:1a d2700000-d2ffffff : PCI Bus 0000:1b d27fc000-d27fffff : 0000:1b:00.0 d27fc000-d27fffff : mgadrmfb_mmio d2800000-d2ffffff : 0000:1b:00.0 d3000000-d3ffffff : PCI Bus 0000:18 d3000000-d3ffffff : PCI Bus 0000:19 d3000000-d3ffffff : PCI Bus 0000:1a d3000000-d3ffffff : PCI Bus 0000:1b d3000000-d3ffffff : 0000:1b:00.0 d3000000-d3ffffff : mgadrmfb_vram d4000000-d4000fff : dmar0 d4003000-dbffffff : PCI Bus 0000:40 d4100000-d42fffff : PCI Bus 0000:51 d76ff000-d76fffff : 0000:40:05.4 d7700000-d77fffff : PCI Bus 0000:41 d7700000-d77fffff : 0000:41:00.0 d7800000-db7fffff : PCI Bus 0000:47 db800000-dbffffff : PCI Bus 0000:41 db800000-dbffffff : 0000:41:00.0 dc000000-dc000fff : dmar1 dc003000-ebffffff : PCI Bus 0000:80 dffff000-dfffffff : 0000:80:05.4 e0000000-e3ffffff : PCI Bus 0000:81 e4000000-e7ffffff : PCI Bus 0000:8b e8000000-ebffffff : PCI Bus 0000:95 ec000000-ec000fff : dmar2 ec003000-fbffffff : PCI Bus 0000:c0 effff000-efffffff : 0000:c0:05.4 f0000000-f3ffffff : PCI Bus 0000:c1 f4000000-f7ffffff : PCI Bus 0000:cb f8000000-fbffffff : PCI Bus 0000:d5 fec00000-fecfffff : PNP0003:00 fec00000-fec003ff : IOAPIC 0 fec01000-fec013ff : IOAPIC 1 fec21000-fec213ff : IOAPIC 2 fec41000-fec413ff : IOAPIC 3 fec61000-fec613ff : IOAPIC 4 fed00000-fed003ff : HPET 0 fed00000-fed003ff : PNP0103:00 fed12000-fed1200f : pnp 00:01 fed12010-fed1201f : pnp 00:01 fed1b000-fed1bfff : pnp 00:01 fed1c000-fed1ffff : reserved fed1f410-fed1f414 : iTCO_wdt.0.auto fed45000-fed8bfff : pnp 00:01 fee00000-feefffff : pnp 00:01 fee00000-fee00fff : Local APIC ff000000-ffffffff : reserved ff000000-ffffffff : pnp 00:01 100000000-803fffffff : System RAM 20000000000-23fffffffff : PCI Bus 0000:00 20000000000-200001fffff : PCI Bus 0000:07 20000200000-200003fffff : PCI Bus 0000:11 20000400000-200005fffff : PCI Bus 0000:17 20000600000-200007fffff : PCI Bus 0000:1d 23fffee0000-23fffee3fff : 0000:00:04.0 23fffee0000-23fffee3fff : ioatdma 23fffee4000-23fffee7fff : 0000:00:04.1 23fffee4000-23fffee7fff : ioatdma 23fffee8000-23fffeebfff : 0000:00:04.2 23fffee8000-23fffeebfff : ioatdma 23fffeec000-23fffeeffff : 0000:00:04.3 23fffeec000-23fffeeffff : ioatdma 23fffef0000-23fffef3fff : 0000:00:04.4 23fffef0000-23fffef3fff : ioatdma 23fffef4000-23fffef7fff : 0000:00:04.5 23fffef4000-23fffef7fff : ioatdma 23fffef8000-23fffefbfff : 0000:00:04.6 23fffef8000-23fffefbfff : ioatdma 23fffefc000-23fffefffff : 0000:00:04.7 23fffefc000-23fffefffff : ioatdma 23ffff00000-23fffffffff : PCI Bus 0000:01 23ffff00000-23ffff1ffff : 0000:01:00.0 23ffff20000-23ffff3ffff : 0000:01:00.0 23ffff40000-23ffff5ffff : 0000:01:00.1 23ffff60000-23ffff7ffff : 0000:01:00.1 23ffff80000-23ffff9ffff : 0000:01:00.2 23ffffa0000-23ffffbffff : 0000:01:00.2 23ffffc0000-23ffffdffff : 0000:01:00.3 23ffffe0000-23fffffffff : 0000:01:00.3 24000000000-27fffffffff : PCI Bus 0000:40 24000000000-240001fffff : PCI Bus 0000:51 26ffffe0000-26ffffe3fff : 0000:40:04.0 26ffffe0000-26ffffe3fff : ioatdma 26ffffe4000-26ffffe7fff : 0000:40:04.1 26ffffe4000-26ffffe7fff : ioatdma 26ffffe8000-26ffffebfff : 0000:40:04.2 26ffffe8000-26ffffebfff : ioatdma 26ffffec000-26ffffeffff : 0000:40:04.3 26ffffec000-26ffffeffff : ioatdma 26fffff0000-26fffff3fff : 0000:40:04.4 26fffff0000-26fffff3fff : ioatdma 26fffff4000-26fffff7fff : 0000:40:04.5 26fffff4000-26fffff7fff : ioatdma 26fffff8000-26fffffbfff : 0000:40:04.6 26fffff8000-26fffffbfff : ioatdma 26fffffc000-26fffffffff : 0000:40:04.7 26fffffc000-26fffffffff : ioatdma 27000000000-27fffffffff : PCI Bus 0000:47 28000000000-2bfffffffff : PCI Bus 0000:80 28ffffe0000-28ffffe3fff : 0000:80:04.0 28ffffe0000-28ffffe3fff : ioatdma 28ffffe4000-28ffffe7fff : 0000:80:04.1 28ffffe4000-28ffffe7fff : ioatdma 28ffffe8000-28ffffebfff : 0000:80:04.2 28ffffe8000-28ffffebfff : ioatdma 28ffffec000-28ffffeffff : 0000:80:04.3 28ffffec000-28ffffeffff : ioatdma 28fffff0000-28fffff3fff : 0000:80:04.4 28fffff0000-28fffff3fff : ioatdma 28fffff4000-28fffff7fff : 0000:80:04.5 28fffff4000-28fffff7fff : ioatdma 28fffff8000-28fffffbfff : 0000:80:04.6 28fffff8000-28fffffbfff : ioatdma 28fffffc000-28fffffffff : 0000:80:04.7 28fffffc000-28fffffffff : ioatdma 29000000000-29fffffffff : PCI Bus 0000:95 2a000000000-2afffffffff : PCI Bus 0000:8b 2b000000000-2bfffffffff : PCI Bus 0000:81 2c000000000-2ffffffffff : PCI Bus 0000:c0 2cffffe0000-2cffffe3fff : 0000:c0:04.0 2cffffe0000-2cffffe3fff : ioatdma 2cffffe4000-2cffffe7fff : 0000:c0:04.1 2cffffe4000-2cffffe7fff : ioatdma 2cffffe8000-2cffffebfff : 0000:c0:04.2 2cffffe8000-2cffffebfff : ioatdma 2cffffec000-2cffffeffff : 0000:c0:04.3 2cffffec000-2cffffeffff : ioatdma 2cfffff0000-2cfffff3fff : 0000:c0:04.4 2cfffff0000-2cfffff3fff : ioatdma 2cfffff4000-2cfffff7fff : 0000:c0:04.5 2cfffff4000-2cfffff7fff : ioatdma 2cfffff8000-2cfffffbfff : 0000:c0:04.6 2cfffff8000-2cfffffbfff : ioatdma 2cfffffc000-2cfffffffff : 0000:c0:04.7 2cfffffc000-2cfffffffff : ioatdma 2d000000000-2dfffffffff : PCI Bus 0000:d5 2e000000000-2efffffffff : PCI Bus 0000:cb 2f000000000-2ffffffffff : PCI Bus 0000:c1
On Mon, Nov 21, 2016 at 9:53 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote: >> Hi Johannes, >> >> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote: >> > The Read Completion Boundary (RCB) bit must only be set on a device or >> > endpoint if it is set on the root complex. >> >> I propose the following slightly modified patch. The interesting >> difference is that your patch only touches the _HPX "OR" mask, so it >> refrains from *setting* RCB in some cases, but it never actually >> *clears* it. The only time we clear RCB is when the _HPX "AND" mask >> has RCB == 0. >> >> My intent below is that we completely ignore the _HPX RCB bits, and we >> set an Endpoint's RCB if and only if the Root Port's RCB is set. >> >> I made an ugly ASCII table to think about the cases: >> >> Root EP _HPX _HPX Final Endpoint RCB state >> Port (init) AND OR (curr) (yours) (mine) >> 0) 0 0 0 0 0 0 0 >> 1) 0 0 0 1 1 0 0 >> 2) 0 0 1 0 0 0 0 >> 3) 0 0 1 1 1 0 0 >> 4) 0 1 0 0 0 0 0 >> 5) 0 1 0 1 1 0 0 >> 6) 0 1 1 0 1 1 0 >> 7) 0 1 1 1 1 1 0 >> 8) 1 0 0 0 0 0 1 >> 9) 1 0 0 1 1 1 1 >> A) 1 0 1 0 0 0 1 >> B) 1 0 1 1 1 1 1 >> C) 1 1 0 0 0 0 1 >> D) 1 1 0 1 1 1 1 >> E) 1 1 1 0 1 1 1 >> F) 1 1 1 1 1 1 1 >> >> Cases 0-7 should all result in the Endpoint RCB being zero because the >> Root Port RCB is zero. Case 1 is the bug you're fixing. Cases 3 & 5 >> are similar hypothetical bugs your patch also fixes. >> >> Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't >> tell us to clear it, are hypothetical firmware bugs that your patch >> wouldn't fix. >> >> In cases 8, A, and C, we currently leave the Endpoint RCB cleared, >> either because firmware left it clear and _HPX didn't tell us to set >> it (8 and A), or because firmware set it but _HPX told us to clear it >> (C). >> >> One could argue that 8, A, and C should stay as they currently are, as >> a way for _HPX to work around hardware bugs, e.g., a Root Port that >> advertises a 128-byte RCB but doesn't actually support it. I didn't >> bother with that and set the Endpoint's RCB to 128 in all cases when >> the Root Port claims to support it. >> >> It'd be great if you could test this and comment. >> >> If you get a chance, collect the /proc/iomem contents, too. That's >> not for this bug; it's because I'm curious about the >> >> ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST >> >> problem in your dmesg log. > > Oops, I goofed and forgot to clear RCB by default. > Here's the fixed one. > Testing on our end was successful again with this fix ;) Tested-by: Frank Danapfel <fdanapfe@redhat.com> Acked-by: Myron Stowe <myron.stowe@redhat.com> > > commit b7bff74c2e6babf12906291ee177f16444de81ad > Author: Johannes Thumshirn <jthumshirn@suse.de> > Date: Wed Nov 16 15:47:53 2016 -0600 > > PCI: Set Read Completion Boundary to 128 iff Root Port supports it > > Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB) > determines the naturally aligned address boundaries on which a Read Request > may be serviced with multiple Completions: > > - For a Root Complex, RCB is 64 bytes or 128 bytes > This value is reported in the Link Control Register > > Note: Bridges and Endpoints may implement a corresponding command bit > which may be set by system software to indicate the RCB value for the > Root Complex, allowing the Bridge/Endpoint to optimize its behavior > when the Root Complex’s RCB is 128 bytes. > > - For all other system elements, RCB is 128 bytes > > Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all > downstream devices must be clear, indicating an RCB of 64 bytes. If the > Root Port supports a 128-byte RCB, we may optionally set the RCB of > downstream devices so they know they can generate larger Completions. > > Some BIOSes supply an _HPX that tells us to set RCB, even though the Root > Port doesn't have RCB set, which may lead to Malformed TLP errors if the > Endpoint generates completions larger than the Root Port can handle. > > The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies > such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize: > > mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared) > mlx4_core 0000:41:00.0: device is going to be reset > mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting > mlx4_core 0000:41:00.0: Fail to reset HCA > ------------[ cut here ]------------ > kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193! > > After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") > and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices > with a link"), we apply _HPX settings to *all* devices, not just those > hot-added after boot. > > Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device > worked. After 7a1562d4f2d0, we set its RCB to 128, and it failed. > > Set the RCB to 128 iff the Root Port supports a 128-byte RCB. Otherwise, > set RCB to 64 bytes. This effectively ignores what _HPX tells us about > RCB. > > [bhelgaas: changelog, clear RCB if not set for Root Port] > Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") > Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781 > Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > CC: stable@vger.kernel.org # v3.18+ > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index ab00267..104c46d 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp) > dev_warn(&dev->dev, "PCI-X settings not supported\n"); > } > > +static bool pcie_root_rcb_set(struct pci_dev *dev) > +{ > + struct pci_dev *rp = pcie_find_root_port(dev); > + u16 lnkctl; > + > + if (!rp) > + return false; > + > + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl); > + if (lnkctl & PCI_EXP_LNKCTL_RCB) > + return true; > + > + return false; > +} > + > static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) > { > int pos; > @@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) > ~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or); > > /* Initialize Link Control Register */ > - if (pcie_cap_has_lnkctl(dev)) > + if (pcie_cap_has_lnkctl(dev)) { > + > + /* > + * If the Root Port supports Read Completion Boundary of > + * 128, set RCB to 128. Otherwise, clear it. > + */ > + hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB; > + hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB; > + if (pcie_root_rcb_set(dev)) > + hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB; > + > pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL, > ~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or); > + } > > /* Find Advanced Error Reporting Enhanced Capability */ > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote: > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote: > > Hi Johannes, > > > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote: > > > The Read Completion Boundary (RCB) bit must only be set on a device or > > > endpoint if it is set on the root complex. > ... > Here's the fixed one. I applied the one below to for-linus for v4.9. I did tweak the preceding patch to only share pcie_find_root_port() inside drivers/pci -- moved the declaration from include/linux/pci.h to drivers/pci/pci.h and moved the definition to drivers/pci/search.c. I don't think it needs to be visible to the whole kernel. > commit b7bff74c2e6babf12906291ee177f16444de81ad > Author: Johannes Thumshirn <jthumshirn@suse.de> > Date: Wed Nov 16 15:47:53 2016 -0600 > > PCI: Set Read Completion Boundary to 128 iff Root Port supports it > > Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB) > determines the naturally aligned address boundaries on which a Read Request > may be serviced with multiple Completions: > > - For a Root Complex, RCB is 64 bytes or 128 bytes > This value is reported in the Link Control Register > > Note: Bridges and Endpoints may implement a corresponding command bit > which may be set by system software to indicate the RCB value for the > Root Complex, allowing the Bridge/Endpoint to optimize its behavior > when the Root Complex’s RCB is 128 bytes. > > - For all other system elements, RCB is 128 bytes > > Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all > downstream devices must be clear, indicating an RCB of 64 bytes. If the > Root Port supports a 128-byte RCB, we may optionally set the RCB of > downstream devices so they know they can generate larger Completions. > > Some BIOSes supply an _HPX that tells us to set RCB, even though the Root > Port doesn't have RCB set, which may lead to Malformed TLP errors if the > Endpoint generates completions larger than the Root Port can handle. > > The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies > such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize: > > mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared) > mlx4_core 0000:41:00.0: device is going to be reset > mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting > mlx4_core 0000:41:00.0: Fail to reset HCA > ------------[ cut here ]------------ > kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193! > > After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") > and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices > with a link"), we apply _HPX settings to *all* devices, not just those > hot-added after boot. > > Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device > worked. After 7a1562d4f2d0, we set its RCB to 128, and it failed. > > Set the RCB to 128 iff the Root Port supports a 128-byte RCB. Otherwise, > set RCB to 64 bytes. This effectively ignores what _HPX tells us about > RCB. > > [bhelgaas: changelog, clear RCB if not set for Root Port] > Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") > Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781 > Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > CC: stable@vger.kernel.org # v3.18+ > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index ab00267..104c46d 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp) > dev_warn(&dev->dev, "PCI-X settings not supported\n"); > } > > +static bool pcie_root_rcb_set(struct pci_dev *dev) > +{ > + struct pci_dev *rp = pcie_find_root_port(dev); > + u16 lnkctl; > + > + if (!rp) > + return false; > + > + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl); > + if (lnkctl & PCI_EXP_LNKCTL_RCB) > + return true; > + > + return false; > +} > + > static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) > { > int pos; > @@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) > ~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or); > > /* Initialize Link Control Register */ > - if (pcie_cap_has_lnkctl(dev)) > + if (pcie_cap_has_lnkctl(dev)) { > + > + /* > + * If the Root Port supports Read Completion Boundary of > + * 128, set RCB to 128. Otherwise, clear it. > + */ > + hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB; > + hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB; > + if (pcie_root_rcb_set(dev)) > + hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB; > + > pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL, > ~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or); > + } > > /* Find Advanced Error Reporting Enhanced Capability */ > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 23, 2016 at 11:31:52AM -0600, Bjorn Helgaas wrote: > On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote: > > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote: > > > Hi Johannes, > > > > > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote: > > > > The Read Completion Boundary (RCB) bit must only be set on a device or > > > > endpoint if it is set on the root complex. > > ... > > > Here's the fixed one. > > I applied the one below to for-linus for v4.9. > > I did tweak the preceding patch to only share pcie_find_root_port() > inside drivers/pci -- moved the declaration from include/linux/pci.h > to drivers/pci/pci.h and moved the definition to drivers/pci/search.c. > > I don't think it needs to be visible to the whole kernel. Sigh, I guess it *does*, because the aer_inject user can be a module. I'll fix that. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index ab00267..104c46d 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp) dev_warn(&dev->dev, "PCI-X settings not supported\n"); } +static bool pcie_root_rcb_set(struct pci_dev *dev) +{ + struct pci_dev *rp = pcie_find_root_port(dev); + u16 lnkctl; + + if (!rp) + return false; + + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl); + if (lnkctl & PCI_EXP_LNKCTL_RCB) + return true; + + return false; +} + static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) { int pos; @@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp) ~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or); /* Initialize Link Control Register */ - if (pcie_cap_has_lnkctl(dev)) + if (pcie_cap_has_lnkctl(dev)) { + + /* + * If the Root Port supports Read Completion Boundary of + * 128, set RCB to 128. Otherwise, clear it. + */ + hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB; + hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB; + if (pcie_root_rcb_set(dev)) + hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB; + pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL, ~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or); + } /* Find Advanced Error Reporting Enhanced Capability */ pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);