Message ID | 20140913040716.GA28080@google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Bjorn Helgaas <bhelgaas@google.com> writes: > I want to fix this regression before v3.17. Dirk, can you test the > following patch on top of v3.17-rc2? I'm hoping you can try this on your > test machine in conjunction with your acpi_pci_root_add() and > pci_scan_device() patches. If I understand correctly, you were able to > reproduce the FC adapter not showing up, and if you can verify that it does > show with those patches + this revert, I think that's good enough for now. I tried this patch on the test machine but after rebooting I lost remote access -- details will have to wait until tonight. Independent of the result of this test I planned to go to the office, this evening to also do this test on the VX50 and to also try Yinghai's suggestion to reset the PCIe link on it. I'd like to see if the behavior of the VX50 differs from that of the test machine. Probably obvious but did I undestand correctly that Yinghai's patches + echo 1 > /sys/bus/.../pcie_link_disable echo 0 > /sys/bus/.../pcie_link_disable is exactly identical to this? setpci -s ... 0xc0.b=0x18 setpci -s ... 0xc0.b=0x08 Please let me know if there is anything else you want me to test on the VX50. Dirk > I'm not committed to applying this yet, but I'd like to have a working fix > in my back pocket in case we don't come up with a better solution soon. > > Bjorn > > > commit 5945a8d28c416fc390a94c8e7fb8fd0a76f5d710 > Author: Bjorn Helgaas <bhelgaas@google.com> > Date: Fri Sep 12 21:58:19 2014 -0600 > > Revert "PCI: Make sure bus number resources stay within their parents bounds" > > This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay > within their parents bounds") because it breaks some systems with LSI Logic > FC949ES Fibre Channel Adapters, apparently by exposing a defect in those > adapters. > > Dirk tested a Tyan VX50 (B4985) with this device that worked like this > prior to 1820ffdccb9b: > > bus: [bus 00-7f] on node 0 link 1 > ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07]) > pci 0000:00:0e.0: PCI bridge to [bus 0a] > pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07]) > pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter) > > Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS > defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't > enforce that aperture, and the FC adapter worked fine at 0a:00.0. > > After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in > the root bridge's aperture, so we reconfigure it so it *is* contained: > > pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring > pci 0000:00:0e.0: PCI bridge to [bus 06-07] > > This effectively moves the FC device from 0a:00.0 to 07:00.0, which should > be legal. But when we enumerate bus 06, the FC device doesn't respond, so > we don't find anything. This is probably a defect in the FC device. > > Possible fixes (due to Yinghai): > > 1) Add a quirk to fix the _CRS information based on what amd_bus.c read > from the hardware > > 2) Reset the FC device after we change its bus number > > 3) Revert 1820ffdccb9b > > Fix 1 would be relatively easy, but it does sweep the LSI FC issue under > the rug. We might want to reconfigure bus numbers in the future for some > other reason, e.g., hotplug, and then we could trip over this again. > > For that reason, I like fix 2, but we don't know whether it actually works, > and we don't have a patch for it yet. > > This revert is fix 3, which also sweeps the LSI FC issue under the rug. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281 > Reported-by: Dirk Gouders <dirk@gouders.net> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > CC: stable@vger.kernel.org # v3.15+ > CC: Yinghai Lu <yinghai@kernel.org> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index e3cf8a2e6292..f0badff77cff 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -775,7 +775,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > /* Check if setup is sensible at all */ > if (!pass && > (primary != bus->number || secondary <= bus->number || > - secondary > subordinate || subordinate > bus->busn_res.end)) { > + secondary > subordinate)) { > dev_info(&dev->dev, "bridge configuration invalid ([bus %02x-%02x]), reconfiguring\n", > secondary, subordinate); > broken = 1; > @@ -853,8 +853,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > child = pci_add_new_bus(bus, dev, max+1); > if (!child) > goto out; > - pci_bus_insert_busn_res(child, max+1, > - bus->busn_res.end); > + pci_bus_insert_busn_res(child, max+1, 0xff); > } > max++; > buses = (buses & 0xff000000) > @@ -913,11 +912,6 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > /* > * Set the subordinate bus number to its real value. > */ > - if (max > bus->busn_res.end) { > - dev_warn(&dev->dev, "max busn %02x is outside %pR\n", > - max, &bus->busn_res); > - max = bus->busn_res.end; > - } > pci_bus_update_busn_res_end(child, max); > pci_write_config_byte(dev, PCI_SUBORDINATE_BUS, max); > } -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 12, 2014 at 10:07:16PM -0600, Bjorn Helgaas wrote: > I want to fix this regression before v3.17. Dirk, can you test the > following patch on top of v3.17-rc2? I'm hoping you can try this on your > test machine in conjunction with your acpi_pci_root_add() and > pci_scan_device() patches. If I understand correctly, you were able to > reproduce the FC adapter not showing up, and if you can verify that it does > show with those patches + this revert, I think that's good enough for now. > > I'm not committed to applying this yet, but I'd like to have a working fix > in my back pocket in case we don't come up with a better solution soon. Since Dirk confirmed that the revert below avoids the problem for now, I applied it to my for-linus branch for v3.17. I don't think this is the right fix, but it will buy us some time to figure out a better fix after v3.17. Bjorn > commit 5945a8d28c416fc390a94c8e7fb8fd0a76f5d710 > Author: Bjorn Helgaas <bhelgaas@google.com> > Date: Fri Sep 12 21:58:19 2014 -0600 > > Revert "PCI: Make sure bus number resources stay within their parents bounds" > > This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay > within their parents bounds") because it breaks some systems with LSI Logic > FC949ES Fibre Channel Adapters, apparently by exposing a defect in those > adapters. > > Dirk tested a Tyan VX50 (B4985) with this device that worked like this > prior to 1820ffdccb9b: > > bus: [bus 00-7f] on node 0 link 1 > ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07]) > pci 0000:00:0e.0: PCI bridge to [bus 0a] > pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07]) > pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter) > > Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS > defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't > enforce that aperture, and the FC adapter worked fine at 0a:00.0. > > After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in > the root bridge's aperture, so we reconfigure it so it *is* contained: > > pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring > pci 0000:00:0e.0: PCI bridge to [bus 06-07] > > This effectively moves the FC device from 0a:00.0 to 07:00.0, which should > be legal. But when we enumerate bus 06, the FC device doesn't respond, so > we don't find anything. This is probably a defect in the FC device. > > Possible fixes (due to Yinghai): > > 1) Add a quirk to fix the _CRS information based on what amd_bus.c read > from the hardware > > 2) Reset the FC device after we change its bus number > > 3) Revert 1820ffdccb9b > > Fix 1 would be relatively easy, but it does sweep the LSI FC issue under > the rug. We might want to reconfigure bus numbers in the future for some > other reason, e.g., hotplug, and then we could trip over this again. > > For that reason, I like fix 2, but we don't know whether it actually works, > and we don't have a patch for it yet. > > This revert is fix 3, which also sweeps the LSI FC issue under the rug. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281 > Reported-by: Dirk Gouders <dirk@gouders.net> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > CC: stable@vger.kernel.org # v3.15+ > CC: Yinghai Lu <yinghai@kernel.org> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index e3cf8a2e6292..f0badff77cff 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -775,7 +775,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > /* Check if setup is sensible at all */ > if (!pass && > (primary != bus->number || secondary <= bus->number || > - secondary > subordinate || subordinate > bus->busn_res.end)) { > + secondary > subordinate)) { > dev_info(&dev->dev, "bridge configuration invalid ([bus %02x-%02x]), reconfiguring\n", > secondary, subordinate); > broken = 1; > @@ -853,8 +853,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > child = pci_add_new_bus(bus, dev, max+1); > if (!child) > goto out; > - pci_bus_insert_busn_res(child, max+1, > - bus->busn_res.end); > + pci_bus_insert_busn_res(child, max+1, 0xff); > } > max++; > buses = (buses & 0xff000000) > @@ -913,11 +912,6 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) > /* > * Set the subordinate bus number to its real value. > */ > - if (max > bus->busn_res.end) { > - dev_warn(&dev->dev, "max busn %02x is outside %pR\n", > - max, &bus->busn_res); > - max = bus->busn_res.end; > - } > pci_bus_update_busn_res_end(child, max); > pci_write_config_byte(dev, PCI_SUBORDINATE_BUS, max); > } -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote: > So, I did some tests on the VX50 which probably wasn't the worst idea, > because it behaves different than the test machine. > > Summary: > > 1) Bjorn's back pocket patch works on the VX50. > > On the test machine it causes a trace, mount_root has to do with > it. I tried to use netconsole but it complained the interface were > not ready. OK, that's good. I put this revert patch in for-linus for v3.17. I regard this as a temporary fix, not the real solution. My guess is the test machine doesn't boot because you're missing a driver, so not related to the revert patch. > 3) Reset with Bjorn's commands > > DEV=00:0e.0 > setpci -s$DEV BRIDGE_CONTROL.W=0x0040 > sleep 1 > setpci -s$DEV BRIDGE_CONTROL.W=0x0000 > sleep 1 > echo 1 > /sys/bus/pci/rescan > > let the FC adapter appear but there are errors that I cannot really > interpret. > > 4) Reset with Yinghai's patches and > > echo 1 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable > echo 0 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable > echo 1 > /sys/bus/pci/rescan > > gives a similar resut to 3). Resetting the device or simply disabling and re-enabling the link was enough to fix things from the device's perspective. In both cases, it responded as one would expect: pci_scan_child_bus: pci_bus 0000:06: scanning bus pci 0000:06:00.0: [1000:0646] type 00 class 0x0c0400 pci 0000:06:00.0: reg 0x10: [io 0x0000-0x00ff] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00003fff 64bit] pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit] pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref] Linux tried to assign MMIO space to the device, but failed: pci 0000:06:00.0: BAR 6: assigned [mem 0xd4200000-0xd42fffff pref] pci 0000:06:00.0: BAR 3: no space for [mem size 0x00010000 64bit] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x00010000 64bit] pci 0000:06:00.0: BAR 1: no space for [mem size 0x00004000 64bit] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x00004000 64bit] The upstream bridge windows are: pci 0000:00:0e.0: PCI bridge to [bus 06] # was originally to bus 0a pci 0000:00:0e.0: bridge window [io 0x3000-0x3fff] pci 0000:00:0e.0: bridge window [mem 0xd4200000-0xd42fffff] So the ROM BAR (reg 0x30/BAR 6) takes up the whole window, leaving nothing for BARs 1 and 3. This is something that Linux could do better. For example, we could assign normal BARs first, followed by ROM BARs, since the normal ones are more important. It's possible we could also try to expand the bridge window so all the BARs would fit. In any case, resetting the device is not a simple fix all by itself. So our possibilities are: 1) Quirk to work around _CRS bug. This works but requires us to maintain CPU-specific code that I really don't want. 2) Reset device when changing bus number. This works from the device point of view, but would require additional Linux changes. 3) Revert 1820ffdccb9b. This works but is ugly because we're ignoring some of what _CRS tells us. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas <bhelgaas@google.com> writes: > On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote: >> So, I did some tests on the VX50 which probably wasn't the worst idea, >> because it behaves different than the test machine. >> >> Summary: >> >> 1) Bjorn's back pocket patch works on the VX50. >> >> On the test machine it causes a trace, mount_root has to do with >> it. I tried to use netconsole but it complained the interface were >> not ready. > > OK, that's good. I put this revert patch in for-linus for v3.17. I regard > this as a temporary fix, not the real solution. My guess is the test > machine doesn't boot because you're missing a driver, so not related to the > revert patch. I assumed my limit-host-bridge-aperture-and-ignore-bridges-patch on top of your patch caused this, so I took a closer look. Your patch works fine with current rc5+ on the test machine -- with and without my additional patch. rc2 and "make oldconfig" somehow caused that the root partition couldn't be mounted. With rc5+ everything is fine, again, without touching the configuration myself. Other various today's test results (VX50) will be appended to bugzilla in a few moments. Dirk >> 3) Reset with Bjorn's commands >> >> DEV=00:0e.0 >> setpci -s$DEV BRIDGE_CONTROL.W=0x0040 >> sleep 1 >> setpci -s$DEV BRIDGE_CONTROL.W=0x0000 >> sleep 1 >> echo 1 > /sys/bus/pci/rescan >> >> let the FC adapter appear but there are errors that I cannot really >> interpret. >> >> 4) Reset with Yinghai's patches and >> >> echo 1 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable >> echo 0 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable >> echo 1 > /sys/bus/pci/rescan >> >> gives a similar resut to 3). > > Resetting the device or simply disabling and re-enabling the link was > enough to fix things from the device's perspective. In both cases, it > responded as one would expect: > > pci_scan_child_bus: pci_bus 0000:06: scanning bus > pci 0000:06:00.0: [1000:0646] type 00 class 0x0c0400 > pci 0000:06:00.0: reg 0x10: [io 0x0000-0x00ff] > pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00003fff 64bit] > pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit] > pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref] > > Linux tried to assign MMIO space to the device, but failed: > > pci 0000:06:00.0: BAR 6: assigned [mem 0xd4200000-0xd42fffff pref] > pci 0000:06:00.0: BAR 3: no space for [mem size 0x00010000 64bit] > pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x00010000 64bit] > pci 0000:06:00.0: BAR 1: no space for [mem size 0x00004000 64bit] > pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x00004000 64bit] > > The upstream bridge windows are: > > pci 0000:00:0e.0: PCI bridge to [bus 06] # was originally to bus 0a > pci 0000:00:0e.0: bridge window [io 0x3000-0x3fff] > pci 0000:00:0e.0: bridge window [mem 0xd4200000-0xd42fffff] > > So the ROM BAR (reg 0x30/BAR 6) takes up the whole window, leaving nothing > for BARs 1 and 3. This is something that Linux could do better. For > example, we could assign normal BARs first, followed by ROM BARs, since the > normal ones are more important. It's possible we could also try to expand > the bridge window so all the BARs would fit. > > In any case, resetting the device is not a simple fix all by itself. So > our possibilities are: > > 1) Quirk to work around _CRS bug. This works but requires us to maintain > CPU-specific code that I really don't want. > > 2) Reset device when changing bus number. This works from the device > point of view, but would require additional Linux changes. > > 3) Revert 1820ffdccb9b. This works but is ugly because we're ignoring > some of what _CRS tells us. > > Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Sep 20, 2014 at 12:41 PM, Dirk Gouders <dirk@gouders.net> wrote: > Bjorn Helgaas <bhelgaas@google.com> writes: > >> On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote: >>> So, I did some tests on the VX50 which probably wasn't the worst idea, >>> because it behaves different than the test machine. >>> >>> Summary: >>> >>> 1) Bjorn's back pocket patch works on the VX50. >>> >>> On the test machine it causes a trace, mount_root has to do with >>> it. I tried to use netconsole but it complained the interface were >>> not ready. >> >> OK, that's good. I put this revert patch in for-linus for v3.17. I regard >> this as a temporary fix, not the real solution. My guess is the test >> machine doesn't boot because you're missing a driver, so not related to the >> revert patch. > > I assumed my limit-host-bridge-aperture-and-ignore-bridges-patch on top > of your patch caused this, so I took a closer look. > > Your patch works fine with current rc5+ on the test machine -- with and > without my additional patch. Great, thanks for testing that! > Other various today's test results (VX50) will be appended to bugzilla > in a few moments. The Windows Server 2008 boot shows that Windows reconfigures the 00:0e.0 bridge so it fits inside the [bus 00-07] aperture reported by the host bridge _CRS, and the LSI FC adapter is not enumerated at all. That's basically the same behavior that prompted your bug report. This suggests that Windows does *not* reset the secondary bus when changing the bridge configuration. For v3.17, I reverted 1820ffdccb9b ("PCI: Make sure bus number resources stay within their parents bounds"). For the future, I think we should do a quirk to fix the _CRS, similar to what Andreas has posted, and apply 1820ffdccb9b again. But I think the quirk should be specific to this system and BIOS. I don't want to add a workaround that silently covers up Linux and BIOS bugs. The reason amd_bus.c exists is because Linux was not smart enough to pay attention to _CRS. Linux is now pretty good at that, so the reason for amd_bus.c is mostly gone. I don't want to add new dependencies on amd_bus.c that will prevent us from phasing it out. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 22, 2014 at 4:25 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: > On Sat, Sep 20, 2014 at 12:41 PM, Dirk Gouders <dirk@gouders.net> wrote: >> Bjorn Helgaas <bhelgaas@google.com> writes: >> >>> On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote: >>>> So, I did some tests on the VX50 which probably wasn't the worst idea, >>>> because it behaves different than the test machine. >>>> >>>> Summary: >>>> >>>> 1) Bjorn's back pocket patch works on the VX50. >>>> >>>> On the test machine it causes a trace, mount_root has to do with >>>> it. I tried to use netconsole but it complained the interface were >>>> not ready. >>> >>> OK, that's good. I put this revert patch in for-linus for v3.17. I regard >>> this as a temporary fix, not the real solution. My guess is the test >>> machine doesn't boot because you're missing a driver, so not related to the >>> revert patch. >> >> I assumed my limit-host-bridge-aperture-and-ignore-bridges-patch on top >> of your patch caused this, so I took a closer look. >> >> Your patch works fine with current rc5+ on the test machine -- with and >> without my additional patch. > > Great, thanks for testing that! > >> Other various today's test results (VX50) will be appended to bugzilla >> in a few moments. > > The Windows Server 2008 boot shows that Windows reconfigures the > 00:0e.0 bridge so it fits inside the [bus 00-07] aperture reported by > the host bridge _CRS, and the LSI FC adapter is not enumerated at all. > That's basically the same behavior that prompted your bug report. > This suggests that Windows does *not* reset the secondary bus when > changing the bridge configuration. > > For v3.17, I reverted 1820ffdccb9b ("PCI: Make sure bus number > resources stay within their parents bounds"). For the future, I think > we should do a quirk to fix the _CRS, similar to what Andreas has > posted, and apply 1820ffdccb9b again. > > But I think the quirk should be specific to this system and BIOS. I > don't want to add a workaround that silently covers up Linux and BIOS > bugs. The reason amd_bus.c exists is because Linux was not smart > enough to pay attention to _CRS. Linux is now pretty good at that, so > the reason for amd_bus.c is mostly gone. I don't want to add new > dependencies on amd_bus.c that will prevent us from phasing it out. Why not always trust amd_bus over _CRS? Is there a scenario in which amd_bus is wrong? Are these methods (like _CRS) meant to set limits for us, or are they simply used to report the configuration decisions made by the BIOS? So if _CRS says that the window is [00-07] would it be ok for us to simply increase it (possibly after reprogramming the registers in amd_bus)? -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 22, 2014 at 8:53 AM, Andreas Noever <andreas.noever@gmail.com> wrote: > On Mon, Sep 22, 2014 at 4:25 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> On Sat, Sep 20, 2014 at 12:41 PM, Dirk Gouders <dirk@gouders.net> wrote: >>> Bjorn Helgaas <bhelgaas@google.com> writes: >>> >>>> On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote: >>>>> So, I did some tests on the VX50 which probably wasn't the worst idea, >>>>> because it behaves different than the test machine. >>>>> >>>>> Summary: >>>>> >>>>> 1) Bjorn's back pocket patch works on the VX50. >>>>> >>>>> On the test machine it causes a trace, mount_root has to do with >>>>> it. I tried to use netconsole but it complained the interface were >>>>> not ready. >>>> >>>> OK, that's good. I put this revert patch in for-linus for v3.17. I regard >>>> this as a temporary fix, not the real solution. My guess is the test >>>> machine doesn't boot because you're missing a driver, so not related to the >>>> revert patch. >>> >>> I assumed my limit-host-bridge-aperture-and-ignore-bridges-patch on top >>> of your patch caused this, so I took a closer look. >>> >>> Your patch works fine with current rc5+ on the test machine -- with and >>> without my additional patch. >> >> Great, thanks for testing that! >> >>> Other various today's test results (VX50) will be appended to bugzilla >>> in a few moments. >> >> The Windows Server 2008 boot shows that Windows reconfigures the >> 00:0e.0 bridge so it fits inside the [bus 00-07] aperture reported by >> the host bridge _CRS, and the LSI FC adapter is not enumerated at all. >> That's basically the same behavior that prompted your bug report. >> This suggests that Windows does *not* reset the secondary bus when >> changing the bridge configuration. >> >> For v3.17, I reverted 1820ffdccb9b ("PCI: Make sure bus number >> resources stay within their parents bounds"). For the future, I think >> we should do a quirk to fix the _CRS, similar to what Andreas has >> posted, and apply 1820ffdccb9b again. >> >> But I think the quirk should be specific to this system and BIOS. I >> don't want to add a workaround that silently covers up Linux and BIOS >> bugs. The reason amd_bus.c exists is because Linux was not smart >> enough to pay attention to _CRS. Linux is now pretty good at that, so >> the reason for amd_bus.c is mostly gone. I don't want to add new >> dependencies on amd_bus.c that will prevent us from phasing it out. > Why not always trust amd_bus over _CRS? Is there a scenario in which > amd_bus is wrong? amd_bus.c requires ongoing maintenance to keep it working for new processors and topologies. The ACPI description of the platform is the one the OEM intended, and it's the one that is tested. There are cases where the ACPI description omits things that amd_bus.c would find, e.g., when the BIOS reserves hardware that it doesn't want the OS to touch. > Are these methods (like _CRS) meant to set limits for us, or are they > simply used to report the configuration decisions made by the BIOS? So > if _CRS says that the window is [00-07] would it be ok for us to > simply increase it (possibly after reprogramming the registers in > amd_bus)? _CRS tells us how a device is configured. _PRS tells us what other settings are possible. _SRS chooses other settings. If BIOS supplies _PRS and _SRS, we can change the settings. But I've never seen _PRS and _SRS for host bridges, and Linux doesn't support them for host bridges today. It would not be OK for us to use amd_bus.c to reprogram registers. The rest of ACPI assumes that the bridge is configured per _CRS, and it assumes that any changes are done via _SRS. For example, there could be AML that uses those assumptions. I don't think a hybrid ACPI + native solution is really viable. One-off bug workarounds are fine, but in the long run, I think we're better off if we work from the same system description that Windows does. Otherwise we'll continually trip over unexpected things. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index e3cf8a2e6292..f0badff77cff 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -775,7 +775,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) /* Check if setup is sensible at all */ if (!pass && (primary != bus->number || secondary <= bus->number || - secondary > subordinate || subordinate > bus->busn_res.end)) { + secondary > subordinate)) { dev_info(&dev->dev, "bridge configuration invalid ([bus %02x-%02x]), reconfiguring\n", secondary, subordinate); broken = 1; @@ -853,8 +853,7 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) child = pci_add_new_bus(bus, dev, max+1); if (!child) goto out; - pci_bus_insert_busn_res(child, max+1, - bus->busn_res.end); + pci_bus_insert_busn_res(child, max+1, 0xff); } max++; buses = (buses & 0xff000000) @@ -913,11 +912,6 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) /* * Set the subordinate bus number to its real value. */ - if (max > bus->busn_res.end) { - dev_warn(&dev->dev, "max busn %02x is outside %pR\n", - max, &bus->busn_res); - max = bus->busn_res.end; - } pci_bus_update_busn_res_end(child, max); pci_write_config_byte(dev, PCI_SUBORDINATE_BUS, max); }
I want to fix this regression before v3.17. Dirk, can you test the following patch on top of v3.17-rc2? I'm hoping you can try this on your test machine in conjunction with your acpi_pci_root_add() and pci_scan_device() patches. If I understand correctly, you were able to reproduce the FC adapter not showing up, and if you can verify that it does show with those patches + this revert, I think that's good enough for now. I'm not committed to applying this yet, but I'd like to have a working fix in my back pocket in case we don't come up with a better solution soon. Bjorn commit 5945a8d28c416fc390a94c8e7fb8fd0a76f5d710 Author: Bjorn Helgaas <bhelgaas@google.com> Date: Fri Sep 12 21:58:19 2014 -0600 Revert "PCI: Make sure bus number resources stay within their parents bounds" This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay within their parents bounds") because it breaks some systems with LSI Logic FC949ES Fibre Channel Adapters, apparently by exposing a defect in those adapters. Dirk tested a Tyan VX50 (B4985) with this device that worked like this prior to 1820ffdccb9b: bus: [bus 00-7f] on node 0 link 1 ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07]) pci 0000:00:0e.0: PCI bridge to [bus 0a] pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07]) pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter) Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't enforce that aperture, and the FC adapter worked fine at 0a:00.0. After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in the root bridge's aperture, so we reconfigure it so it *is* contained: pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring pci 0000:00:0e.0: PCI bridge to [bus 06-07] This effectively moves the FC device from 0a:00.0 to 07:00.0, which should be legal. But when we enumerate bus 06, the FC device doesn't respond, so we don't find anything. This is probably a defect in the FC device. Possible fixes (due to Yinghai): 1) Add a quirk to fix the _CRS information based on what amd_bus.c read from the hardware 2) Reset the FC device after we change its bus number 3) Revert 1820ffdccb9b Fix 1 would be relatively easy, but it does sweep the LSI FC issue under the rug. We might want to reconfigure bus numbers in the future for some other reason, e.g., hotplug, and then we could trip over this again. For that reason, I like fix 2, but we don't know whether it actually works, and we don't have a patch for it yet. This revert is fix 3, which also sweeps the LSI FC issue under the rug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281 Reported-by: Dirk Gouders <dirk@gouders.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.15+ CC: Yinghai Lu <yinghai@kernel.org> -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html