Message ID | 20191030150545.19885-1-mika.westerberg@linux.intel.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Commit | 77adf9355304f8dcf09054280af5e23fc451ab3d |
Headers | show |
Series | ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge | expand |
On Wednesday, October 30, 2019 4:05:45 PM CET Mika Westerberg wrote: > Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / > PCI: Don't scan bridges managed by native hotplug") prevents some recent > LG and HP laptops from booting with endless loop of: > > [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) > [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) > [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) > ... > > What seems to happen is that during boot, after the initial PCI > enumeration when EC is enabled the platform triggers ACPI Notify() to > one of the root ports. The root port itself looks like this: > > [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] > [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] > [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] > > The BIOS has configured the root port so that it does not have I/O > bridge window. > > Now when the ACPI Notify() is triggered ACPI hotplug handler calls > acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system > is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() > to allocate resources. > > The device connected to the root port is a PCIe switch (Thunderbolt > controller) with two hotplug downstream ports. Because of the hotplug > ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 > bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k > as that's the minimum I/O window size so each hotplug port gets 4k I/O > window and the same happens for the root port (which is also hotplug > port). This means 3 * 4k = 12k I/O window. > > Because of this pci_assign_unassigned_bridge_resources() ends up opening > a I/O bridge window for the root port at first available I/O address > which seems to be in range 0x1000 - 0x3fff. Normally this range is used > for ACPI stuff such as GPE bits (below is part of /proc/ioports): > > 1800-1803 : ACPI PM1a_EVT_BLK > 1804-1805 : ACPI PM1a_CNT_BLK > 1808-180b : ACPI PM_TMR > 1810-1815 : ACPI CPU throttle > 1850-1850 : ACPI PM2_CNT_BLK > 1854-1857 : pnp 00:05 > 1860-187f : ACPI GPE0_BLK > > However, when the ACPI Notify() happened this range was not yet reserved > for ACPI/PNP (that happens later) so PCI gets it. It then starts writing > to this range and accidentally stomps over GPE bits among other things > causing the endless stream of messages about missing GPE handler. > > This problem does not happen if "pci=hpiosize=0" is passed in the kernel > command line. The reason is that then the kernel does not try to > allocate the additional 256 bytes for each hotplug port. > > Fix this by allocating resources directly below the non-hotplug bridges > where a new device may appear as a result of ACPI Notify(). This avoids > the hotplug bridges and prevents opening the additional I/O window. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617 > Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") > Cc: stable@vger.kernel.org > Reported-by: Valerio Passini <passini.valerio@gmail.com> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> That should work: Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Bjorn, if you want me to take this, please let me know. > --- > I was able to reproduce this without access to the affected system by > forcing ACPI core to send Notify() to the TBT root port like this: > > void acpi_notify_rp(void) > { > struct acpi_device *adev; > acpi_handle handle; > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB.PCI0.RP17", &handle))) > return; > > if (acpi_bus_get_device(handle, &adev)) > return; > > dev_info(&adev->dev, "queueing hotplug\n"); > acpiphp_hotplug_notify(adev, ACPI_NOTIFY_BUS_CHECK); > } > > and calling it from acpi_init() directly after acpi_ec_init(). > > drivers/pci/hotplug/acpiphp_glue.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index e4c46637f32f..b3869951c0eb 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -449,8 +449,15 @@ static void acpiphp_native_scan_bridge(struct pci_dev *bridge) > > /* Scan non-hotplug bridges that need to be reconfigured */ > for_each_pci_bridge(dev, bus) { > - if (!hotplug_is_native(dev)) > - max = pci_scan_bridge(bus, dev, max, 1); > + if (hotplug_is_native(dev)) > + continue; > + > + max = pci_scan_bridge(bus, dev, max, 1); > + if (dev->subordinate) { > + pcibios_resource_survey_bus(dev->subordinate); > + pci_bus_size_bridges(dev->subordinate); > + pci_bus_assign_resources(dev->subordinate); > + } > } > } > > @@ -480,7 +487,6 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (PCI_SLOT(dev->devfn) == slot->device) > acpiphp_native_scan_bridge(dev); > } > - pci_assign_unassigned_bridge_resources(bus->self); > } else { > LIST_HEAD(add_list); > int max, pass; >
[+cc Ben] On Wed, Oct 30, 2019 at 06:05:45PM +0300, Mika Westerberg wrote: > Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / > PCI: Don't scan bridges managed by native hotplug") prevents some recent > LG and HP laptops from booting with endless loop of: > > [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) > [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) > [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) > ... > > What seems to happen is that during boot, after the initial PCI > enumeration when EC is enabled the platform triggers ACPI Notify() to > one of the root ports. The root port itself looks like this: > > [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] > [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] > [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] > > The BIOS has configured the root port so that it does not have I/O > bridge window. > > Now when the ACPI Notify() is triggered ACPI hotplug handler calls > acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system > is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() > to allocate resources. > > The device connected to the root port is a PCIe switch (Thunderbolt > controller) with two hotplug downstream ports. Because of the hotplug > ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 > bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k > as that's the minimum I/O window size so each hotplug port gets 4k I/O > window and the same happens for the root port (which is also hotplug > port). This means 3 * 4k = 12k I/O window. > > Because of this pci_assign_unassigned_bridge_resources() ends up opening > a I/O bridge window for the root port at first available I/O address > which seems to be in range 0x1000 - 0x3fff. Normally this range is used > for ACPI stuff such as GPE bits (below is part of /proc/ioports): > > 1800-1803 : ACPI PM1a_EVT_BLK > 1804-1805 : ACPI PM1a_CNT_BLK > 1808-180b : ACPI PM_TMR > 1810-1815 : ACPI CPU throttle > 1850-1850 : ACPI PM2_CNT_BLK > 1854-1857 : pnp 00:05 > 1860-187f : ACPI GPE0_BLK Nice debugging work! > However, when the ACPI Notify() happened this range was not yet reserved > for ACPI/PNP (that happens later) so PCI gets it. I think this is really the underlying problem. IMO those ACPI/PNP resources ought to be reserved before we start assigning resources to PCI devices. That would be a huge problem to fix, though. > It then starts writing > to this range and accidentally stomps over GPE bits among other things > causing the endless stream of messages about missing GPE handler. > > This problem does not happen if "pci=hpiosize=0" is passed in the kernel > command line. The reason is that then the kernel does not try to > allocate the additional 256 bytes for each hotplug port. > > Fix this by allocating resources directly below the non-hotplug bridges > where a new device may appear as a result of ACPI Notify(). This avoids > the hotplug bridges and prevents opening the additional I/O window. Looking at [1], here's the topology I see: 00:1b.0: Root Port to [bus 02-3a] # pciehp 02:00.0: Switch Upstream Port to [bus 03-3a] # thunderbolt 03:00.0: Switch Downstream Port to [bus 04] # thunderbolt 03:01.0: Switch Downstream Port to [bus 05-39] # thunderbolt, pciehp 03:02.0: Switch Downstream Port to [bus 3a] # thunderbolt 04:00.0: reg 0x10: [mem 0xda000000-0xda03ffff] 04:00.0: reg 0x14: [mem 0xda040000-0xda040fff] 3a:00.0: reg 0x10: [mem 0xd9f00000-0xd9f0ffff] In this topology, I guess the non-hotplug bridges would be 03:00.0 and 03:02.0? And we skip 03:01.0 because hotplug_is_native() is true for that? What would happen if a device below one of the non-hotplug bridges, e.g., 3a:00.0, had an I/O BAR? Would this patch still work? [1] https://bugzilla.kernel.org/show_bug.cgi?id=203617#c49 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617 > Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") > Cc: stable@vger.kernel.org > Reported-by: Valerio Passini <passini.valerio@gmail.com> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> > --- > I was able to reproduce this without access to the affected system by > forcing ACPI core to send Notify() to the TBT root port like this: > > void acpi_notify_rp(void) > { > struct acpi_device *adev; > acpi_handle handle; > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB.PCI0.RP17", &handle))) > return; > > if (acpi_bus_get_device(handle, &adev)) > return; > > dev_info(&adev->dev, "queueing hotplug\n"); > acpiphp_hotplug_notify(adev, ACPI_NOTIFY_BUS_CHECK); > } > > and calling it from acpi_init() directly after acpi_ec_init(). > > drivers/pci/hotplug/acpiphp_glue.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index e4c46637f32f..b3869951c0eb 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -449,8 +449,15 @@ static void acpiphp_native_scan_bridge(struct pci_dev *bridge) > > /* Scan non-hotplug bridges that need to be reconfigured */ > for_each_pci_bridge(dev, bus) { > - if (!hotplug_is_native(dev)) > - max = pci_scan_bridge(bus, dev, max, 1); > + if (hotplug_is_native(dev)) > + continue; > + > + max = pci_scan_bridge(bus, dev, max, 1); > + if (dev->subordinate) { > + pcibios_resource_survey_bus(dev->subordinate); > + pci_bus_size_bridges(dev->subordinate); > + pci_bus_assign_resources(dev->subordinate); > + } > } > } > > @@ -480,7 +487,6 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (PCI_SLOT(dev->devfn) == slot->device) > acpiphp_native_scan_bridge(dev); > } > - pci_assign_unassigned_bridge_resources(bus->self); > } else { > LIST_HEAD(add_list); > int max, pass; > -- > 2.23.0 >
On Wed, 2019-11-06 at 17:24 -0600, Bjorn Helgaas wrote:
> [+cc Ben]
Note: I'm travelling this week, I'll try to have a look in the next few
days but can't promise.
Cheers,
Ben.
On Wed, Nov 06, 2019 at 05:24:05PM -0600, Bjorn Helgaas wrote: > [+cc Ben] > > On Wed, Oct 30, 2019 at 06:05:45PM +0300, Mika Westerberg wrote: > > Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / > > PCI: Don't scan bridges managed by native hotplug") prevents some recent > > LG and HP laptops from booting with endless loop of: > > > > [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) > > [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) > > [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) > > ... > > > > What seems to happen is that during boot, after the initial PCI > > enumeration when EC is enabled the platform triggers ACPI Notify() to > > one of the root ports. The root port itself looks like this: > > > > [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] > > [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] > > [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] > > > > The BIOS has configured the root port so that it does not have I/O > > bridge window. > > > > Now when the ACPI Notify() is triggered ACPI hotplug handler calls > > acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system > > is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() > > to allocate resources. > > > > The device connected to the root port is a PCIe switch (Thunderbolt > > controller) with two hotplug downstream ports. Because of the hotplug > > ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 > > bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k > > as that's the minimum I/O window size so each hotplug port gets 4k I/O > > window and the same happens for the root port (which is also hotplug > > port). This means 3 * 4k = 12k I/O window. > > > > Because of this pci_assign_unassigned_bridge_resources() ends up opening > > a I/O bridge window for the root port at first available I/O address > > which seems to be in range 0x1000 - 0x3fff. Normally this range is used > > for ACPI stuff such as GPE bits (below is part of /proc/ioports): > > > > 1800-1803 : ACPI PM1a_EVT_BLK > > 1804-1805 : ACPI PM1a_CNT_BLK > > 1808-180b : ACPI PM_TMR > > 1810-1815 : ACPI CPU throttle > > 1850-1850 : ACPI PM2_CNT_BLK > > 1854-1857 : pnp 00:05 > > 1860-187f : ACPI GPE0_BLK > > Nice debugging work! > > > However, when the ACPI Notify() happened this range was not yet reserved > > for ACPI/PNP (that happens later) so PCI gets it. > > I think this is really the underlying problem. IMO those ACPI/PNP > resources ought to be reserved before we start assigning resources to > PCI devices. That would be a huge problem to fix, though. I agree, changing the order would probably open another can of worms. > > It then starts writing > > to this range and accidentally stomps over GPE bits among other things > > causing the endless stream of messages about missing GPE handler. > > > > This problem does not happen if "pci=hpiosize=0" is passed in the kernel > > command line. The reason is that then the kernel does not try to > > allocate the additional 256 bytes for each hotplug port. > > > > Fix this by allocating resources directly below the non-hotplug bridges > > where a new device may appear as a result of ACPI Notify(). This avoids > > the hotplug bridges and prevents opening the additional I/O window. > > Looking at [1], here's the topology I see: > > 00:1b.0: Root Port to [bus 02-3a] # pciehp > 02:00.0: Switch Upstream Port to [bus 03-3a] # thunderbolt > 03:00.0: Switch Downstream Port to [bus 04] # thunderbolt > 03:01.0: Switch Downstream Port to [bus 05-39] # thunderbolt, pciehp > 03:02.0: Switch Downstream Port to [bus 3a] # thunderbolt > 04:00.0: reg 0x10: [mem 0xda000000-0xda03ffff] > 04:00.0: reg 0x14: [mem 0xda040000-0xda040fff] > 3a:00.0: reg 0x10: [mem 0xd9f00000-0xd9f0ffff] > > In this topology, I guess the non-hotplug bridges would be 03:00.0 > and 03:02.0? And we skip 03:01.0 because hotplug_is_native() is true > for that? Yes. > What would happen if a device below one of the non-hotplug bridges, > e.g., 3a:00.0, had an I/O BAR? Would this patch still work? I think it would still work because now we call pci_bus_size_bridges() only for non-hotplug bridge which do not have I/O window open so pbus_size_io() fails to find the "free" I/O resource on that bus and the kernel then fails to assign that I/O resource for the device. This is based on reading the code - I cannot verify this on real hardware, though.
On Thu, Nov 07, 2019 at 11:03:33AM +0200, Mika Westerberg wrote: > On Wed, Nov 06, 2019 at 05:24:05PM -0600, Bjorn Helgaas wrote: > > On Wed, Oct 30, 2019 at 06:05:45PM +0300, Mika Westerberg wrote: > > > Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / > > > PCI: Don't scan bridges managed by native hotplug") prevents some recent > > > LG and HP laptops from booting with endless loop of: > > > > > > [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) > > > [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) > > > [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) > > > ... > > > > > > What seems to happen is that during boot, after the initial PCI > > > enumeration when EC is enabled the platform triggers ACPI Notify() to > > > one of the root ports. The root port itself looks like this: > > > > > > [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] > > > [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] > > > [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] > > > > > > The BIOS has configured the root port so that it does not have I/O > > > bridge window. > > > > > > Now when the ACPI Notify() is triggered ACPI hotplug handler calls > > > acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system > > > is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() > > > to allocate resources. > > > > > > The device connected to the root port is a PCIe switch (Thunderbolt > > > controller) with two hotplug downstream ports. Because of the hotplug > > > ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 > > > bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k > > > as that's the minimum I/O window size so each hotplug port gets 4k I/O > > > window and the same happens for the root port (which is also hotplug > > > port). This means 3 * 4k = 12k I/O window. > > > > > > Because of this pci_assign_unassigned_bridge_resources() ends up opening > > > a I/O bridge window for the root port at first available I/O address > > > which seems to be in range 0x1000 - 0x3fff. Normally this range is used > > > for ACPI stuff such as GPE bits (below is part of /proc/ioports): > > > > > > 1800-1803 : ACPI PM1a_EVT_BLK > > > 1804-1805 : ACPI PM1a_CNT_BLK > > > 1808-180b : ACPI PM_TMR > > > 1810-1815 : ACPI CPU throttle > > > 1850-1850 : ACPI PM2_CNT_BLK > > > 1854-1857 : pnp 00:05 > > > 1860-187f : ACPI GPE0_BLK > > > > Nice debugging work! > > > > > However, when the ACPI Notify() happened this range was not yet reserved > > > for ACPI/PNP (that happens later) so PCI gets it. > > > > I think this is really the underlying problem. IMO those ACPI/PNP > > resources ought to be reserved before we start assigning resources to > > PCI devices. That would be a huge problem to fix, though. > > I agree, changing the order would probably open another can of worms. Just for the record, I think it's even worse than just reserving them in the wrong order. IIRC, we don't reserve ACPI/PNP resources *at all* except for PNP0C01 and PNP0C02 (drivers/pnp/system.c) and whatever individual drivers reserve. This is like pretending that devices don't respond to their address space until a driver claims them. But I'm not suggesting opening this can of worms for *this* problem. > > > It then starts writing > > > to this range and accidentally stomps over GPE bits among other things > > > causing the endless stream of messages about missing GPE handler. > > > > > > This problem does not happen if "pci=hpiosize=0" is passed in the kernel > > > command line. The reason is that then the kernel does not try to > > > allocate the additional 256 bytes for each hotplug port. > > > > > > Fix this by allocating resources directly below the non-hotplug bridges > > > where a new device may appear as a result of ACPI Notify(). This avoids > > > the hotplug bridges and prevents opening the additional I/O window. > > > > Looking at [1], here's the topology I see: > > > > 00:1b.0: Root Port to [bus 02-3a] # pciehp > > 02:00.0: Switch Upstream Port to [bus 03-3a] # thunderbolt > > 03:00.0: Switch Downstream Port to [bus 04] # thunderbolt > > 03:01.0: Switch Downstream Port to [bus 05-39] # thunderbolt, pciehp > > 03:02.0: Switch Downstream Port to [bus 3a] # thunderbolt > > 04:00.0: reg 0x10: [mem 0xda000000-0xda03ffff] > > 04:00.0: reg 0x14: [mem 0xda040000-0xda040fff] > > 3a:00.0: reg 0x10: [mem 0xd9f00000-0xd9f0ffff] > > > > In this topology, I guess the non-hotplug bridges would be 03:00.0 > > and 03:02.0? And we skip 03:01.0 because hotplug_is_native() is true > > for that? > > Yes. > > > What would happen if a device below one of the non-hotplug bridges, > > e.g., 3a:00.0, had an I/O BAR? Would this patch still work? > > I think it would still work because now we call pci_bus_size_bridges() > only for non-hotplug bridge which do not have I/O window open so > pbus_size_io() fails to find the "free" I/O resource on that bus and the > kernel then fails to assign that I/O resource for the device. Not sure I understand; are you saying that we wouldn't have the EC/GPE issue, but we'd be unable to use a device below 3a:00.0 that happened to have an I/O BAR? That doesn't sound optimal because there is I/O space available that could be routed to 3a:00.0 Bjorn
On Thu, Nov 07, 2019 at 07:52:46AM -0600, Bjorn Helgaas wrote: > > > What would happen if a device below one of the non-hotplug bridges, > > > e.g., 3a:00.0, had an I/O BAR? Would this patch still work? > > > > I think it would still work because now we call pci_bus_size_bridges() > > only for non-hotplug bridge which do not have I/O window open so > > pbus_size_io() fails to find the "free" I/O resource on that bus and the > > kernel then fails to assign that I/O resource for the device. > > Not sure I understand; are you saying that we wouldn't have the EC/GPE > issue, but we'd be unable to use a device below 3a:00.0 that happened > to have an I/O BAR? Yes. > That doesn't sound optimal because there is I/O space available that > could be routed to 3a:00.0 If the none of the upstream bridges up to the PCIe root port does not have I/O window open, I don't think we can do much about it. Unless I'm missing something of course.
On Thu, Nov 07, 2019 at 04:02:59PM +0200, Mika Westerberg wrote: > On Thu, Nov 07, 2019 at 07:52:46AM -0600, Bjorn Helgaas wrote: > > > > What would happen if a device below one of the non-hotplug > > > > bridges, e.g., 3a:00.0, had an I/O BAR? Would this patch > > > > still work? > > > > > > I think it would still work because now we call > > > pci_bus_size_bridges() only for non-hotplug bridge which do not > > > have I/O window open so pbus_size_io() fails to find the "free" > > > I/O resource on that bus and the kernel then fails to assign > > > that I/O resource for the device. > > > > Not sure I understand; are you saying that we wouldn't have the > > EC/GPE issue, but we'd be unable to use a device below 3a:00.0 > > that happened to have an I/O BAR? > > Yes. > > > That doesn't sound optimal because there is I/O space available > > that could be routed to 3a:00.0 > > If the none of the upstream bridges up to the PCIe root port does > not have I/O window open, I don't think we can do much about it. > Unless I'm missing something of course. The path to this hypothetical 3a:00.0 device is: PCI host bridge to bus 0000:00 pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] 00:1b.0: Root Port to [bus 02-3a] 02:00.0: Switch Upstream Port to [bus 03-3a] 03:02.0: Switch Downstream Port to [bus 3a] 3a:00.0: reg 0x10: [io 0x????-0x????] None of the bridges (00:1b.0, 02:00.0, 03:02.0) currently has an open I/O window, but there's space available on bus 00 and windows *could* be opened. I guess it comes down to that ordering problem: this Notify() and acpiphp_native_scan_bridge() happens before pnp/system.c reserves things, so we don't yet know what space is actually available. If firmware had configured I/O windows for these bridges, 3a:00.0 would probably work. But it doesn't seem right that we would depend on that firmware configuration. Bjorn
On Thu, Nov 07, 2019 at 08:40:55AM -0600, Bjorn Helgaas wrote: > On Thu, Nov 07, 2019 at 04:02:59PM +0200, Mika Westerberg wrote: > > On Thu, Nov 07, 2019 at 07:52:46AM -0600, Bjorn Helgaas wrote: > > > > > What would happen if a device below one of the non-hotplug > > > > > bridges, e.g., 3a:00.0, had an I/O BAR? Would this patch > > > > > still work? > > > > > > > > I think it would still work because now we call > > > > pci_bus_size_bridges() only for non-hotplug bridge which do not > > > > have I/O window open so pbus_size_io() fails to find the "free" > > > > I/O resource on that bus and the kernel then fails to assign > > > > that I/O resource for the device. > > > > > > Not sure I understand; are you saying that we wouldn't have the > > > EC/GPE issue, but we'd be unable to use a device below 3a:00.0 > > > that happened to have an I/O BAR? > > > > Yes. > > > > > That doesn't sound optimal because there is I/O space available > > > that could be routed to 3a:00.0 > > > > If the none of the upstream bridges up to the PCIe root port does > > not have I/O window open, I don't think we can do much about it. > > Unless I'm missing something of course. > > The path to this hypothetical 3a:00.0 device is: > > PCI host bridge to bus 0000:00 > pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] > pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] > 00:1b.0: Root Port to [bus 02-3a] > 02:00.0: Switch Upstream Port to [bus 03-3a] > 03:02.0: Switch Downstream Port to [bus 3a] > 3a:00.0: reg 0x10: [io 0x????-0x????] > > None of the bridges (00:1b.0, 02:00.0, 03:02.0) currently has an open > I/O window, but there's space available on bus 00 and windows *could* > be opened. Right. > I guess it comes down to that ordering problem: this Notify() and > acpiphp_native_scan_bridge() happens before pnp/system.c reserves > things, so we don't yet know what space is actually available. Yes, exactly. The some parts of the range 0x0d00-0xffff belongs to PNP/ACPI so if we open the window now it might stomp over some crucial ACPI resources. > If firmware had configured I/O windows for these bridges, 3a:00.0 > would probably work. But it doesn't seem right that we would depend > on that firmware configuration. Well, if firmware has configured the topology in such way I think we should at least try to trust it is intentional. Even if the (PCIe) device might have an I/O BAR it should work without it (only legacy endpoints are allowed to create I/O requests, althought many non-legacy seem to include I/O BAR).
On Wed, Oct 30, 2019 at 06:05:45PM +0300, Mika Westerberg wrote: > Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / > PCI: Don't scan bridges managed by native hotplug") prevents some recent > LG and HP laptops from booting with endless loop of: > > [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) > [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) > [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) > ... > > What seems to happen is that during boot, after the initial PCI > enumeration when EC is enabled the platform triggers ACPI Notify() to > one of the root ports. The root port itself looks like this: > > [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] > [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] > [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] > > The BIOS has configured the root port so that it does not have I/O > bridge window. > > Now when the ACPI Notify() is triggered ACPI hotplug handler calls > acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system > is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() > to allocate resources. > > The device connected to the root port is a PCIe switch (Thunderbolt > controller) with two hotplug downstream ports. Because of the hotplug > ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 > bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k > as that's the minimum I/O window size so each hotplug port gets 4k I/O > window and the same happens for the root port (which is also hotplug > port). This means 3 * 4k = 12k I/O window. > > Because of this pci_assign_unassigned_bridge_resources() ends up opening > a I/O bridge window for the root port at first available I/O address > which seems to be in range 0x1000 - 0x3fff. Normally this range is used > for ACPI stuff such as GPE bits (below is part of /proc/ioports): > > 1800-1803 : ACPI PM1a_EVT_BLK > 1804-1805 : ACPI PM1a_CNT_BLK > 1808-180b : ACPI PM_TMR > 1810-1815 : ACPI CPU throttle > 1850-1850 : ACPI PM2_CNT_BLK > 1854-1857 : pnp 00:05 > 1860-187f : ACPI GPE0_BLK > > However, when the ACPI Notify() happened this range was not yet reserved > for ACPI/PNP (that happens later) so PCI gets it. It then starts writing > to this range and accidentally stomps over GPE bits among other things > causing the endless stream of messages about missing GPE handler. > > This problem does not happen if "pci=hpiosize=0" is passed in the kernel > command line. The reason is that then the kernel does not try to > allocate the additional 256 bytes for each hotplug port. > > Fix this by allocating resources directly below the non-hotplug bridges > where a new device may appear as a result of ACPI Notify(). This avoids > the hotplug bridges and prevents opening the additional I/O window. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617 > Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") > Cc: stable@vger.kernel.org > Reported-by: Valerio Passini <passini.valerio@gmail.com> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Applied with Rafael's reviewed-by to pci/hotplug for v5.5, thanks! > --- > I was able to reproduce this without access to the affected system by > forcing ACPI core to send Notify() to the TBT root port like this: > > void acpi_notify_rp(void) > { > struct acpi_device *adev; > acpi_handle handle; > > if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB.PCI0.RP17", &handle))) > return; > > if (acpi_bus_get_device(handle, &adev)) > return; > > dev_info(&adev->dev, "queueing hotplug\n"); > acpiphp_hotplug_notify(adev, ACPI_NOTIFY_BUS_CHECK); > } > > and calling it from acpi_init() directly after acpi_ec_init(). > > drivers/pci/hotplug/acpiphp_glue.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index e4c46637f32f..b3869951c0eb 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -449,8 +449,15 @@ static void acpiphp_native_scan_bridge(struct pci_dev *bridge) > > /* Scan non-hotplug bridges that need to be reconfigured */ > for_each_pci_bridge(dev, bus) { > - if (!hotplug_is_native(dev)) > - max = pci_scan_bridge(bus, dev, max, 1); > + if (hotplug_is_native(dev)) > + continue; > + > + max = pci_scan_bridge(bus, dev, max, 1); > + if (dev->subordinate) { > + pcibios_resource_survey_bus(dev->subordinate); > + pci_bus_size_bridges(dev->subordinate); > + pci_bus_assign_resources(dev->subordinate); > + } > } > } > > @@ -480,7 +487,6 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) > if (PCI_SLOT(dev->devfn) == slot->device) > acpiphp_native_scan_bridge(dev); > } > - pci_assign_unassigned_bridge_resources(bus->self); > } else { > LIST_HEAD(add_list); > int max, pass; > -- > 2.23.0 >
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index e4c46637f32f..b3869951c0eb 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -449,8 +449,15 @@ static void acpiphp_native_scan_bridge(struct pci_dev *bridge) /* Scan non-hotplug bridges that need to be reconfigured */ for_each_pci_bridge(dev, bus) { - if (!hotplug_is_native(dev)) - max = pci_scan_bridge(bus, dev, max, 1); + if (hotplug_is_native(dev)) + continue; + + max = pci_scan_bridge(bus, dev, max, 1); + if (dev->subordinate) { + pcibios_resource_survey_bus(dev->subordinate); + pci_bus_size_bridges(dev->subordinate); + pci_bus_assign_resources(dev->subordinate); + } } } @@ -480,7 +487,6 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge) if (PCI_SLOT(dev->devfn) == slot->device) acpiphp_native_scan_bridge(dev); } - pci_assign_unassigned_bridge_resources(bus->self); } else { LIST_HEAD(add_list); int max, pass;
Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") prevents some recent LG and HP laptops from booting with endless loop of: [ 26.237796] ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835) [ 26.238699] ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835) [ 26.239306] ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835) ... What seems to happen is that during boot, after the initial PCI enumeration when EC is enabled the platform triggers ACPI Notify() to one of the root ports. The root port itself looks like this: [ 0.723757] pci 0000:00:1b.0: PCI bridge to [bus 02-3a] [ 0.723765] pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff] [ 0.723773] pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref] The BIOS has configured the root port so that it does not have I/O bridge window. Now when the ACPI Notify() is triggered ACPI hotplug handler calls acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() to allocate resources. The device connected to the root port is a PCIe switch (Thunderbolt controller) with two hotplug downstream ports. Because of the hotplug ports __pci_bus_size_bridges() tries to add "additional I/O" of 256 bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k as that's the minimum I/O window size so each hotplug port gets 4k I/O window and the same happens for the root port (which is also hotplug port). This means 3 * 4k = 12k I/O window. Because of this pci_assign_unassigned_bridge_resources() ends up opening a I/O bridge window for the root port at first available I/O address which seems to be in range 0x1000 - 0x3fff. Normally this range is used for ACPI stuff such as GPE bits (below is part of /proc/ioports): 1800-1803 : ACPI PM1a_EVT_BLK 1804-1805 : ACPI PM1a_CNT_BLK 1808-180b : ACPI PM_TMR 1810-1815 : ACPI CPU throttle 1850-1850 : ACPI PM2_CNT_BLK 1854-1857 : pnp 00:05 1860-187f : ACPI GPE0_BLK However, when the ACPI Notify() happened this range was not yet reserved for ACPI/PNP (that happens later) so PCI gets it. It then starts writing to this range and accidentally stomps over GPE bits among other things causing the endless stream of messages about missing GPE handler. This problem does not happen if "pci=hpiosize=0" is passed in the kernel command line. The reason is that then the kernel does not try to allocate the additional 256 bytes for each hotplug port. Fix this by allocating resources directly below the non-hotplug bridges where a new device may appear as a result of ACPI Notify(). This avoids the hotplug bridges and prevents opening the additional I/O window. Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617 Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") Cc: stable@vger.kernel.org Reported-by: Valerio Passini <passini.valerio@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> --- I was able to reproduce this without access to the affected system by forcing ACPI core to send Notify() to the TBT root port like this: void acpi_notify_rp(void) { struct acpi_device *adev; acpi_handle handle; if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB.PCI0.RP17", &handle))) return; if (acpi_bus_get_device(handle, &adev)) return; dev_info(&adev->dev, "queueing hotplug\n"); acpiphp_hotplug_notify(adev, ACPI_NOTIFY_BUS_CHECK); } and calling it from acpi_init() directly after acpi_ec_init(). drivers/pci/hotplug/acpiphp_glue.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)