Message ID | 20190622210310.180905-3-helgaas@kernel.org (mailing list archive) |
---|---|
State | Mainlined, archived |
Commit | 6a381ea694c9da31ba8741c42a7f1b206c156841 |
Headers | show |
Series | PCI: Simplify pci_bus_distribute_available_resources() | expand |
On Sat, Jun 22, 2019 at 04:03:11PM -0500, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the > loop that divides the remaining resources among hotplug-capable bridges > does nothing. > > Check for "hotplug_bridges == 0" earlier, so we don't even have to compute > the amount of remaining resources. No functional change intended. > > --- > > I'm pretty sure this patch preserves the previous behavior of > pci_bus_distribute_available_resources(), but I'm not sure that > behavior is what we want. > > For example, in the following topology, when we process bus 10, we > find two non-hotplug bridges and no hotplug bridges, so IIUC we return > without distributing any resources to them. But I would think we > should try to give 10:1c.0 more space if possible because it has a > hotplug bridge below it. > > 00:1c.0: hotplug bridge to [bus 10-2f] > 10:1c.0: non-hotplug bridge to [bus 11-2e] > 11:00.0: hotplug bridge to [bus 12-2e] > 10:1c.1: non-hotplug bridge to [bus 2f] Yes, I agree in this case we want to preserve more space for 10:1c.0. For this patch, Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
On 2019-06-22 3:03 p.m., Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the > loop that divides the remaining resources among hotplug-capable bridges > does nothing. > > Check for "hotplug_bridges == 0" earlier, so we don't even have to compute > the amount of remaining resources. No functional change intended. Makes sense. Reviewed-by: Logan Gunthorpe <logang@deltatee.com> > --- > > I'm pretty sure this patch preserves the previous behavior of > pci_bus_distribute_available_resources(), but I'm not sure that > behavior is what we want. > > For example, in the following topology, when we process bus 10, we > find two non-hotplug bridges and no hotplug bridges, so IIUC we return > without distributing any resources to them. But I would think we > should try to give 10:1c.0 more space if possible because it has a > hotplug bridge below it. > > 00:1c.0: hotplug bridge to [bus 10-2f] > 10:1c.0: non-hotplug bridge to [bus 11-2e] > 11:00.0: hotplug bridge to [bus 12-2e] > 10:1c.1: non-hotplug bridge to [bus 2f] > --- > drivers/pci/setup-bus.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > index af28af898e42..04adeebe8866 100644 > --- a/drivers/pci/setup-bus.c > +++ b/drivers/pci/setup-bus.c > @@ -1887,6 +1887,9 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, > return; > } > > + if (hotplug_bridges == 0) > + return; > + > /* > * Calculate the total amount of extra resource space we can > * pass to bridges below this one. This is basically the > @@ -1936,8 +1939,6 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, > * Distribute available extra resources equally between > * hotplug-capable downstream ports taking alignment into > * account. > - * > - * Here hotplug_bridges is always != 0. > */ > align = pci_resource_alignment(bridge, io_res); > io = div64_ul(available_io, hotplug_bridges); >
On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote: > > > > I'm pretty sure this patch preserves the previous behavior of > > pci_bus_distribute_available_resources(), but I'm not sure that > > behavior is what we want. > > > > For example, in the following topology, when we process bus 10, we > > find two non-hotplug bridges and no hotplug bridges, so IIUC we > > return > > without distributing any resources to them. But I would think we > > should try to give 10:1c.0 more space if possible because it has a > > hotplug bridge below it. > > > > 00:1c.0: hotplug bridge to [bus 10-2f] > > 10:1c.0: non-hotplug bridge to [bus 11-2e] > > 11:00.0: hotplug bridge to [bus 12-2e] > > 10:1c.1: non-hotplug bridge to [bus 2f] > > Yes, I agree in this case we want to preserve more space for 10:1c.0. I sitll can't make sense of any of this stuff though. We only every distribute resources when using pci_assign_unassigned_bridge_resources which we only use some cases, and it's completely non obvious why we would use it there and not in other places. We also don't distribute during the initial root survey meaning afaik that we get toast for any hotplug bridge that has stuff already there at boot. Also, distributing the "available" space means we leave nothing for potential SR-IOV siblings... have we ended up bloting the very PCIe- centric assumption that it's "unlikely" that a hotplug bridge has an SR-IOV sibling ? This is all a terrible mess and I feel that all these little tweaks here or there are just making it even more impossible to completely grasp or predict how it will behave. Ben.
On Tue, Jun 25, 2019 at 09:45:04AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote: > > > > > > I'm pretty sure this patch preserves the previous behavior of > > > pci_bus_distribute_available_resources(), but I'm not sure that > > > behavior is what we want. > > > > > > For example, in the following topology, when we process bus 10, we > > > find two non-hotplug bridges and no hotplug bridges, so IIUC we > > > return > > > without distributing any resources to them. But I would think we > > > should try to give 10:1c.0 more space if possible because it has a > > > hotplug bridge below it. > > > > > > 00:1c.0: hotplug bridge to [bus 10-2f] > > > 10:1c.0: non-hotplug bridge to [bus 11-2e] > > > 11:00.0: hotplug bridge to [bus 12-2e] > > > 10:1c.1: non-hotplug bridge to [bus 2f] > > > > Yes, I agree in this case we want to preserve more space for 10:1c.0. > > I sitll can't make sense of any of this stuff though. > > We only every distribute resources when using > pci_assign_unassigned_bridge_resources which we only use some cases, > and it's completely non obvious why we would use it there and not in > other places. We added it only for native PCIe hotplug path with the assumption that the boot firmware takes care of the initial resource allocation. I don't see any particular reason why it could not be called for other paths as well, though. > We also don't distribute during the initial root survey meaning afaik > that we get toast for any hotplug bridge that has stuff already there > at boot. The boot firmware obviously needs to follow the same logic. AFAICT recent PCs and Macs using native PCIe hotplug handle it. > Also, distributing the "available" space means we leave nothing for > potential SR-IOV siblings... have we ended up bloting the very PCIe- > centric assumption that it's "unlikely" that a hotplug bridge has an > SR-IOV sibling ? Looking at the code, I'm not sure we reserved any additional resource space for the SR-IOV even before pci_bus_distribute_available_resources() was introduced. We do reserve extra bus numbers for SR-IOV in pci_scan_child_bus_extend() so maybe we can add something similar to resource allocation path.
On Tue, 2019-06-25 at 13:05 +0300, Mika Westerberg wrote: > > We only every distribute resources when using > > pci_assign_unassigned_bridge_resources which we only use some cases, > > and it's completely non obvious why we would use it there and not in > > other places. > > We added it only for native PCIe hotplug path with the assumption that > the boot firmware takes care of the initial resource allocation. I don't > see any particular reason why it could not be called for other paths as > well, though. Ok, we need to look into this for all the platforms who just reassign everything in Linux (ie, ignore whatever the boot firmware did, if it did anything). I feel like all these platforms today will have a hard time getting anything useful out of hotplug with our default "2M" add to the hotplug bridges :) > > We also don't distribute during the initial root survey meaning afaik > > that we get toast for any hotplug bridge that has stuff already there > > at boot. > > The boot firmware obviously needs to follow the same logic. AFAICT > recent PCs and Macs using native PCIe hotplug handle it. What's your experience in that area ? How (well) do they handle it in the boot firmware ? at least on arm64, boot firmwares are rather catastrophic when it comes to PCI, and on other embedded devices they are basically non-existent. > > Also, distributing the "available" space means we leave nothing for > > potential SR-IOV siblings... have we ended up bloting the very PCIe- > > centric assumption that it's "unlikely" that a hotplug bridge has an > > SR-IOV sibling ? > > Looking at the code, I'm not sure we reserved any additional resource > space for the SR-IOV even before pci_bus_distribute_available_resources() > was introduced. We do reserve extra bus numbers for SR-IOV in > pci_scan_child_bus_extend() so maybe we can add something similar to > resource allocation path. Ok. I'll look more. I think we do somewhat cater for SR-IOV in in the bridge sizing code actually. It's a bit obscure... I also need to look a bit more closely at what happens with Thunderbolt. Thanks ! Cheers Ben.
On Tue, Jun 25, 2019 at 09:48:19PM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-25 at 13:05 +0300, Mika Westerberg wrote: > > > We only every distribute resources when using > > > pci_assign_unassigned_bridge_resources which we only use some cases, > > > and it's completely non obvious why we would use it there and not in > > > other places. > > > > We added it only for native PCIe hotplug path with the assumption that > > the boot firmware takes care of the initial resource allocation. I don't > > see any particular reason why it could not be called for other paths as > > well, though. > > Ok, we need to look into this for all the platforms who just reassign > everything in Linux (ie, ignore whatever the boot firmware did, if it > did anything). > > I feel like all these platforms today will have a hard time getting > anything useful out of hotplug with our default "2M" add to the hotplug > bridges :) Yeah, at least if Thunderbolt is involved each "daisy-chained" device adds a complete PCIe switch running out of resources rather quick. > > > We also don't distribute during the initial root survey meaning afaik > > > that we get toast for any hotplug bridge that has stuff already there > > > at boot. > > > > The boot firmware obviously needs to follow the same logic. AFAICT > > recent PCs and Macs using native PCIe hotplug handle it. > > What's your experience in that area ? How (well) do they handle it in > the boot firmware ? at least on arm64, boot firmwares are rather > catastrophic when it comes to PCI, and on other embedded devices they > are basically non-existent. Well my experience is quite limited to recent Macs and PCs which usually handle the initial resource allocation just fine. In case of Thunderbolt some "older" PCs handle everything in firmware, even the runtime resource allocation via SMI handler accompanied with ACPI hotplug.
On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote: > > What's your experience in that area ? How (well) do they handle it in > > the boot firmware ? at least on arm64, boot firmwares are rather > > catastrophic when it comes to PCI, and on other embedded devices they > > are basically non-existent. > > Well my experience is quite limited to recent Macs and PCs which usually > handle the initial resource allocation just fine. In case of Thunderbolt > some "older" PCs handle everything in firmware, even the runtime > resource allocation via SMI handler accompanied with ACPI hotplug. Ah so this is what Lenovo calls "Thunderbolt firmware assist" in the BIOS ? I turned that on, it did help with Linux :) Cheers, Ben.
On Tue, Jun 25, 2019 at 10:23:33PM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote: > > > What's your experience in that area ? How (well) do they handle it in > > > the boot firmware ? at least on arm64, boot firmwares are rather > > > catastrophic when it comes to PCI, and on other embedded devices they > > > are basically non-existent. > > > > Well my experience is quite limited to recent Macs and PCs which usually > > handle the initial resource allocation just fine. In case of Thunderbolt > > some "older" PCs handle everything in firmware, even the runtime > > resource allocation via SMI handler accompanied with ACPI hotplug. > > Ah so this is what Lenovo calls "Thunderbolt firmware assist" in the > BIOS ? Yes, exactly. > I turned that on, it did help with Linux :) Well, it should also work using native PCIe with recent kernels. At least that's what I've been trying to get working since last year ;-)
On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote: > > What's your experience in that area ? How (well) do they handle it in > > the boot firmware ? at least on arm64, boot firmwares are rather > > catastrophic when it comes to PCI, and on other embedded devices they > > are basically non-existent. > > Well my experience is quite limited to recent Macs and PCs which usually > handle the initial resource allocation just fine. In case of Thunderbolt > some "older" PCs handle everything in firmware, even the runtime > resource allocation via SMI handler accompanied with ACPI hotplug. So I'm tempted to toy a bit with the "realloc everything" platforms (all non-x86 embedded basically) use pci_bus_distribute_available_resources on the PCIe root ports unconditionally. I don't think it will be a problem with SR-IOV as I very much doubt you'll end up with a setup where we have under the root port SR-IOV devices that are *siblings* with a switch. If that ever becomes the case, SR-IOV should be hanlded somewhat as part of the add_list of the bus sizing anyway. I might do that as a test patch or behind a test config option, see how it goes, after I've consolidated all those platforms to go through the same generic code path, it will be a lot easier. I'm keen to limit that to PCIe root ports though, old stuff can stay as it is and die happily :) Cheers, Ben.
On Tue, Jun 25, 2019 at 09:45:04AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote: > > > > > > I'm pretty sure this patch preserves the previous behavior of > > > pci_bus_distribute_available_resources(), but I'm not sure that > > > behavior is what we want. > > > > > > For example, in the following topology, when we process bus 10, we > > > find two non-hotplug bridges and no hotplug bridges, so IIUC we > > > return > > > without distributing any resources to them. But I would think we > > > should try to give 10:1c.0 more space if possible because it has a > > > hotplug bridge below it. > > > > > > 00:1c.0: hotplug bridge to [bus 10-2f] > > > 10:1c.0: non-hotplug bridge to [bus 11-2e] > > > 11:00.0: hotplug bridge to [bus 12-2e] > > > 10:1c.1: non-hotplug bridge to [bus 2f] > > > > Yes, I agree in this case we want to preserve more space for 10:1c.0. > > I sitll can't make sense of any of this stuff though. > > We only every distribute resources when using > pci_assign_unassigned_bridge_resources which we only use some cases, > and it's completely non obvious why we would use it there and not in > other places. > > We also don't distribute during the initial root survey meaning afaik > that we get toast for any hotplug bridge that has stuff already there > at boot. > > Also, distributing the "available" space means we leave nothing for > potential SR-IOV siblings... have we ended up bloting the very PCIe- > centric assumption that it's "unlikely" that a hotplug bridge has an > SR-IOV sibling ? > > This is all a terrible mess and I feel that all these little tweaks > here or there are just making it even more impossible to completely > grasp or predict how it will behave. No argument about it being a mess. I agree that tweaks clutter the history, which is definitely a downside. Do you think these actually change the way things work or make the code harder to read? I think there is value in even minor simplifications that make the code easier to understand. Small improvements compound over time and expose opportunities for more significant improvement. Bjorn
On Wed, 2019-06-26 at 12:35 -0500, Bjorn Helgaas wrote: > No argument about it being a mess. > > I agree that tweaks clutter the history, which is definitely a > downside. Do you think these actually change the way things work or > make the code harder to read? > > I think there is value in even minor simplifications that make the > code easier to understand. Small improvements compound over time and > expose opportunities for more significant improvement. Oh I absolutely agree. And I love that your patches come with more cset comment than actual patch lines :-) The main issue I've had so far trying to untangle things is the sheer amount of subtle changes and tweaks that went in over the year without any useful explanation as to why things are done. For example, do you have any idea why this: d65245c3297ac63abc51a976d92f45f2195d2854 PCI: don't shrink bridge resources Was added by Yinghai in 2010 ? :-) The main issue with this is that previous to this commit, pbus_size_{io,mem} would essentially ignore the previous state of the bridge resources, and calculate from scratch (provided the resources are unclaimed). After this, it has this subtle dependency. This is what broke Lorenzo attempts at moving pci_read_bridge_bases() to the geneneric code a couple of years ago for example. There may be a good reason to do that on x86, though it's not explained, but it's definitely not right if the platform requires a full re-assignment. Now I'll "work around" it by making that function look at the assignment policy set by the arch/platform, but it's a fix on top of a quirk on top of a band-aid. However, what else can we do without understanding the root issue that lead to that patch being created ? Cheers, Ben.
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index af28af898e42..04adeebe8866 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1887,6 +1887,9 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, return; } + if (hotplug_bridges == 0) + return; + /* * Calculate the total amount of extra resource space we can * pass to bridges below this one. This is basically the @@ -1936,8 +1939,6 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, * Distribute available extra resources equally between * hotplug-capable downstream ports taking alignment into * account. - * - * Here hotplug_bridges is always != 0. */ align = pci_resource_alignment(bridge, io_res); io = div64_ul(available_io, hotplug_bridges);
From: Bjorn Helgaas <bhelgaas@google.com> If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the loop that divides the remaining resources among hotplug-capable bridges does nothing. Check for "hotplug_bridges == 0" earlier, so we don't even have to compute the amount of remaining resources. No functional change intended. --- I'm pretty sure this patch preserves the previous behavior of pci_bus_distribute_available_resources(), but I'm not sure that behavior is what we want. For example, in the following topology, when we process bus 10, we find two non-hotplug bridges and no hotplug bridges, so IIUC we return without distributing any resources to them. But I would think we should try to give 10:1c.0 more space if possible because it has a hotplug bridge below it. 00:1c.0: hotplug bridge to [bus 10-2f] 10:1c.0: non-hotplug bridge to [bus 11-2e] 11:00.0: hotplug bridge to [bus 12-2e] 10:1c.1: non-hotplug bridge to [bus 2f] --- drivers/pci/setup-bus.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)