diff mbox series

PCI/bwctrl: Fix NULL pointer deref on bus number exhaustion

Message ID 3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de (mailing list archive)
State Accepted
Delegated to: Krzysztof Wilczyński
Headers show
Series PCI/bwctrl: Fix NULL pointer deref on bus number exhaustion | expand

Commit Message

Lukas Wunner March 22, 2025, 6:52 p.m. UTC
When BIOS neglects to assign bus numbers to PCI bridges, the kernel
attempts to correct that during PCI device enumeration.  If it runs out
of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
the bridge's pci_dev remains NULL.

The PCIe bandwidth controller erroneously does not check for a NULL
subordinate pointer and dereferences it on probe.

Bandwidth control of unusable devices below the bridge is of questionable
utility, so simply error out instead.  This mirrors what PCIe hotplug does
since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp
probe").

The PCI core emits a message with KERN_INFO severity if it has run out of
bus numbers.  PCIe hotplug emits an additional message with KERN_ERR
severity to inform the user that hotplug functionality is disabled at the
bridge.  A similar message for bandwidth control does not seem merited,
given that its only purpose so far is to expose an up-to-date link speed
in sysfs and throttle the link speed on certain laptops with limited
Thermal Design Power.  So error out silently.

User-visible messages:

  pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  [...]
  pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
  pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
  [...]
  pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
  [...]
  BUG: kernel NULL pointer dereference
  RIP: pcie_update_link_speed
  pcie_bwnotif_enable
  pcie_bwnotif_probe
  pcie_port_probe_service
  really_probe

Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.13+
---
 drivers/pci/pcie/bwctrl.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Krzysztof Wilczyński March 23, 2025, 11:24 a.m. UTC | #1
Hello,

> When BIOS neglects to assign bus numbers to PCI bridges, the kernel
> attempts to correct that during PCI device enumeration.  If it runs out
> of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
> the bridge's pci_dev remains NULL.
> 
> The PCIe bandwidth controller erroneously does not check for a NULL
> subordinate pointer and dereferences it on probe.
> 
> Bandwidth control of unusable devices below the bridge is of questionable
> utility, so simply error out instead.  This mirrors what PCIe hotplug does
> since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp
> probe").
> 
> The PCI core emits a message with KERN_INFO severity if it has run out of
> bus numbers.  PCIe hotplug emits an additional message with KERN_ERR
> severity to inform the user that hotplug functionality is disabled at the
> bridge.  A similar message for bandwidth control does not seem merited,
> given that its only purpose so far is to expose an up-to-date link speed
> in sysfs and throttle the link speed on certain laptops with limited
> Thermal Design Power.  So error out silently.
> 
> User-visible messages:
> 
>   pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>   [...]
>   pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
>   pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
>   [...]
>   pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
>   [...]
>   BUG: kernel NULL pointer dereference
>   RIP: pcie_update_link_speed
>   pcie_bwnotif_enable
>   pcie_bwnotif_probe
>   pcie_port_probe_service
>   really_probe

Applied to bwctrl, thank you!

	Krzysztof
Krzysztof Wilczyński March 23, 2025, 12:56 p.m. UTC | #2
Hello,

> The patch is working, no kernel crashes and shutting down after hotplugging the Thunderbolt 4 dock does not hang anymore, thanks!

Thank you for testing!  Appreciated!

> I still see messages in the kernel log that 'devices behind bridge are unusable' because 'no bus can be assigned to them', 'Hotplug bridge without secondary bus, ignoring', etc. These all refer to the Thunderbolt 4 bridge. Adding "pci=hpbussize=0x33" to the kernel doesn't make a difference. Adding "pci=realloc,asssign-busses,hpbussize=0x33" actually does 'fix' the bus allocation failures (or at least I don't see any messages in the log anymore), but then amdgpu fails to initialize the IGP.

We can fix any outstanding issues or drop this patch.  I leave it to hot
plug experts like Ilpo, Bjorn and Lukas to make the call here.

> Anyway, all devices connected to the Thunderbolt 4 dock appear to work (USB, screens, ethernet) even despite these bus allocation failures, so I will just ignore them. 
> 
> Thanks again!

Would you be happy to provide your "Tested-by:" tag?

	Krzysztof
Wouter Bijlsma March 23, 2025, 1:07 p.m. UTC | #3
Hi Krzysztof,

> Would you be happy to provide your "Tested-by:" tag?

You can use 'Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>'

-Wouter


On Sun, Mar 23, 2025, at 1:56 PM, Krzysztof Wilczyński wrote:
> Hello,
> 
> > The patch is working, no kernel crashes and shutting down after hotplugging the Thunderbolt 4 dock does not hang anymore, thanks!
> 
> Thank you for testing!  Appreciated!
> 
> > I still see messages in the kernel log that 'devices behind bridge are unusable' because 'no bus can be assigned to them', 'Hotplug bridge without secondary bus, ignoring', etc. These all refer to the Thunderbolt 4 bridge. Adding "pci=hpbussize=0x33" to the kernel doesn't make a difference. Adding "pci=realloc,asssign-busses,hpbussize=0x33" actually does 'fix' the bus allocation failures (or at least I don't see any messages in the log anymore), but then amdgpu fails to initialize the IGP.
> 
> We can fix any outstanding issues or drop this patch.  I leave it to hot
> plug experts like Ilpo, Bjorn and Lukas to make the call here.
> 
> > Anyway, all devices connected to the Thunderbolt 4 dock appear to work (USB, screens, ethernet) even despite these bus allocation failures, so I will just ignore them. 
> > 
> > Thanks again!
> 
> Would you be happy to provide your "Tested-by:" tag?
> 
> Krzysztof
>
Krzysztof Wilczyński March 23, 2025, 1:39 p.m. UTC | #4
Hello,

> > Would you be happy to provide your "Tested-by:" tag?
> 
> You can use 'Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>'

Added to the other tags.  Thank you!

	Krzysztof
diff mbox series

Patch

diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 0a5e7efbce2c..0e5807ae4083 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -294,6 +294,10 @@  static int pcie_bwnotif_probe(struct pcie_device *srv)
 	struct pci_dev *port = srv->port;
 	int ret;
 
+	/* Can happen if we run out of bus numbers during enumeration */
+	if (!port->subordinate)
+		return -ENODEV;
+
 	struct pcie_bwctrl_data *data = devm_kzalloc(&srv->device,
 						     sizeof(*data), GFP_KERNEL);
 	if (!data)