Message ID | f3db713f4a737756782be6e94fcea3eda352e39f.1714838173.git.namcao@linutronix.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | abort hot-plug if pci_hp_add_bridge() fails | expand |
[cc += Ilpo, Mika] On Sat, May 04, 2024 at 06:15:22PM +0200, Nam Cao wrote: > If a bridge is hot-added without any bus number available for its > downstream bus, pci_hp_add_bridge() will fail. However, the driver > proceeds regardless, and the kernel crashes. [...] > Fix this by aborting the hot-plug if pci_hp_add_bridge() fails. [...] > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl) > goto out; > } > > - for_each_pci_bridge(dev, parent) > - pci_hp_add_bridge(dev); > + for_each_pci_bridge(dev, parent) { > + if (pci_hp_add_bridge(dev)) > + goto out; > + } > > pci_assign_unassigned_bridge_resources(bridge); > pcie_bus_configure_settings(parent); Are the curly braces even necessary? FWIW, the rationale for returning 0 (success) in this case is that pciehp has done its job by bringing up the slot and enumerating the bridge in the slot. It's not pciehp's fault that the hierarchy cannot be extended further below the hot-added bridge. Have you gone through the testing steps you spoke of earlier (replacing the hot-added bridge with an Ethernet card) and do they work correctly with this patch? Reviewed-by: Lukas Wunner <lukas@wunner.de> Thanks, Lukas
On Sun, May 05, 2024 at 07:45:13AM +0200, Lukas Wunner wrote: > On Sat, May 04, 2024 at 06:15:22PM +0200, Nam Cao wrote: > > If a bridge is hot-added without any bus number available for its > > downstream bus, pci_hp_add_bridge() will fail. However, the driver > > proceeds regardless, and the kernel crashes. > [...] > > Fix this by aborting the hot-plug if pci_hp_add_bridge() fails. > [...] > > --- a/drivers/pci/hotplug/pciehp_pci.c > > +++ b/drivers/pci/hotplug/pciehp_pci.c > > @@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl) > > goto out; > > } > > > > - for_each_pci_bridge(dev, parent) > > - pci_hp_add_bridge(dev); > > + for_each_pci_bridge(dev, parent) { > > + if (pci_hp_add_bridge(dev)) > > + goto out; > > + } > > > > pci_assign_unassigned_bridge_resources(bridge); > > pcie_bus_configure_settings(parent); > > Are the curly braces even necessary? Nope. I thought this is the kernel's coding style, since checkpatch.pl didn't complain. But checkpatch also doesn't complain after I remove it, so no I guess that's not necessary. > FWIW, the rationale for returning 0 (success) in this case is that > pciehp has done its job by bringing up the slot and enumerating the > bridge in the slot. It's not pciehp's fault that the hierarchy > cannot be extended further below the hot-added bridge. > > Have you gone through the testing steps you spoke of earlier > (replacing the hot-added bridge with an Ethernet card) and do > they work correctly with this patch? Yes. > Reviewed-by: Lukas Wunner <lukas@wunner.de> Thanks for the review. I will send v3 with the brackets removed. Best regards, Nam
On Sun, May 05, 2024 at 09:15:00AM +0200, Nam Cao wrote: > On Sun, May 05, 2024 at 07:45:13AM +0200, Lukas Wunner wrote: > > Have you gone through the testing steps you spoke of earlier > > (replacing the hot-added bridge with an Ethernet card) and do > > they work correctly with this patch? > > Yes. I just discovered a new crash scenario with shpchp: First, hot-add a bridge: (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1 But with the current patch, the hot-plug is still successful, just that the bridge is not properly configured: $ lspci -t -[0000:00]-+-00.0 +-01.0 +-02.0 +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge +-1f.0 +-1f.2 \-1f.3 But! Now I leave the hot-added bridge as is, and hot-add an ethernet card: (qemu) device_add e1000,bus=br1,id=eth0,addr=2 this command crashes the kernel (full crash log below). The problem is because during hot-plugging of this ethernet card, pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't been marked as "added" yet, so the driver attempts to add this bridge again, leading to the crash. Now for pciehp, this scenario cannot happen, because there is only a single slot, so we cannot hot-plug another device. But still, this makes me think perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured state as this patch is doing. I cannot think of a scenario that would crash pciehp similarly to shpchp. But that's just me, maybe there is a rare scenario that can crash pciehp, but I just haven't seen yet. My point is: better safe than sorry. I propose going back to the original solution: calling pci_stop_and_remove_bus_device() and return a negative error, and get rid of this improperly configured bridge. It is useless anyways without a subordinate bus number. What do you think? Best regards, Nam (qemu) device_add e1000,bus=br1,id=eth0,addr=2 [ 140.536933] shpchp 0000:01:00.0: Latch close on Slot(2) [ 140.538466] shpchp 0000:01:00.0: Button pressed on Slot(2) [ 140.539992] shpchp 0000:01:00.0: Card present on Slot(2) [ 140.541600] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press [ 146.604107] pci 0000:02:02.0: [8086:100e] type 00 class 0x020000 conventional PCI endpoint [ 146.606679] pci 0000:02:02.0: BAR 0 [mem 0x00000000-0x0001ffff] [ 146.608438] pci 0000:02:02.0: BAR 1 [io 0x0000-0x003f] [ 146.610340] pci 0000:02:02.0: ROM [mem 0x00000000-0x0003ffff pref] [ 146.612326] pci 0000:02:02.0: vgaarb: pci_notify [ 146.613730] pci 0000:02:02.0: ROM [mem 0xfe600000-0xfe63ffff pref]: assigned [ 146.615705] pci 0000:02:02.0: BAR 0 [mem 0xfe640000-0xfe65ffff]: assigned [ 146.617685] pci 0000:02:01.0: BAR 0 [mem 0xfe660000-0xfe6600ff 64bit]: assigned [ 146.621397] pci 0000:02:02.0: BAR 1 [io 0xc000-0xc03f]: assigned [ 146.623184] shpchp 0000:01:00.0: PCI bridge to [bus 02] [ 146.624704] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] [ 146.628853] shpchp 0000:01:00.0: bridge window [mem 0xfe600000-0xfe7fffff] [ 146.632415] shpchp 0000:01:00.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref] [ 146.637717] pcieport 0000:02:01.0: vgaarb: pci_notify [ 146.639172] pcieport 0000:02:01.0: runtime IRQ mapping not provided by arch [ 146.641635] pcieport 0000:02:01.0: vgaarb: pci_notify [ 146.643093] shpchp 0000:02:01.0: vgaarb: pci_notify [ 146.644501] shpchp 0000:02:01.0: runtime IRQ mapping not provided by arch [ 146.647967] shpchp 0000:02:01.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0 [ 146.650106] shpchp 0000:02:01.0: enabling device (0000 -> 0002) [ 146.653840] ACPI: \_SB_.GSIE: Enabled at IRQ 20 [ 146.656699] shpchp 0000:02:01.0: enabling bus mastering [ 146.659567] BUG: kernel NULL pointer dereference, address: 00000000000000da [ 146.661532] #PF: supervisor write access in kernel mode [ 146.663006] #PF: error_code(0x0002) - not-present page [ 146.664453] PGD 0 P4D 0 [ 146.665208] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 146.666447] CPU: 1 PID: 35 Comm: kworker/1:1 Not tainted 6.9.0-rc1-00003-g5ef667cf77c0 #59 [ 146.668743] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 146.671314] Workqueue: shpchp-2 shpchp_pushbutton_thread [ 146.672817] RIP: 0010:shpc_init+0x3fb/0x9d0 [ 146.674282] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f [ 146.679118] RSP: 0018:ffffc90000133ab0 EFLAGS: 00010246 [ 146.680245] RAX: 0000000000000000 RBX: ffff8880056fd600 RCX: 0000000000000000 [ 146.681782] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83059101 [ 146.683307] RBP: ffffc90000133af8 R08: ffff88800364f500 R09: 0000000000000000 [ 146.684831] R10: 0000000000000000 R11: ffff88800470b840 R12: ffff888006af7000 [ 146.686361] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f [ 146.687882] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 146.689597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 146.690821] CR2: 00000000000000da CR3: 000000000432a000 CR4: 00000000000006f0 [ 146.692331] Call Trace: [ 146.692869] <TASK> [ 146.693351] ? show_regs+0x63/0x70 [ 146.694097] ? __die+0x23/0x70 [ 146.694771] ? page_fault_oops+0x17a/0x470 [ 146.695659] ? search_module_extables+0x18/0x60 [ 146.696644] ? shpc_init+0x3fb/0x9d0 [ 146.697435] ? kernelmode_fixup_or_oops+0x9b/0x120 [ 146.698474] ? __bad_area_nosemaphore+0x16e/0x240 [ 146.699495] ? bad_area_nosemaphore+0x11/0x20 [ 146.700442] ? do_user_addr_fault+0x2a8/0x610 [ 146.701405] ? exc_page_fault+0x6d/0x160 [ 146.702263] ? asm_exc_page_fault+0x2b/0x30 [ 146.703171] ? shpc_init+0x3fb/0x9d0 [ 146.704213] shpc_probe+0x92/0x390 [ 146.704880] local_pci_probe+0x46/0xa0 [ 146.705626] pci_device_probe+0xb0/0x190 [ 146.706386] really_probe+0xc7/0x390 [ 146.707080] ? __pfx___device_attach_driver+0x10/0x10 [ 146.708044] __driver_probe_device+0x78/0x150 [ 146.708880] driver_probe_device+0x1f/0x90 [ 146.709679] __device_attach_driver+0x93/0x120 [ 146.710538] bus_for_each_drv+0x96/0xf0 [ 146.711277] __device_attach+0xb1/0x1c0 [ 146.712018] device_attach+0xf/0x20 [ 146.712695] pci_bus_add_device+0x58/0x90 [ 146.713480] pci_bus_add_devices+0x30/0x70 [ 146.714292] shpchp_configure_device+0xd8/0x150 [ 146.715200] board_added+0x115/0x420 [ 146.715898] shpchp_enable_slot+0x114/0x3b0 [ 146.716711] shpchp_pushbutton_thread+0x77/0xa0 [ 146.717605] process_one_work+0x153/0x390 [ 146.718387] worker_thread+0x2c9/0x400 [ 146.719111] ? __pfx_worker_thread+0x10/0x10 [ 146.719945] kthread+0xd7/0x100 [ 146.720566] ? __pfx_kthread+0x10/0x10 [ 146.721306] ret_from_fork+0x3c/0x60 [ 146.722006] ? __pfx_kthread+0x10/0x10 [ 146.722734] ret_from_fork_asm+0x1a/0x30 [ 146.723498] </TASK> [ 146.723931] Modules linked in: [ 146.724529] CR2: 00000000000000da [ 146.725182] ---[ end trace 0000000000000000 ]--- [ 146.726065] RIP: 0010:shpc_init+0x3fb/0x9d0 [ 146.726862] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f [ 146.730372] RSP: 0018:ffffc90000133ab0 EFLAGS: 00010246 [ 146.731363] RAX: 0000000000000000 RBX: ffff8880056fd600 RCX: 0000000000000000 [ 146.732714] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83059101 [ 146.734072] RBP: ffffc90000133af8 R08: ffff88800364f500 R09: 0000000000000000 [ 146.735429] R10: 0000000000000000 R11: ffff88800470b840 R12: ffff888006af7000 [ 146.736779] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f [ 146.738143] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 [ 146.739673] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 146.740764] CR2: 00000000000000da CR3: 000000000432a000 CR4: 00000000000006f0 [ 146.742116] note: kworker/1:1[35] exited with irqs disabled [ 146.743232] kworker/1:1 (35) used greatest stack depth: 12352 bytes left
On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote: > I just discovered a new crash scenario with shpchp: > > First, hot-add a bridge: > (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1 > > But with the current patch, the hot-plug is still successful, just that the > bridge is not properly configured: > $ lspci -t > -[0000:00]-+-00.0 > +-01.0 > +-02.0 > +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge > +-1f.0 > +-1f.2 > \-1f.3 > > But! Now I leave the hot-added bridge as is, and hot-add an ethernet card: > (qemu) device_add e1000,bus=br1,id=eth0,addr=2 > > this command crashes the kernel (full crash log below). > > The problem is because during hot-plugging of this ethernet card, > pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't > been marked as "added" yet, so the driver attempts to add this bridge > again, leading to the crash. > > Now for pciehp, this scenario cannot happen, because there is only a single > slot, so we cannot hot-plug another device. But still, this makes me think > perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured > state as this patch is doing. I cannot think of a scenario that would crash pciehp > similarly to shpchp. But that's just me, maybe there is a rare scenario > that can crash pciehp, but I just haven't seen yet. > > My point is: better safe than sorry. I propose going back to the original > solution: calling pci_stop_and_remove_bus_device() and return a negative > error, and get rid of this improperly configured bridge. It is useless > anyways without a subordinate bus number. > > What do you think? When the kernel runs out of BAR address space for devices downstream of a bridge, it doesn't de-enumerate the bridge. The devices are just unusable. So my first intuition is that running out of bus numbers shouldn't result in de-enumeration either. Remind me, how exactly does the NULL pointer deref occur? I think it's because no struct pci_bus was allocated for the subordinate bus of the hot-plugged bridge, right? Because AFAICS that would happen in pci_hp_add_bridge() pci_can_bridge_extend() pci_add_new_bus() pci_alloc_child_bus() but we never get that far because pci_hp_add_bridge() bails out with -1. So the subordinate pointer remains a NULL pointer. Perhaps we should allocate an empty struct pci_bus instead. Or check for a NULL subordinate pointer instead of crashing. I can't really say off the top of my head which solution is appropriate here, it requires going through the code paths and identifying the complexity associated with each solution. Thanks, Lukas
On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote: > On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote: > > I just discovered a new crash scenario with shpchp: > > > > First, hot-add a bridge: > > (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1 > > > > But with the current patch, the hot-plug is still successful, just that the > > bridge is not properly configured: > > $ lspci -t > > -[0000:00]-+-00.0 > > +-01.0 > > +-02.0 > > +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge > > +-1f.0 > > +-1f.2 > > \-1f.3 > > > > But! Now I leave the hot-added bridge as is, and hot-add an ethernet card: > > (qemu) device_add e1000,bus=br1,id=eth0,addr=2 > > > > this command crashes the kernel (full crash log below). > > > > The problem is because during hot-plugging of this ethernet card, > > pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't > > been marked as "added" yet, so the driver attempts to add this bridge > > again, leading to the crash. > > > > Now for pciehp, this scenario cannot happen, because there is only a single > > slot, so we cannot hot-plug another device. But still, this makes me think > > perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured > > state as this patch is doing. I cannot think of a scenario that would crash pciehp > > similarly to shpchp. But that's just me, maybe there is a rare scenario > > that can crash pciehp, but I just haven't seen yet. > > > > My point is: better safe than sorry. I propose going back to the original > > solution: calling pci_stop_and_remove_bus_device() and return a negative > > error, and get rid of this improperly configured bridge. It is useless > > anyways without a subordinate bus number. > > > > What do you think? > > When the kernel runs out of BAR address space for devices downstream of > a bridge, it doesn't de-enumerate the bridge. The devices are just > unusable. > > So my first intuition is that running out of bus numbers shouldn't result > in de-enumeration either. The BAR case is a bit different: it is a legal configuration for a bridge to have no address space downstream: we can configure the bridge with limit<base to indicate that there is no downstream addresses. There is nothing similar for secondary bus number: there is no legal bus number configuration for the bridge in this case. > Remind me, how exactly does the NULL pointer deref occur? I think it's > because no struct pci_bus was allocated for the subordinate bus of the > hot-plugged bridge, right? Because AFAICS that would happen in > > pci_hp_add_bridge() > pci_can_bridge_extend() > pci_add_new_bus() > pci_alloc_child_bus() > > but we never get that far because pci_hp_add_bridge() bails out with -1. > So the subordinate pointer remains a NULL pointer. This is correct. NULL deference happens due to subordinate pointer being NULL. > Perhaps we should allocate an empty struct pci_bus instead. This feels a bit hacky. What bus number could we set to this struct? And I doubt that the PCI subsystem is written with the expectation that struct pci_bus can be a dummy one, so I would guess that the probability of breaking thing is high with this approach. > Or check for a NULL subordinate pointer instead of crashing. I think this is a possible solution, but it is a bit complicated: all usage of subordinate pointers will need to be looked at. Further more, secondary bus number being zero is currently taken as unconfigured/invalid (for example in pci_scan_bridge_extend()), that needs to be changed as well. Doing this has a good chance of breaking things. I don't really see the upside of the above, compared to just deleting this bridge. It is not really counter-intuitive that the OS rejects a hot-plugged device that cannot be configured, right? Also this solution is wayyy simpler. Unless there is problem with this that I am not seeing? Best regards, Nam
On Tue, May 07, 2024 at 04:27:38PM +0200, Nam Cao wrote: > On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote: > > Remind me, how exactly does the NULL pointer deref occur? I think it's > > because no struct pci_bus was allocated for the subordinate bus of the > > hot-plugged bridge, right? Because AFAICS that would happen in > > > > pci_hp_add_bridge() > > pci_can_bridge_extend() > > pci_add_new_bus() > > pci_alloc_child_bus() > > > > but we never get that far because pci_hp_add_bridge() bails out with -1. > > So the subordinate pointer remains a NULL pointer. > > This is correct. NULL deference happens due to subordinate pointer being > NULL. > > > Or check for a NULL subordinate pointer instead of crashing. > > I think this is a possible solution, but it is a bit complicated: all usage > of subordinate pointers will need to be looked at. We already check for a NULL subordinate pointer in various places. See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp probe"). If we're missing such checks, I'd suggest to add those. If you believe having a NULL subordinate pointer is wrong and the bridge should be de-enumerated altogether, I think you would have to remove these NULL pointer checks as they'd otherwise become pointless with your change. Just adding missing NULL pointer checks seems to be the most straightforward solution to me. Thanks, Lukas
On Mon, May 27, 2024 at 11:15:55AM +0200, Lukas Wunner wrote: > On Tue, May 07, 2024 at 04:27:38PM +0200, Nam Cao wrote: > > On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote: > > > Remind me, how exactly does the NULL pointer deref occur? I think it's > > > because no struct pci_bus was allocated for the subordinate bus of the > > > hot-plugged bridge, right? Because AFAICS that would happen in > > > > > > pci_hp_add_bridge() > > > pci_can_bridge_extend() > > > pci_add_new_bus() > > > pci_alloc_child_bus() > > > > > > but we never get that far because pci_hp_add_bridge() bails out with -1. > > > So the subordinate pointer remains a NULL pointer. > > > > This is correct. NULL deference happens due to subordinate pointer being > > NULL. > > > > > Or check for a NULL subordinate pointer instead of crashing. > > > > I think this is a possible solution, but it is a bit complicated: all usage > > of subordinate pointers will need to be looked at. > > We already check for a NULL subordinate pointer in various places. > See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during > pciehp probe"). Ah, so bridge without subordinate bus is allowed in the kernel. > If we're missing such checks, I'd suggest to add those. > > If you believe having a NULL subordinate pointer is wrong and the > bridge should be de-enumerated altogether, I think you would have > to remove these NULL pointer checks as they'd otherwise become > pointless with your change. > > Just adding missing NULL pointer checks seems to be the most > straightforward solution to me. If the kernel do permits bridges without subordinate bus number, I am happy to go this direction. I expect going this way will require many more patches, I will dig into it and come back later. Thanks for your patience providing me advices & information. Best regards, Nam
On Mon, May 27, 2024 at 11:23:22AM +0200, Nam Cao wrote: > On Mon, May 27, 2024 at 11:15:55AM +0200, Lukas Wunner wrote: > > We already check for a NULL subordinate pointer in various places. > > See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during > > pciehp probe"). > > Ah, so bridge without subordinate bus is allowed in the kernel. > > > If we're missing such checks, I'd suggest to add those. > > > > If you believe having a NULL subordinate pointer is wrong and the > > bridge should be de-enumerated altogether, I think you would have > > to remove these NULL pointer checks as they'd otherwise become > > pointless with your change. > > > > Just adding missing NULL pointer checks seems to be the most > > straightforward solution to me. > > If the kernel do permits bridges without subordinate bus number, I am > happy to go this direction. I expect going this way will require many more > patches, I will dig into it and come back later. It seems a lot of functions use a "if (dev->subordinate)" check to distinguish between bridges and endpoints. A bridge with a NULL subordinate pointer is then basically treated like an endpoint. There are places which use other ways to recognize bridges, e.g. by looking at dev->hdr_type. Only these code paths will have to check for a NULL subordinate pointer. In the case of pciehp, portdrv binds to anything with class PCI_CLASS_BRIDGE_PCI_NORMAL, without consideration for the subordinate pointer, hence a check was needed in pciehp. pciehp is used a lot by Thunderbolt/USB4, SD 7.0 and for NVMe drives. I think shpchp isn't used as much, so its code paths aren't exercised as heavily and bugs aren't found and fixed as quickly. So chances are that more checks are missing in shpchp than in pciehp. Thanks, Lukas
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c index ad12515a4a12..200a7f4a12e0 100644 --- a/drivers/pci/hotplug/pciehp_pci.c +++ b/drivers/pci/hotplug/pciehp_pci.c @@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl) goto out; } - for_each_pci_bridge(dev, parent) - pci_hp_add_bridge(dev); + for_each_pci_bridge(dev, parent) { + if (pci_hp_add_bridge(dev)) + goto out; + } pci_assign_unassigned_bridge_resources(bridge); pcie_bus_configure_settings(parent);
If a bridge is hot-added without any bus number available for its downstream bus, pci_hp_add_bridge() will fail. However, the driver proceeds regardless, and the kernel crashes. This crash can be reproduced with the QEMU command: qemu-system-x86_64 -machine pc-q35-2.10 \ -kernel bzImage \ -drive "file=img,format=raw" \ -m 2048 -smp 2 -enable-kvm \ -append "console=ttyS0 root=/dev/sda" \ -nographic \ -device pcie-root-port,bus=pcie.0,id=rp1,slot=1,bus-reserve=0 then hot-plug a bridge at runtime with the QEMU command: device_add pcie-pci-bridge,id=br1,bus=rp1 and the kernel crashes: pcieport 0000:00:03.0: pciehp: Slot(1): Button press: will power on in 5 sec pcieport 0000:00:03.0: pciehp: Slot(1): Card present pcieport 0000:00:03.0: pciehp: Slot(1): Link Up pci 0000:01:00.0: [1b36:000e] type 01 class 0x060400 PCIe to PCI/PCI-X bridge pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000000ff 64bit] pci 0000:01:00.0: PCI bridge to [bus 00] pci 0000:01:00.0: bridge window [io 0x0000-0x0fff] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] pci 0000:01:00.0: No bus number available for hot-added bridge (note: kernel should abort hot-plugging right here) pci 0000:01:00.0: BAR 0 [mem 0xfe800000-0xfe8000ff 64bit]: assigned pcieport 0000:00:03.0: PCI bridge to [bus 01] pcieport 0000:00:03.0: bridge window [io 0x1000-0x1fff] pcieport 0000:00:03.0: bridge window [mem 0xfe800000-0xfe9fffff] pcieport 0000:00:03.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref] shpchp 0000:01:00.0: HPC vendor_id 1b36 device_id e ss_vid 0 ss_did 0 shpchp 0000:01:00.0: enabling device (0000 -> 0002) BUG: kernel NULL pointer dereference, address: 00000000000000da PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 46 Comm: irq/24-pciehp Not tainted 6.9.0-rc1-00001-g2e0239d47d75 #33 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:shpc_init+0x3fb/0x9d0 [stack dump and register dump cut out] Fix this by aborting the hot-plug if pci_hp_add_bridge() fails. Fixes: 0eb3bcfd088e ("[PATCH] pciehp: allow bridged card hotplug") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> --- v2: - remove "Cc: Rajesh Shah <rajesh.shah@intel.com>" (this address bounce) - add more information to commit message - return 0 instead of -EINVAL - remove pci_stop_and_remove_bus_device() drivers/pci/hotplug/pciehp_pci.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)