Message ID | 20250324090108.965229-2-Bo.Sun.CN@windriver.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Series | PCI: Marvell CN96XX/CN10XXX quirk and bus-range omission |
On Mon, Mar 24, 2025 at 05:01:07PM +0800, Bo Sun wrote:
> On our Marvell OCTEON CN96XX board, we observed the following panic on
> the latest kernel:
>   Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
>   CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
>   Hardware name: Marvell OcteonTX CN96XX board (DT)
>   pc : of_pci_add_properties+0x278/0x4c8
>   Call trace:
>    of_pci_add_properties+0x278/0x4c8 (P)
>    of_pci_make_dev_node+0xe0/0x158
>    pci_bus_add_device+0x158/0x228
>    pci_bus_add_devices+0x40/0x98
>    pci_host_probe+0x94/0x118
>    pci_host_common_probe+0x130/0x1b0
>    platform_probe+0x70/0xf0
>
> The dmesg logs indicated that the PCI bridge was scanning with an invalid bus range:
>   pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
>   pci_bus 0002:00: root bus resource [bus 00-ff]
>   pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
>   pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
>   pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
>   pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
>   pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
>   pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
>   pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
>   pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
>   pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>   pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
>   pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
>   pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
>   pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
>   pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
>   pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
>   pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
>   pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0
>
> This regression was introduced by commit 7246a4520b4b ("PCI: Use
> preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
> bridge is misconfigured by the bootloader. Both its secondary and
> subordinate bus numbers are initialized to 0, while its fixed secondary
> bus number is set to 8. However, bus number 8 is also assigned to another
> bridge (0002:00:0f.0). Although this is a bootloader issue, before the
> change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
> by default when PCI_PROBE_ONLY was not enabled, ensuing that all the
> bus number for these bridges were reassigned, avoiding any conflicts.
>
> After the change introduced in commit 7246a4520b4b, the bus numbers
> the misconfigured 0002:00:07.0 bridge. The kernel attempt to reconfigure
> 0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by
> bootloader. However, since a pci_bus has already been allocated for
> bus 8 due to the probe of 0002:00:0f.0, no new pci_bus allocated for
> 0002:00:07.0. This results in a pci bridge device without a pci_bus
> attached (pdev->subordinate == NULL). Consequently, accessing
> pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
> dereference.
>
> To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
> PCI_PROBE_ONLY is not enabled in order to work around issue like the
> one described above.
>
> Cc: stable@vger.kernel.org
> Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
> Signed-off-by: Bo Sun <Bo.Sun.CN@windriver.com>

Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

- Mani

> ---
> Changes in v3:
> - Make 'PCI_REASSIGN_ALL_BUS' unconditional, as suggested by Mani.
> - Update comment as requested by Mani.
>
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 82b21e34c545..787a5e75cd80 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -6181,6 +6181,19 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
>
> +/*
> + * Quirk for Marvell CN96XX/CN10XXX boards:
> + *
> + * Adds PCI_REASSIGN_ALL_BUS to force bus number reassignment to
> + * avoid conflicts caused by bootloader misconfigured PCI bridges.
> + */
> +static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
> +{
> +	pci_add_flags(PCI_REASSIGN_ALL_BUS);
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
> +			quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
> +
>  #ifdef CONFIG_PCIEASPM
>  /*
>   * Several Intel DG2 graphics devices advertise that they can only tolerate
> --
> 2.49.0
>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..787a5e75cd80 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,19 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
 
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS to force bus number reassignment to
+ * avoid conflicts caused by bootloader misconfigured PCI bridges.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+	pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+			quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
 #ifdef CONFIG_PCIEASPM
 /*
  * Several Intel DG2 graphics devices advertise that they can only tolerate
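
For readers less familiar with the fixup mechanism, here is a minimal user-space sketch of what the hunk above relies on. It is not kernel code. It assumes that pci_add_flags() ORs bits into a single global flags word (as the helper in include/linux/pci.h does), so a quirk that matches one device changes bus numbering behaviour for every host bridge scanned afterwards. All sketch_* names, the table layout, and the flag bit values are hypothetical and chosen only for illustration; 0x177d is used as the Cavium vendor ID.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits only; the real values live in include/linux/pci.h. */
enum {
	SKETCH_REASSIGN_ALL_BUS = 1u << 0,
	SKETCH_PROBE_ONLY       = 1u << 1,
};

/* One global flags word, shared by every bridge in the system. */
static unsigned int sketch_pci_flags;

static void sketch_add_flags(unsigned int flags)
{
	sketch_pci_flags |= flags;
}

static int sketch_has_flag(unsigned int flag)
{
	return (sketch_pci_flags & flag) != 0;
}

struct sketch_dev {
	unsigned short vendor;
	unsigned short device;
};

/* Stand-in for the quirk added by the patch: it ignores the device itself. */
static void sketch_quirk_reassign_all_busnr(const struct sketch_dev *dev)
{
	(void)dev;
	sketch_add_flags(SKETCH_REASSIGN_ALL_BUS);
}

/* Hypothetical fixup table entry: run hook when vendor/device match. */
struct sketch_fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(const struct sketch_dev *dev);
};

static const struct sketch_fixup sketch_early_fixups[] = {
	{ 0x177d, 0xa002, sketch_quirk_reassign_all_busnr },
};

/* Walk the table and run every hook whose IDs match the device. */
static void sketch_run_early_fixups(const struct sketch_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(sketch_early_fixups) / sizeof(sketch_early_fixups[0]); i++) {
		const struct sketch_fixup *f = &sketch_early_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);
	}
}
```

In this model, running the fixups for a non-matching device leaves the flag clear, while a single matching 0x177d:0xa002 device sets it for the rest of the scan.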
On our Marvell OCTEON CN96XX board, we observed the following panic on
the latest kernel:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
  CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  pc : of_pci_add_properties+0x278/0x4c8
  Call trace:
   of_pci_add_properties+0x278/0x4c8 (P)
   of_pci_make_dev_node+0xe0/0x158
   pci_bus_add_device+0x158/0x228
   pci_bus_add_devices+0x40/0x98
   pci_host_probe+0x94/0x118
   pci_host_common_probe+0x130/0x1b0
   platform_probe+0x70/0xf0

The dmesg logs indicated that the PCI bridge was scanning with an invalid bus range:
  pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
  pci_bus 0002:00: root bus resource [bus 00-ff]
  pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
  pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
  pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
  pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
  pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
  pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
  pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
  pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
  pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
  pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
  pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
  pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
  pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
  pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
  pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
  pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0

This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0). Although this is a bootloader issue, before the
change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
by default when PCI_PROBE_ONLY was not enabled, ensuring that all the
bus numbers for these bridges were reassigned, avoiding any conflicts.

After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except
the misconfigured 0002:00:07.0 bridge. The kernel attempts to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by
the bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus is allocated for
0002:00:07.0. This results in a PCI bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.

To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issues like the
one described above.

Cc: stable@vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN@windriver.com>
---
Changes in v3:
- Make 'PCI_REASSIGN_ALL_BUS' unconditional, as suggested by Mani.
- Update comment as requested by Mani.

 drivers/pci/quirks.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
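
The failure mode described in the commit message can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not the kernel's scanning code: a child bus is created only if its number is still free, bridges are visited in array order, and reassignment hands out numbers sequentially starting at 1. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_BUS 256

struct sketch_bus {
	int number;
};

struct sketch_bridge {
	int fw_secondary;               /* secondary bus number left by the bootloader */
	struct sketch_bus *subordinate; /* child bus, NULL if none could be created */
};

struct sketch_domain {
	struct sketch_bus buses[SKETCH_MAX_BUS];
	int in_use[SKETCH_MAX_BUS];
};

/* Create a child bus with the given number, or return NULL if it is taken. */
static struct sketch_bus *sketch_add_child_bus(struct sketch_domain *d, int busnr)
{
	assert(busnr >= 0 && busnr < SKETCH_MAX_BUS);
	if (d->in_use[busnr])
		return NULL;
	d->in_use[busnr] = 1;
	d->buses[busnr].number = busnr;
	return &d->buses[busnr];
}

/* Keep the bootloader's numbers: a duplicate leaves the later bridge orphaned. */
static void sketch_scan_keep_firmware(struct sketch_domain *d,
				      struct sketch_bridge *br, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		br[i].subordinate = sketch_add_child_bus(d, br[i].fw_secondary);
}

/* Ignore the bootloader's numbers and assign fresh, unique ones. */
static void sketch_scan_reassign_all(struct sketch_domain *d,
				     struct sketch_bridge *br, size_t n)
{
	int next = 1;
	size_t i;

	for (i = 0; i < n; i++)
		br[i].subordinate = sketch_add_child_bus(d, next++);
}

/* Count bridges that ended up without a child bus. */
static size_t sketch_count_orphans(const struct sketch_bridge *br, size_t n)
{
	size_t i, orphans = 0;

	for (i = 0; i < n; i++)
		if (br[i].subordinate == NULL)
			orphans++;
	return orphans;
}
```

With two bridges that both claim bus 8, the keep-firmware scan leaves the second one with subordinate == NULL, which is the state that of_pci_prop_bus_range() later dereferences. The reassign-all scan gives each bridge its own bus, which is the behaviour the quirk restores.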