Message ID | ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de (mailing list archive) |
---|---|
State | Accepted |
Commit | c0d0b726f923c6b8c239e8d936878d39468b62a4 |
Headers | show |
Series | [for-linus,v2] PCI/bwctrl: Fix NULL pointer deref on unbind and bind | expand |
On Mon, Jan 06, 2025 at 12:26:35PM +0100, Lukas Wunner wrote: > The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), > dereferences a "data" pointer. > > On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However > the interrupt handler may still be invoked afterwards and will dereference > that NULL pointer. > > That's because the interrupt is requested using a devm_*() helper and the > driver core releases devm_*() resources *after* calling ->remove(). > > pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt > Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link > Control Register, but that won't prevent execution of pcie_bwnotif_irq(): > The interrupt for bandwidth notifications may be shared with AER, DPC, > PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the > interrupt is requested. > > There's a similar race on bind: pcie_bwnotif_probe() requests the > interrupt when the "data" pointer still points to NULL. A NULL pointer > deref may thus likewise occur if AER, DPC, PME or hotplug raise an > interrupt in-between the bandwidth controller's call to devm_request_irq() > and assignment of the "data" pointer. > > Drop the devm_*() usage and reorder requesting of the interrupt to fix the > issue. > > While at it, drop a stray but harmless no_free_ptr() invocation when > assigning the "data" pointer in pcie_bwnotif_probe(). > > Ilpo points out that the locking on unbind and bind needs to be symmetric, > so move the call to pcie_bwnotif_disable() inside the critical section > protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. > > Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. > The issue is no longer reproducible with the present commit. > > Evert found that attaching a USB-C monitor prevented the hang. The > machine contains an ASMedia USB 3.2 controller below a hotplug-capable > Root Port. So one possible explanation is that the controller gets > hot-removed on shutdown unless something is connected. And the ensuing > hotplug interrupt occurs exactly when the bandwidth controller is > unregistering. The precise cause could not be determined because the > screen had already turned black when the hang occurred. > > Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") > Reported-by: Evert Vorster <evorster@gmail.com> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Tested-by: Evert Vorster <evorster@gmail.com> Replaced the patch on pci/for-linus with this one, headed for v6.13, thanks! > --- > Changes v1 -> v2 (in response to Ilpo's review): > Move request_irq() inside critical section on bind. > Move free_irq() + pcie_bwnotif_disable() inside critical section on unbind. > Amend commit message with a paragraph explaining these changes. > > drivers/pci/pcie/bwctrl.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > index b59cacc740fa..0a5e7efbce2c 100644 > --- a/drivers/pci/pcie/bwctrl.c > +++ b/drivers/pci/pcie/bwctrl.c > @@ -303,14 +303,17 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > if (ret) > return ret; > > - ret = devm_request_irq(&srv->device, srv->irq, pcie_bwnotif_irq, > - IRQF_SHARED, "PCIe bwctrl", srv); > - if (ret) > - return ret; > - > scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > - port->link_bwctrl = no_free_ptr(data); > + port->link_bwctrl = data; > + > + ret = request_irq(srv->irq, pcie_bwnotif_irq, > + IRQF_SHARED, "PCIe bwctrl", srv); > + if (ret) { > + port->link_bwctrl = NULL; > + return ret; > + } > + > pcie_bwnotif_enable(srv); > } > } > @@ -331,11 +334,15 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) > > pcie_cooling_device_unregister(data->cdev); > > - pcie_bwnotif_disable(srv->port); > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > + pcie_bwnotif_disable(srv->port); > + > + free_irq(srv->irq, srv); > > - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > - scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > srv->port->link_bwctrl = NULL; > + } > + } > } > > static int pcie_bwnotif_suspend(struct pcie_device *srv) > -- > 2.43.0 >
On Mon, 6 Jan 2025, Lukas Wunner wrote: > The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), > dereferences a "data" pointer. > > On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However > the interrupt handler may still be invoked afterwards and will dereference > that NULL pointer. > > That's because the interrupt is requested using a devm_*() helper and the > driver core releases devm_*() resources *after* calling ->remove(). > > pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt > Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link > Control Register, but that won't prevent execution of pcie_bwnotif_irq(): > The interrupt for bandwidth notifications may be shared with AER, DPC, > PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the > interrupt is requested. > > There's a similar race on bind: pcie_bwnotif_probe() requests the > interrupt when the "data" pointer still points to NULL. A NULL pointer > deref may thus likewise occur if AER, DPC, PME or hotplug raise an > interrupt in-between the bandwidth controller's call to devm_request_irq() > and assignment of the "data" pointer. > > Drop the devm_*() usage and reorder requesting of the interrupt to fix the > issue. > > While at it, drop a stray but harmless no_free_ptr() invocation when > assigning the "data" pointer in pcie_bwnotif_probe(). > > Ilpo points out that the locking on unbind and bind needs to be symmetric, > so move the call to pcie_bwnotif_disable() inside the critical section > protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. > > Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. > The issue is no longer reproducible with the present commit. > > Evert found that attaching a USB-C monitor prevented the hang. The > machine contains an ASMedia USB 3.2 controller below a hotplug-capable > Root Port. So one possible explanation is that the controller gets > hot-removed on shutdown unless something is connected. And the ensuing > hotplug interrupt occurs exactly when the bandwidth controller is > unregistering. The precise cause could not be determined because the > screen had already turned black when the hang occurred. > > Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") > Reported-by: Evert Vorster <evorster@gmail.com> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Tested-by: Evert Vorster <evorster@gmail.com> > --- > Changes v1 -> v2 (in response to Ilpo's review): > Move request_irq() inside critical section on bind. > Move free_irq() + pcie_bwnotif_disable() inside critical section on unbind. > Amend commit message with a paragraph explaining these changes. > > drivers/pci/pcie/bwctrl.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > index b59cacc740fa..0a5e7efbce2c 100644 > --- a/drivers/pci/pcie/bwctrl.c > +++ b/drivers/pci/pcie/bwctrl.c > @@ -303,14 +303,17 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > if (ret) > return ret; > > - ret = devm_request_irq(&srv->device, srv->irq, pcie_bwnotif_irq, > - IRQF_SHARED, "PCIe bwctrl", srv); > - if (ret) > - return ret; > - > scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > - port->link_bwctrl = no_free_ptr(data); > + port->link_bwctrl = data; > + > + ret = request_irq(srv->irq, pcie_bwnotif_irq, > + IRQF_SHARED, "PCIe bwctrl", srv); > + if (ret) { > + port->link_bwctrl = NULL; > + return ret; > + } > + > pcie_bwnotif_enable(srv); > } > } > @@ -331,11 +334,15 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) > > pcie_cooling_device_unregister(data->cdev); > > - pcie_bwnotif_disable(srv->port); > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > + pcie_bwnotif_disable(srv->port); > + > + free_irq(srv->irq, srv); > > - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > - scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > srv->port->link_bwctrl = NULL; > + } > + } > } > > static int pcie_bwnotif_suspend(struct pcie_device *srv) Thanks for the update, Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> I suppose Niklas has not had time to test if this solved the other issue?
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c index b59cacc740fa..0a5e7efbce2c 100644 --- a/drivers/pci/pcie/bwctrl.c +++ b/drivers/pci/pcie/bwctrl.c @@ -303,14 +303,17 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) if (ret) return ret; - ret = devm_request_irq(&srv->device, srv->irq, pcie_bwnotif_irq, - IRQF_SHARED, "PCIe bwctrl", srv); - if (ret) - return ret; - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { - port->link_bwctrl = no_free_ptr(data); + port->link_bwctrl = data; + + ret = request_irq(srv->irq, pcie_bwnotif_irq, + IRQF_SHARED, "PCIe bwctrl", srv); + if (ret) { + port->link_bwctrl = NULL; + return ret; + } + pcie_bwnotif_enable(srv); } } @@ -331,11 +334,15 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) pcie_cooling_device_unregister(data->cdev); - pcie_bwnotif_disable(srv->port); + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { + pcie_bwnotif_disable(srv->port); + + free_irq(srv->irq, srv); - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) - scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) srv->port->link_bwctrl = NULL; + } + } } static int pcie_bwnotif_suspend(struct pcie_device *srv)