Message ID | 20210817063521.22450-1-a.fatoum@pengutronix.de (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Series | brcmfmac: pcie: fix oops on failure to resume and reprobe |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 17.08.21 08:35, Ahmad Fatoum wrote:
> When resuming from suspend, brcmf_pcie_pm_leave_D3 will first attempt a
> hot resume and then fall back to removing the PCI device and then
> reprobing. If this probe fails, the kernel will oops, because brcmf_err,
> which is called to report the failure will dereference the stale bus
> pointer. Open code and use the default bus-less brcmf_err to avoid this.

Should've included a Fixes tag:

Fixes: 8602e62441ab ("brcmfmac: pass bus to the __brcmf_err() in pcie.c")

Please let me know if I should resend with the tag added.

Cheers,
Ahmad

> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> ---
> To: Arend van Spriel <aspriel@gmail.com>
> To: Franky Lin <franky.lin@broadcom.com>
> To: Hante Meuleman <hante.meuleman@broadcom.com>
> To: Chi-hsien Lin <chi-hsien.lin@infineon.com>
> To: Wright Feng <wright.feng@infineon.com>
> To: Chung-hsien Hsu <chung-hsien.hsu@infineon.com>
> Cc: SHA-cyfmac-dev-list@infineon.com
> Cc: brcm80211-dev-list.pdl@broadcom.com
> Cc: netdev@vger.kernel.org
> Cc: linux-wireless@vger.kernel.org
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
> index 9ef94d7a7ca7..d824bea4b79d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
> @@ -2209,7 +2209,7 @@ static int brcmf_pcie_pm_leave_D3(struct device *dev)
>
>  	err = brcmf_pcie_probe(pdev, NULL);
>  	if (err)
> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
> +		__brcmf_err(NULL, __func__, "probe after resume failed, err=%d\n", err);
>
>  	return err;
>  }
On 17.08.21 13:02, Andy Shevchenko wrote:
> On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>
>> When resuming from suspend, brcmf_pcie_pm_leave_D3 will first attempt a
>> hot resume and then fall back to removing the PCI device and then
>> reprobing. If this probe fails, the kernel will oops, because brcmf_err,
>> which is called to report the failure will dereference the stale bus
>> pointer. Open code and use the default bus-less brcmf_err to avoid this.
>>
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> ---
>> To: Arend van Spriel <aspriel@gmail.com>
>> To: Franky Lin <franky.lin@broadcom.com>
>> To: Hante Meuleman <hante.meuleman@broadcom.com>
>> To: Chi-hsien Lin <chi-hsien.lin@infineon.com>
>> To: Wright Feng <wright.feng@infineon.com>
>> To: Chung-hsien Hsu <chung-hsien.hsu@infineon.com>
>> Cc: SHA-cyfmac-dev-list@infineon.com
>> Cc: brcm80211-dev-list.pdl@broadcom.com
>> Cc: netdev@vger.kernel.org
>> Cc: linux-wireless@vger.kernel.org
>> Cc: Kalle Valo <kvalo@codeaurora.org>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
>> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
>> index 9ef94d7a7ca7..d824bea4b79d 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
>> @@ -2209,7 +2209,7 @@ static int brcmf_pcie_pm_leave_D3(struct device *dev)
>>
>>  	err = brcmf_pcie_probe(pdev, NULL);
>>  	if (err)
>> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
>> +		__brcmf_err(NULL, __func__, "probe after resume failed,
>> err=%d\n",
>
>
> This is weird looking line now. Why can’t you simply use dev_err() /
> netdev_err()?

That's what brcmf_err normally expands to, but in this file the macro
is overridden to add the extra first argument.

The brcmf_ logging functions write to brcmf trace buffers. This is not
done with netdev_err/dev_err (and replacing the existing logging
is out of scope for a regression fix anyway).

Cheers,
Ahmad

>
>
>>
>>  	return err;
>>  }
>> --
>> 2.30.2
>>
>>
>
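The macro override mentioned in this reply can be illustrated with a small userspace sketch. It is not the driver's code: every `demo_` name is hypothetical, and the formatting backend merely stands in for whatever `__brcmf_err` does in the kernel. It shows a default bus-less macro, a file-local redefinition that takes the bus as first argument, and an open-coded bus-less call that bypasses the redefinition.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's bus structure. */
struct demo_bus {
	const char *name;
};

/* Last formatted message, kept so callers can inspect what was logged. */
static char demo_last_msg[160];

/* Shared backend: the bus is optional, so NULL is handled here. */
static void demo_err_impl(const struct demo_bus *bus, const char *func,
			  const char *fmt, ...)
{
	char body[96];
	va_list ap;

	assert(func != NULL && fmt != NULL);

	va_start(ap, fmt);
	vsnprintf(body, sizeof(body), fmt, ap);
	va_end(ap);

	snprintf(demo_last_msg, sizeof(demo_last_msg), "%s: %s: %s",
		 bus ? bus->name : "(no bus)", func, body);
}

/* Default form, as a shared header might define it: no bus argument. */
#define demo_err(...) demo_err_impl(NULL, __func__, __VA_ARGS__)

/* File-local override: same name, but the bus becomes the first argument. */
#undef demo_err
#define demo_err(bus, ...) demo_err_impl(bus, __func__, __VA_ARGS__)

/* Normal call site in the overriding file: a valid bus is at hand. */
static void demo_report_with_bus(const struct demo_bus *bus, int err)
{
	demo_err(bus, "example failure, err=%d", err);
}

/* Open-coded bus-less call that bypasses the overridden macro. */
static void demo_report_without_bus(int err)
{
	demo_err_impl(NULL, __func__, "probe after resume failed, err=%d", err);
}

/* Small accessor so callers can compare against the last message. */
static int demo_last_msg_is(const char *expected)
{
	return strcmp(demo_last_msg, expected) == 0;
}
```

Once the override is in place, the macro name can no longer be used without a bus argument in that file, which is why the bus-less call has to name the backend function directly.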
On Tue, Aug 17, 2021 at 2:11 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> On 17.08.21 13:02, Andy Shevchenko wrote:
> > On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:

...

> >>  	err = brcmf_pcie_probe(pdev, NULL);
> >>  	if (err)
> >> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
> >> +		__brcmf_err(NULL, __func__, "probe after resume failed,
> >> err=%d\n",
> >
> >
> > This is weird looking line now. Why can’t you simply use dev_err() /
> > netdev_err()?
>
> That's what brcmf_err normally expands to, but in this file the macro
> is overridden to add the extra first argument.

So, then the problem is in macro here. You need another portion of
macro(s) that will use the dev pointer directly. When you have a valid
device, use it. And here it seems the case.

> The brcmf_ logging functions write to brcmf trace buffers. This is not
> done with netdev_err/dev_err (and replacing the existing logging
> is out of scope for a regression fix anyway).

I see.
On 17.08.21 13:54, Andy Shevchenko wrote:
> On Tue, Aug 17, 2021 at 2:11 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>> On 17.08.21 13:02, Andy Shevchenko wrote:
>>> On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>
> ...
>
>>>>  	err = brcmf_pcie_probe(pdev, NULL);
>>>>  	if (err)
>>>> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
>>>> +		__brcmf_err(NULL, __func__, "probe after resume failed,
>>>> err=%d\n",
>>>
>>>
>>> This is weird looking line now. Why can’t you simply use dev_err() /
>>> netdev_err()?
>>
>> That's what brcmf_err normally expands to, but in this file the macro
>> is overridden to add the extra first argument.
>
> So, then the problem is in macro here. You need another portion of
> macro(s) that will use the dev pointer directly. When you have a valid
> device, use it. And here it seems the case.

Ah, you mean using pdev instead of the stale bus. Ye, I could do that.
Thanks for pointing out.

>
>> The brcmf_ logging functions write to brcmf trace buffers. This is not
>> done with netdev_err/dev_err (and replacing the existing logging
>> is out of scope for a regression fix anyway).
>
> I see.
>
On 17.08.21 14:03, Ahmad Fatoum wrote:
> On 17.08.21 13:54, Andy Shevchenko wrote:
>> On Tue, Aug 17, 2021 at 2:11 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>> On 17.08.21 13:02, Andy Shevchenko wrote:
>>>> On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>
>> ...
>>
>>>>>  	err = brcmf_pcie_probe(pdev, NULL);
>>>>>  	if (err)
>>>>> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
>>>>> +		__brcmf_err(NULL, __func__, "probe after resume failed,
>>>>> err=%d\n",
>>>>
>>>>
>>>> This is weird looking line now. Why can’t you simply use dev_err() /
>>>> netdev_err()?
>>>
>>> That's what brcmf_err normally expands to, but in this file the macro
>>> is overridden to add the extra first argument.
>>
>> So, then the problem is in macro here. You need another portion of
>> macro(s) that will use the dev pointer directly. When you have a valid
>> device, use it. And here it seems the case.
>
> Ah, you mean using pdev instead of the stale bus. Ye, I could do that.
> Thanks for pointing out.

Ah, not so easy: __brcmf_err accepts a struct brcmf_bus * as first argument,
but there is none I can pass along. As the whole file uses the brcmf_
logging functions, I'd just leave this one without a device.

>
>>
>>> The brcmf_ logging functions write to brcmf trace buffers. This is not
>>> done with netdev_err/dev_err (and replacing the existing logging
>>> is out of scope for a regression fix anyway).
>>
>> I see.
>>
>
>
On Tue, Aug 17, 2021 at 3:07 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> On 17.08.21 14:03, Ahmad Fatoum wrote:
> > On 17.08.21 13:54, Andy Shevchenko wrote:
> >> On Tue, Aug 17, 2021 at 2:11 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> >>> On 17.08.21 13:02, Andy Shevchenko wrote:
> >>>> On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:

...

> >>>>>  	err = brcmf_pcie_probe(pdev, NULL);
> >>>>>  	if (err)
> >>>>> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
> >>>>> +		__brcmf_err(NULL, __func__, "probe after resume failed,
> >>>>> err=%d\n",
> >>>>
> >>>>
> >>>> This is weird looking line now. Why can’t you simply use dev_err() /
> >>>> netdev_err()?
> >>>
> >>> That's what brcmf_err normally expands to, but in this file the macro
> >>> is overridden to add the extra first argument.
> >>
> >> So, then the problem is in macro here. You need another portion of
> >> macro(s) that will use the dev pointer directly. When you have a valid
> >> device, use it. And here it seems the case.
> >
> > Ah, you mean using pdev instead of the stale bus. Ye, I could do that.
> > Thanks for pointing out.
>
> Ah, not so easy: __brcmf_err accepts a struct brcmf_bus * as first argument,
> but there is none I can pass along. As the whole file uses the brcmf_
> logging functions, I'd just leave this one without a device.

And what exactly prevents you to split that to something like

  __brcm_dev_err() // as current __brcm_err with dev argument
  {
    ...
  }

  __brcm_err(bus, ...) __brcm_dev_err(bus->dev, ...)

?
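The split proposed in this message could look roughly like the userspace sketch below. It is only an illustration under stated assumptions: the `demo_` types and functions are hypothetical, the bus is assumed to carry a `dev` pointer as the pseudocode's `bus->dev` implies, and nothing here reflects how the driver's real helpers are implemented. The core helper is keyed on the device, and the bus-based form becomes a thin wrapper around it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a device and the bus that references it. */
struct demo_device {
	const char *name;
};

struct demo_bus {
	struct demo_device *dev;
};

/* Last formatted message, kept so callers can inspect what was logged. */
static char demo_last_msg[160];

/* Core helper keyed on the device, so it also works when no bus exists. */
static void demo_dev_err(const struct demo_device *dev, const char *func,
			 const char *fmt, ...)
{
	char body[96];
	va_list ap;

	assert(dev != NULL && func != NULL && fmt != NULL);

	va_start(ap, fmt);
	vsnprintf(body, sizeof(body), fmt, ap);
	va_end(ap);

	snprintf(demo_last_msg, sizeof(demo_last_msg), "%s: %s: %s",
		 dev->name, func, body);
}

/* Bus-based form: a thin wrapper that forwards the bus's device. */
#define demo_bus_err(bus, func, ...) \
	demo_dev_err((bus)->dev, func, __VA_ARGS__)

/* Ordinary call site: the bus is valid, so the wrapper is used. */
static void demo_bus_failure(const struct demo_bus *bus, int err)
{
	demo_bus_err(bus, __func__, "example failure, err=%d", err);
}

/* Reprobe failure: the bus is gone, but the device is still valid. */
static void demo_reprobe_failure(const struct demo_device *dev, int err)
{
	demo_dev_err(dev, __func__, "probe after resume failed, err=%d", err);
}

/* Small accessor so callers can compare against the last message. */
static int demo_last_msg_is(const char *expected)
{
	return strcmp(demo_last_msg, expected) == 0;
}
```

The trade-off discussed in the thread is visible here: the message keeps its device prefix, at the cost of touching the shared logging helper rather than a single call site.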
On 17.08.21 15:06, Andy Shevchenko wrote:
> On Tue, Aug 17, 2021 at 3:07 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>> On 17.08.21 14:03, Ahmad Fatoum wrote:
>>> On 17.08.21 13:54, Andy Shevchenko wrote:
>>>> On Tue, Aug 17, 2021 at 2:11 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>>>> On 17.08.21 13:02, Andy Shevchenko wrote:
>>>>>> On Tuesday, August 17, 2021, Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>
> ...
>
>>>>>>>  	err = brcmf_pcie_probe(pdev, NULL);
>>>>>>>  	if (err)
>>>>>>> -		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
>>>>>>> +		__brcmf_err(NULL, __func__, "probe after resume failed,
>>>>>>> err=%d\n",
>>>>>>
>>>>>>
>>>>>> This is weird looking line now. Why can’t you simply use dev_err() /
>>>>>> netdev_err()?
>>>>>
>>>>> That's what brcmf_err normally expands to, but in this file the macro
>>>>> is overridden to add the extra first argument.
>>>>
>>>> So, then the problem is in macro here. You need another portion of
>>>> macro(s) that will use the dev pointer directly. When you have a valid
>>>> device, use it. And here it seems the case.
>>>
>>> Ah, you mean using pdev instead of the stale bus. Ye, I could do that.
>>> Thanks for pointing out.
>>
>> Ah, not so easy: __brcmf_err accepts a struct brcmf_bus * as first argument,
>> but there is none I can pass along. As the whole file uses the brcmf_
>> logging functions, I'd just leave this one without a device.
>
> And what exactly prevents you to split that to something like
>
>   __brcm_dev_err() // as current __brcm_err with dev argument
>   {
>     ...
>   }
>
>   __brcm_err(bus, ...) __brcm_dev_err(bus->dev, ...)
>
> ?

I like my regression fixes to be short and to the point.

Cheers,
Ahmad
Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:

> When resuming from suspend, brcmf_pcie_pm_leave_D3 will first attempt a
> hot resume and then fall back to removing the PCI device and then
> reprobing. If this probe fails, the kernel will oops, because brcmf_err,
> which is called to report the failure will dereference the stale bus
> pointer. Open code and use the default bus-less brcmf_err to avoid this.
>
> Fixes: 8602e62441ab ("brcmfmac: pass bus to the __brcmf_err() in pcie.c")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Patch applied to wireless-drivers-next.git, thanks.

d745ca4f2c4a brcmfmac: pcie: fix oops on failure to resume and reprobe
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index 9ef94d7a7ca7..d824bea4b79d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -2209,7 +2209,7 @@ static int brcmf_pcie_pm_leave_D3(struct device *dev)

 	err = brcmf_pcie_probe(pdev, NULL);
 	if (err)
-		brcmf_err(bus, "probe after resume failed, err=%d\n", err);
+		__brcmf_err(NULL, __func__, "probe after resume failed, err=%d\n", err);

 	return err;
 }
When resuming from suspend, brcmf_pcie_pm_leave_D3 will first attempt a
hot resume and then fall back to removing the PCI device and then
reprobing. If this probe fails, the kernel will oops, because brcmf_err,
which is called to report the failure will dereference the stale bus
pointer. Open code and use the default bus-less brcmf_err to avoid this.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
To: Arend van Spriel <aspriel@gmail.com>
To: Franky Lin <franky.lin@broadcom.com>
To: Hante Meuleman <hante.meuleman@broadcom.com>
To: Chi-hsien Lin <chi-hsien.lin@infineon.com>
To: Wright Feng <wright.feng@infineon.com>
To: Chung-hsien Hsu <chung-hsien.hsu@infineon.com>
Cc: SHA-cyfmac-dev-list@infineon.com
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: netdev@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
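The failure mode described in the commit message can be modelled in userspace as follows. This is a sketch under assumptions, not the driver's resume path: all `demo_` names are hypothetical, and allocation with `calloc`/`free` stands in for whatever probe and remove do to the bus object. The point it illustrates is that a bus pointer cached before the remove step dangles afterwards, so the error path for a failed reprobe has to log without it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the bus object and the device owning it. */
struct demo_bus {
	char name[16];
};

struct demo_dev {
	struct demo_bus *bus;
};

/* Last formatted message, kept so callers can inspect what was logged. */
static char demo_last_msg[128];

/* Logging helper that tolerates a missing bus. */
static void demo_log(const struct demo_bus *bus, const char *func,
		     const char *msg, int err)
{
	snprintf(demo_last_msg, sizeof(demo_last_msg), "%s: %s: %s, err=%d",
		 bus ? bus->name : "(no bus)", func, msg, err);
}

/* Probe allocates a fresh bus, or fails on request to model the bug. */
static int demo_probe(struct demo_dev *dev, int fail)
{
	if (fail)
		return -5;

	dev->bus = calloc(1, sizeof(*dev->bus));
	if (!dev->bus)
		return -12;

	strcpy(dev->bus->name, "pcie0");
	return 0;
}

/* Remove frees the bus; any pointer cached earlier is stale afterwards. */
static void demo_remove(struct demo_dev *dev)
{
	free(dev->bus);
	dev->bus = NULL;
}

/* Resume: try the hot path first, otherwise remove and reprobe. */
static int demo_resume(struct demo_dev *dev, int hot_resume_ok,
		       int reprobe_fails)
{
	struct demo_bus *bus = dev->bus; /* cached before any remove */
	int err;

	assert(bus != NULL);

	if (hot_resume_ok) {
		demo_log(bus, __func__, "hot resume succeeded", 0);
		return 0;
	}

	demo_remove(dev);
	/* 'bus' dangles from here on and must not be dereferenced. */

	err = demo_probe(dev, reprobe_fails);
	if (err)
		demo_log(NULL, __func__, "probe after resume failed", err);

	return err;
}
```

Passing the cached `bus` to `demo_log` in the failure branch would read freed memory, which is the userspace analogue of the oops the patch avoids.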