Message ID | 20200128221457.12467-1-linux@roeck-us.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 863844ee3bd38219c88e82966d1df36a77716f3e |
Delegated to: | Kalle Valo |
Series | brcmfmac: abort and release host after error |
Hi,

On Tue, Jan 28, 2020 at 2:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> With commit 216b44000ada ("brcmfmac: Fix use after free in
> brcmf_sdio_readframes()") applied, we see locking timeouts in
> brcmf_sdio_watchdog_thread().
>
> brcmfmac: brcmf_escan_timeout: timer expired
> INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
> Not tainted 4.19.94-07984-g24ff99a0f713 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
> [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
> [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
> [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
> [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
>
> In addition to restarting or exiting the loop, it is also necessary to
> abort the command and to release the host.
>
> Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index f9df95bc7fa1..2e1c23c7269d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
> if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
> BRCMF_SDIO_FT_NORMAL)) {
> rd->len = 0;
> + brcmf_sdio_rxfail(bus, true, true);
> + sdio_release_host(bus->sdiodev->func1);

I don't know much about this driver so I don't personally know if
"true, true" is the correct thing to pass to brcmf_sdio_rxfail(), but
it seems plausible. Definitely the fix to call sdio_release_host() is
sane.

Thus, unless someone knows for sure that brcmf_sdio_rxfail()'s
parameters should be different:

Reviewed-by: Douglas Anderson <dianders@chromium.org>

On Tue, Jan 28, 2020 at 03:14:45PM -0800, Doug Anderson wrote:
> Hi,
>
> On Tue, Jan 28, 2020 at 2:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > With commit 216b44000ada ("brcmfmac: Fix use after free in
> > brcmf_sdio_readframes()") applied, we see locking timeouts in
> > brcmf_sdio_watchdog_thread().
> >
> > brcmfmac: brcmf_escan_timeout: timer expired
> > INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
> > Not tainted 4.19.94-07984-g24ff99a0f713 #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
> > [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
> > [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
> > [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
> > [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
> >
> > In addition to restarting or exiting the loop, it is also necessary to
> > abort the command and to release the host.
> >
> > Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Cc: Matthias Kaehlcke <mka@chromium.org>
> > Cc: Brian Norris <briannorris@chromium.org>
> > Cc: Douglas Anderson <dianders@chromium.org>
> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > index f9df95bc7fa1..2e1c23c7269d 100644
> > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > @@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
> > if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
> > BRCMF_SDIO_FT_NORMAL)) {
> > rd->len = 0;
> > + brcmf_sdio_rxfail(bus, true, true);
> > + sdio_release_host(bus->sdiodev->func1);
>
> I don't know much about this driver so I don't personally know if
> "true, true" is the correct thing to pass to brcmf_sdio_rxfail(), but
> it seems plausible. Definitely the fix to call sdio_release_host() is
> sane.
>
> Thus, unless someone knows for sure that brcmf_sdio_rxfail()'s
> parameters should be different:
>

Actually, looking at brcmf_sdio_hdparse() and its other callers,
I think it may not be needed at all - other callers don't do it, and
there already are some calls to brcmf_sdio_rxfail() in that function.
It would be nice though to get a confirmation before I submit v2.

Guenter

On Tue, Jan 28, 2020 at 4:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Tue, Jan 28, 2020 at 03:14:45PM -0800, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jan 28, 2020 at 2:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > With commit 216b44000ada ("brcmfmac: Fix use after free in
> > > brcmf_sdio_readframes()") applied, we see locking timeouts in
> > > brcmf_sdio_watchdog_thread().
> > >
> > > brcmfmac: brcmf_escan_timeout: timer expired
> > > INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
> > > Not tainted 4.19.94-07984-g24ff99a0f713 #1
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
> > > [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
> > > [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
> > > [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
> > > [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
> > >
> > > In addition to restarting or exiting the loop, it is also necessary to
> > > abort the command and to release the host.
> > >
> > > Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
> > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > Cc: Matthias Kaehlcke <mka@chromium.org>
> > > Cc: Brian Norris <briannorris@chromium.org>
> > > Cc: Douglas Anderson <dianders@chromium.org>
> > > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > > ---
> > > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > index f9df95bc7fa1..2e1c23c7269d 100644
> > > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > @@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
> > > if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
> > > BRCMF_SDIO_FT_NORMAL)) {
> > > rd->len = 0;
> > > + brcmf_sdio_rxfail(bus, true, true);
> > > + sdio_release_host(bus->sdiodev->func1);
> >
> > I don't know much about this driver so I don't personally know if
> > "true, true" is the correct thing to pass to brcmf_sdio_rxfail(), but
> > it seems plausible. Definitely the fix to call sdio_release_host() is
> > sane.
> >
> > Thus, unless someone knows for sure that brcmf_sdio_rxfail()'s
> > parameters should be different:
> >
> Actually, looking at brcmf_sdio_hdparse() and its other callers,
> I think it may not be needed at all - other callers don't do it, and
> there already are some calls to brcmf_sdio_rxfail() in that function.
> It would be nice though to get a confirmation before I submit v2.

I think invoking rxfail with both abort and NACK set to true is the
right thing to do here so that the pipeline can be properly purged.

Thanks!

Acked-by: franky.lin@broadcom.com

On Tue, Jan 28, 2020 at 02:14:57PM -0800, Guenter Roeck wrote:
> With commit 216b44000ada ("brcmfmac: Fix use after free in
> brcmf_sdio_readframes()") applied, we see locking timeouts in
> brcmf_sdio_watchdog_thread().
>
> brcmfmac: brcmf_escan_timeout: timer expired
> INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
> Not tainted 4.19.94-07984-g24ff99a0f713 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
> [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
> [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
> [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
> [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
>
> In addition to restarting or exiting the loop, it is also necessary to
> abort the command and to release the host.
>
> Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")

Huh... Thanks for fixing the bug. That seems to indicate that we were
triggering the use after free but no one noticed at runtime. With
kfree(), a use after free can be harmless if you don't have poisoning
enabled and no other thread has re-used the memory. I'm not sure about
kfree_skb() but presumably it's the same.

Acked-by: Dan Carpenter <dan.carpenter@oracle.com>

regards,
dan carpenter

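As an illustration of the poisoning point above, here is a minimal userspace sketch, not kernel code. It assumes a one-slot toy pool with static storage, so the stale read is well defined here (a real use after free is not). The names toy_alloc(), toy_free() and toy_stale_read_looks_valid() are hypothetical; only the 0x6b byte is borrowed from the kernel's POISON_FREE value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_POISON 0x6b	/* same byte value as the kernel's POISON_FREE */

/*
 * One-slot toy pool. The storage is static, so reading it after
 * toy_free() is well defined in this sketch, unlike a real use after free.
 */
static unsigned char toy_slot[16];

static unsigned char *toy_alloc(void)
{
	return toy_slot;
}

static void toy_free(unsigned char *p, bool poison)
{
	if (poison)
		memset(p, TOY_POISON, sizeof(toy_slot));
	/* without poisoning, the old contents are simply left in place */
}

/* Returns true if a stale read after toy_free() still sees the old data. */
bool toy_stale_read_looks_valid(bool poison)
{
	unsigned char *buf = toy_alloc();

	memset(buf, 0x42, sizeof(toy_slot));	/* "valid" payload */
	toy_free(buf, poison);
	return buf[0] == 0x42;			/* stale access */
}
```

Without poisoning, and with nobody re-using the slot, the stale read still returns the old payload, which is why such a bug can go unnoticed for a while. With poisoning the stale data is overwritten and the bug becomes visible.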
On 1/28/20 7:32 PM, Dan Carpenter wrote:
> On Tue, Jan 28, 2020 at 02:14:57PM -0800, Guenter Roeck wrote:
>> With commit 216b44000ada ("brcmfmac: Fix use after free in
>> brcmf_sdio_readframes()") applied, we see locking timeouts in
>> brcmf_sdio_watchdog_thread().
>>
>> brcmfmac: brcmf_escan_timeout: timer expired
>> INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
>> Not tainted 4.19.94-07984-g24ff99a0f713 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
>> [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
>> [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
>> [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
>> [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
>>
>> In addition to restarting or exiting the loop, it is also necessary to
>> abort the command and to release the host.
>>
>> Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
>
> Huh... Thanks for fixing the bug. That seems to indicate that we were
> triggering the use after free but no one noticed at runtime. With

Actually, we did see the problem. We just didn't realize it.

> kfree(), a use after free can be harmless if you don't have poisoning
> enabled and no other thread has re-used the memory. I'm not sure about
> kfree_skb() but presumably it's the same.
>

Not really; it ultimately does result in a crash. We see that in ChromeOS
R80 (and probably in all earlier releases, but I didn't check), which does
not (yet) include 216b44000ada. The upcoming R81, which does include
216b44000ada, doesn't crash but there are lots of stalls like the one above.
The combination of both (i.e. the difference in behavior) helped track
down the problem.

Guenter

Hi Franky,

[I'm very unfamiliar with this driver, but I had the same questions as
Guenter, I think:]

On Tue, Jan 28, 2020 at 04:57:59PM -0800, Franky Lin wrote:
> On Tue, Jan 28, 2020 at 4:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Tue, Jan 28, 2020 at 03:14:45PM -0800, Doug Anderson wrote:
> > > On Tue, Jan 28, 2020 at 2:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > > @@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
> > > > if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
> > > > BRCMF_SDIO_FT_NORMAL)) {
> > > > rd->len = 0;
> > > > + brcmf_sdio_rxfail(bus, true, true);
> > > > + sdio_release_host(bus->sdiodev->func1);
> > >
> > > I don't know much about this driver so I don't personally know if
> > > "true, true" is the correct thing to pass to brcmf_sdio_rxfail(), but
> > > it seems plausible. Definitely the fix to call sdio_release_host() is
> > > sane.
> > >
> > > Thus, unless someone knows for sure that brcmf_sdio_rxfail()'s
> > > parameters should be different:
> > >
> > Actually, looking at brcmf_sdio_hdparse() and its other callers,
> > I think it may not be needed at all - other callers don't do it, and
> > there already are some calls to brcmf_sdio_rxfail() in that function.
> > It would be nice though to get a confirmation before I submit v2.
>
> I think invoking rxfail with both abort and NACK set to true is the
> right thing to do here so that the pipeline can be properly purged.

Thanks for looking here. I'm not sure I totally understand your answer:
brcmf_sdio_hdparse() already calls brcmf_sdio_rxfail() in several error
cases. Is it really OK to call it twice in a row?

Brian

On Wed, Jan 29, 2020 at 10:04 AM Brian Norris <briannorris@chromium.org> wrote:
>
> Hi Franky,
>
> [I'm very unfamiliar with this driver, but I had the same questions as
> Guenter, I think:]
>
> On Tue, Jan 28, 2020 at 04:57:59PM -0800, Franky Lin wrote:
> > On Tue, Jan 28, 2020 at 4:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > On Tue, Jan 28, 2020 at 03:14:45PM -0800, Doug Anderson wrote:
> > > > On Tue, Jan 28, 2020 at 2:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > > > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > > > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > > > > @@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
> > > > > if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
> > > > > BRCMF_SDIO_FT_NORMAL)) {
> > > > > rd->len = 0;
> > > > > + brcmf_sdio_rxfail(bus, true, true);
> > > > > + sdio_release_host(bus->sdiodev->func1);
> > > >
> > > > I don't know much about this driver so I don't personally know if
> > > > "true, true" is the correct thing to pass to brcmf_sdio_rxfail(), but
> > > > it seems plausible. Definitely the fix to call sdio_release_host() is
> > > > sane.
> > > >
> > > > Thus, unless someone knows for sure that brcmf_sdio_rxfail()'s
> > > > parameters should be different:
> > > >
> > > Actually, looking at brcmf_sdio_hdparse() and its other callers,
> > > I think it may not be needed at all - other callers don't do it, and
> > > there already are some calls to brcmf_sdio_rxfail() in that function.
> > > It would be nice though to get a confirmation before I submit v2.
> >
> > I think invoking rxfail with both abort and NACK set to true is the
> > right thing to do here so that the pipeline can be properly purged.
>
> Thanks for looking here. I'm not sure I totally understand your answer:
> brcmf_sdio_hdparse() already calls brcmf_sdio_rxfail() in several error
> cases. Is it really OK to call it twice in a row?

Yes. brcmf_sdio_rxglom() does the same thing: it calls brcmf_sdio_rxfail()
again in its error handling. For this instance I think it's better to use
the same logic as the length mismatch block below (calling
brcmf_sdio_rxfail() with true, true).

Thanks,
- Franky

Guenter Roeck <linux@roeck-us.net> wrote:

> With commit 216b44000ada ("brcmfmac: Fix use after free in
> brcmf_sdio_readframes()") applied, we see locking timeouts in
> brcmf_sdio_watchdog_thread().
>
> brcmfmac: brcmf_escan_timeout: timer expired
> INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
> Not tainted 4.19.94-07984-g24ff99a0f713 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
> [<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
> [<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
> [<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
> [<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)
>
> In addition to restarting or exiting the loop, it is also necessary to
> abort the command and to release the host.
>
> Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Cc: Brian Norris <briannorris@chromium.org>
> Cc: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: franky.lin@broadcom.com
> Acked-by: Dan Carpenter <dan.carpenter@oracle.com>

Patch applied to wireless-drivers-next.git, thanks.

863844ee3bd3 brcmfmac: abort and release host after error

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index f9df95bc7fa1..2e1c23c7269d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -1938,6 +1938,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes)
 			if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new,
 					       BRCMF_SDIO_FT_NORMAL)) {
 				rd->len = 0;
+				brcmf_sdio_rxfail(bus, true, true);
+				sdio_release_host(bus->sdiodev->func1);
 				brcmu_pkt_buf_free_skb(pkt);
 				continue;
 			}

With commit 216b44000ada ("brcmfmac: Fix use after free in
brcmf_sdio_readframes()") applied, we see locking timeouts in
brcmf_sdio_watchdog_thread().

brcmfmac: brcmf_escan_timeout: timer expired
INFO: task brcmf_wdog/mmc1:621 blocked for more than 120 seconds.
Not tainted 4.19.94-07984-g24ff99a0f713 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
brcmf_wdog/mmc1 D 0 621 2 0x00000000 last_sleep: 2440793077. last_runnable: 2440766827
[<c0aa1e60>] (__schedule) from [<c0aa2100>] (schedule+0x98/0xc4)
[<c0aa2100>] (schedule) from [<c0853830>] (__mmc_claim_host+0x154/0x274)
[<c0853830>] (__mmc_claim_host) from [<bf10c5b8>] (brcmf_sdio_watchdog_thread+0x1b0/0x1f8 [brcmfmac])
[<bf10c5b8>] (brcmf_sdio_watchdog_thread [brcmfmac]) from [<c02570b8>] (kthread+0x178/0x180)

In addition to restarting or exiting the loop, it is also necessary to
abort the command and to release the host.

Fixes: 216b44000ada ("brcmfmac: Fix use after free in brcmf_sdio_readframes()")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 ++
1 file changed, 2 insertions(+)
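
The locking problem described in the commit message can be modelled in userspace with a rough sketch. This is not the driver code: the host lock is reduced to a plain counter, and toy_claim_host(), toy_release_host(), toy_hdparse_fails() and toy_readframes() are hypothetical stand-ins chosen for illustration. It only shows why an error path that does `continue` without releasing leaves the host claimed, which is what a second thread such as the watchdog would then block on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the SDIO host lock; NOT the kernel's claim/release API. */
struct toy_host {
	int claim_depth;	/* > 0 while the host is still held */
};

static void toy_claim_host(struct toy_host *h)
{
	h->claim_depth++;
}

static void toy_release_host(struct toy_host *h)
{
	h->claim_depth--;
}

/* Hypothetical header parser: reports every third frame as bad. */
static bool toy_hdparse_fails(int frame)
{
	return frame % 3 == 2;
}

/*
 * Simplified shape of a receive loop. Each iteration claims the host,
 * and every way out of the iteration has to release it again.
 * Returns the claim depth left behind (0 means balanced).
 */
static int toy_readframes(struct toy_host *h, int nframes,
			  bool release_on_error)
{
	for (int frame = 0; frame < nframes; frame++) {
		toy_claim_host(h);
		if (toy_hdparse_fails(frame)) {
			if (release_on_error)
				toy_release_host(h);
			continue;	/* error path: skip this frame */
		}
		toy_release_host(h);	/* normal path */
	}
	return h->claim_depth;
}

/* Runs six frames and reports how many claims were never released. */
int toy_demo_leaked_claims(bool release_on_error)
{
	struct toy_host h = { .claim_depth = 0 };

	return toy_readframes(&h, 6, release_on_error);
}
```

In this model, frames 2 and 5 fail, so the variant without the release leaves two claims outstanding while the variant with the release ends balanced. The real driver additionally aborts the transfer through brcmf_sdio_rxfail(), which this sketch does not attempt to model.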