diff mbox series

[v5] brcmfmac: prevent double-free on hardware-reset

Message ID id1HN6qCMAirApBzTA6fT7ZFWBBGCJhULpflxQ7NT6cgCboVnn3RHpiOFjA9SbRqzBRFLk9ES0C4FNvO6fUQsNg7pqF6ZSNAYUo99nHy8PY=@dannyvanheumen.nl (mailing list archive)
State Accepted
Commit cb774bd35318c1b4cb61f6f2caac85537d07fbde
Delegated to: Kalle Valo
Headers show
Series [v5] brcmfmac: prevent double-free on hardware-reset | expand

Commit Message

Danny van Heumen July 11, 2022, 11:21 p.m. UTC
In case of buggy firmware, brcmfmac may perform a hardware reset. If during
reset and subsequent probing an early failure occurs, a memory region is
accidentally double-freed. With hardened memory allocation enabled, this error
will be detected.

- return early where appropriate to skip unnecessary clean-up.
- set '.freezer' pointer to NULL to prevent double-freeing under possible
  other circumstances and to re-align result under various different
  behaviors of memory allocation freeing.
- correctly claim host on func1 for disabling func2.
- after reset, do not initiate probing immediately, but rely on events.

Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
early, which includes calling 'brcmf_sdiod_remove'. In both cases
'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
has not yet been re-allocated the second time.

Stacktrace of (failing) hardware reset after firmware-crash:

Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
 ret_from_fork+0x10/0x20
 kthread+0x154/0x160
 worker_thread+0x188/0x504
 process_one_work+0x1f4/0x490
 brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
 brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
 brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
 brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
 kfree+0x210/0x220
 __slab_free+0x58/0x40c
Call trace:
x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
sp : ffff80000a22bbf0
lr : kfree+0x210/0x220
pc : __slab_free+0x58/0x40c
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Workqueue: events brcmf_core_bus_reset [brcmfmac]
Hardware name: Pine64 Pinebook Pro (DT)
CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
 nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
Internal error: Oops - BUG: 0 [#1] SMP
kernel BUG at mm/slub.c:379!

Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
---
 .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c   | 13 +++++--------
 .../net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 10 +---------
 2 files changed, 6 insertions(+), 17 deletions(-)

--
2.34.1

Comments

Arend Van Spriel July 12, 2022, 8:08 a.m. UTC | #1
On 7/12/2022 1:21 AM, Danny van Heumen wrote:
> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> reset and subsequent probing an early failure occurs, a memory region is
> accidentally double-freed. With hardened memory allocation enabled, this error
> will be detected.
> 
> - return early where appropriate to skip unnecessary clean-up.
> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>    other circumstances and to re-align result under various different
>    behaviors of memory allocation freeing.
> - correctly claim host on func1 for disabling func2.
> - after reset, do not initiate probing immediately, but rely on events.
> 
> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> has not yet been re-allocated the second time.
> 
> Stacktrace of (failing) hardware reset after firmware-crash:
> 
> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>   ret_from_fork+0x10/0x20
>   kthread+0x154/0x160
>   worker_thread+0x188/0x504
>   process_one_work+0x1f4/0x490
>   brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>   brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>   brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>   brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>   kfree+0x210/0x220
>   __slab_free+0x58/0x40c
> Call trace:
> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
> sp : ffff80000a22bbf0
> lr : kfree+0x210/0x220
> pc : __slab_free+0x58/0x40c
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> Workqueue: events brcmf_core_bus_reset [brcmfmac]
> Hardware name: Pine64 Pinebook Pro (DT)
> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>   nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
> Internal error: Oops - BUG: 0 [#1] SMP
> kernel BUG at mm/slub.c:379!

Reviewed-by: Arend van Spriel <aspriel.gmail.com>
> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
> ---
>   .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c   | 13 +++++--------
>   .../net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 10 +---------
>   2 files changed, 6 insertions(+), 17 deletions(-)
Ulf Hansson July 14, 2022, 10:04 a.m. UTC | #2
On Tue, 12 Jul 2022 at 01:22, Danny van Heumen <danny@dannyvanheumen.nl> wrote:
>
> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> reset and subsequent probing an early failure occurs, a memory region is
> accidentally double-freed. With hardened memory allocation enabled, this error
> will be detected.
>
> - return early where appropriate to skip unnecessary clean-up.
> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>   other circumstances and to re-align result under various different
>   behaviors of memory allocation freeing.
> - correctly claim host on func1 for disabling func2.
> - after reset, do not initiate probing immediately, but rely on events.
>
> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> has not yet been re-allocated the second time.
>
> Stacktrace of (failing) hardware reset after firmware-crash:
>
> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>  ret_from_fork+0x10/0x20
>  kthread+0x154/0x160
>  worker_thread+0x188/0x504
>  process_one_work+0x1f4/0x490
>  brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>  brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>  brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>  brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>  kfree+0x210/0x220
>  __slab_free+0x58/0x40c
> Call trace:
> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
> sp : ffff80000a22bbf0
> lr : kfree+0x210/0x220
> pc : __slab_free+0x58/0x40c
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> Workqueue: events brcmf_core_bus_reset [brcmfmac]
> Hardware name: Pine64 Pinebook Pro (DT)
> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>  nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
> Internal error: Oops - BUG: 0 [#1] SMP
> kernel BUG at mm/slub.c:379!
>
> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>

I have to admit that, to me, it looks a bit weird to have one driver
instance managing two different SDIO func devices. On the other hand,
I don't know the HW so there might be good reasons for why.

In any case, I want to point out a commit [1] for the mwifiex driver,
which could serve as a good inspiration of how to make use of the
mmc_hw_reset(). For example, one may look at the return code from
mmc_hw_reset() to understand whether the reset will be done
synchronous or asynchronous (via device re-registration).

That said, I think the $subject patch looks reasonable to me. So feel
free to add:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe

[1]
cdb2256f795e ("mwifiex: Re-work support for SDIO HW reset")

> ---
>  .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c   | 13 +++++--------
>  .../net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 10 +---------
>  2 files changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> index ac02244a6fdf..414ee21f42e3 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> @@ -802,6 +802,7 @@ static void brcmf_sdiod_freezer_detach(struct brcmf_sdio_dev *sdiodev)
>         if (sdiodev->freezer) {
>                 WARN_ON(atomic_read(&sdiodev->freezer->freezing));
>                 kfree(sdiodev->freezer);
> +               sdiodev->freezer = NULL;
>         }
>  }
>
> @@ -875,13 +876,9 @@ int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev)
>
>         brcmf_sdiod_freezer_detach(sdiodev);
>
> -       /* Disable Function 2 */
> -       sdio_claim_host(sdiodev->func2);
> -       sdio_disable_func(sdiodev->func2);
> -       sdio_release_host(sdiodev->func2);
> -
> -       /* Disable Function 1 */
> +       /* Disable functions 2 then 1. */
>         sdio_claim_host(sdiodev->func1);
> +       sdio_disable_func(sdiodev->func2);
>         sdio_disable_func(sdiodev->func1);
>         sdio_release_host(sdiodev->func1);
>
> @@ -911,7 +908,7 @@ int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
>         if (ret) {
>                 brcmf_err("Failed to set F1 blocksize\n");
>                 sdio_release_host(sdiodev->func1);
> -               goto out;
> +               return ret;
>         }
>         switch (sdiodev->func2->device) {
>         case SDIO_DEVICE_ID_BROADCOM_CYPRESS_4373:
> @@ -933,7 +930,7 @@ int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
>         if (ret) {
>                 brcmf_err("Failed to set F2 blocksize\n");
>                 sdio_release_host(sdiodev->func1);
> -               goto out;
> +               return ret;
>         } else {
>                 brcmf_dbg(SDIO, "set F2 blocksize to %d\n", f2_blksz);
>         }
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index 212fbbe1cd7e..2ed70f809097 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -4152,7 +4152,6 @@ int brcmf_sdio_get_fwname(struct device *dev, const char *ext, u8 *fw_name)
>
>  static int brcmf_sdio_bus_reset(struct device *dev)
>  {
> -       int ret = 0;
>         struct brcmf_bus *bus_if = dev_get_drvdata(dev);
>         struct brcmf_sdio_dev *sdiodev = bus_if->bus_priv.sdio;
>
> @@ -4169,14 +4168,7 @@ static int brcmf_sdio_bus_reset(struct device *dev)
>         sdio_release_host(sdiodev->func1);
>
>         brcmf_bus_change_state(sdiodev->bus_if, BRCMF_BUS_DOWN);
> -
> -       ret = brcmf_sdiod_probe(sdiodev);
> -       if (ret) {
> -               brcmf_err("Failed to probe after sdio device reset: ret %d\n",
> -                         ret);
> -       }
> -
> -       return ret;
> +       return 0;
>  }
>
>  static const struct brcmf_bus_ops brcmf_sdio_bus_ops = {
> --
> 2.34.1
>
Arend Van Spriel July 14, 2022, 12:49 p.m. UTC | #3
On 7/14/2022 12:04 PM, Ulf Hansson wrote:
> On Tue, 12 Jul 2022 at 01:22, Danny van Heumen <danny@dannyvanheumen.nl> wrote:
>>
>> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
>> reset and subsequent probing an early failure occurs, a memory region is
>> accidentally double-freed. With hardened memory allocation enabled, this error
>> will be detected.
>>
>> - return early where appropriate to skip unnecessary clean-up.
>> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>>    other circumstances and to re-align result under various different
>>    behaviors of memory allocation freeing.
>> - correctly claim host on func1 for disabling func2.
>> - after reset, do not initiate probing immediately, but rely on events.
>>
>> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
>> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
>> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
>> early, which includes calling 'brcmf_sdiod_remove'. In both cases
>> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
>> has not yet been re-allocated the second time.
>>
>> Stacktrace of (failing) hardware reset after firmware-crash:
>>
>> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>>   ret_from_fork+0x10/0x20
>>   kthread+0x154/0x160
>>   worker_thread+0x188/0x504
>>   process_one_work+0x1f4/0x490
>>   brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>>   brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>>   brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>>   brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>>   kfree+0x210/0x220
>>   __slab_free+0x58/0x40c
>> Call trace:
>> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
>> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
>> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
>> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
>> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
>> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
>> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
>> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
>> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
>> sp : ffff80000a22bbf0
>> lr : kfree+0x210/0x220
>> pc : __slab_free+0x58/0x40c
>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> Workqueue: events brcmf_core_bus_reset [brcmfmac]
>> Hardware name: Pine64 Pinebook Pro (DT)
>> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>>   nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
>> Internal error: Oops - BUG: 0 [#1] SMP
>> kernel BUG at mm/slub.c:379!
>>
>> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
> 
> I have to admit that, to me, it looks a bit weird to have one driver
> instance managing two different SDIO func devices. On the other hand,
> I don't know the HW so there might be good reasons for why.
> 
> In any case, I want to point out a commit [1] for the mwifiex driver,
> which could serve as a good inspiration of how to make use of the
> mmc_hw_reset(). For example, one may look at the return code from
> mmc_hw_reset() to understand whether the reset will be done
> synchronous or asynchronous (via device re-registration).

Thanks, Uffe

Could the API be extended so the caller can request the reset to be 
asynchronous.

Regards,
Arend
Ulf Hansson July 14, 2022, 2:39 p.m. UTC | #4
On Thu, 14 Jul 2022 at 14:49, Arend Van Spriel <aspriel@gmail.com> wrote:
>
> On 7/14/2022 12:04 PM, Ulf Hansson wrote:
> > On Tue, 12 Jul 2022 at 01:22, Danny van Heumen <danny@dannyvanheumen.nl> wrote:
> >>
> >> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> >> reset and subsequent probing an early failure occurs, a memory region is
> >> accidentally double-freed. With hardened memory allocation enabled, this error
> >> will be detected.
> >>
> >> - return early where appropriate to skip unnecessary clean-up.
> >> - set '.freezer' pointer to NULL to prevent double-freeing under possible
> >>    other circumstances and to re-align result under various different
> >>    behaviors of memory allocation freeing.
> >> - correctly claim host on func1 for disabling func2.
> >> - after reset, do not initiate probing immediately, but rely on events.
> >>
> >> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> >> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> >> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> >> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> >> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> >> has not yet been re-allocated the second time.
> >>
> >> Stacktrace of (failing) hardware reset after firmware-crash:
> >>
> >> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
> >>   ret_from_fork+0x10/0x20
> >>   kthread+0x154/0x160
> >>   worker_thread+0x188/0x504
> >>   process_one_work+0x1f4/0x490
> >>   brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
> >>   brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
> >>   brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
> >>   brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
> >>   kfree+0x210/0x220
> >>   __slab_free+0x58/0x40c
> >> Call trace:
> >> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
> >> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
> >> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
> >> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
> >> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
> >> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
> >> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
> >> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
> >> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
> >> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
> >> sp : ffff80000a22bbf0
> >> lr : kfree+0x210/0x220
> >> pc : __slab_free+0x58/0x40c
> >> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> Workqueue: events brcmf_core_bus_reset [brcmfmac]
> >> Hardware name: Pine64 Pinebook Pro (DT)
> >> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
> >>   nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
> >> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
> >> Internal error: Oops - BUG: 0 [#1] SMP
> >> kernel BUG at mm/slub.c:379!
> >>
> >> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
> >
> > I have to admit that, to me, it looks a bit weird to have one driver
> > instance managing two different SDIO func devices. On the other hand,
> > I don't know the HW so there might be good reasons for why.
> >
> > In any case, I want to point out a commit [1] for the mwifiex driver,
> > which could serve as a good inspiration of how to make use of the
> > mmc_hw_reset(). For example, one may look at the return code from
> > mmc_hw_reset() to understand whether the reset will be done
> > synchronous or asynchronous (via device re-registration).
>
> Thanks, Uffe
>
> Could the API be extended so the caller can request the reset to be
> asynchronous.

Yes, that should be fine. The current behaviour is that it will be
asynchronous if there are more than one SDIO func device bound to a
driver.

Kind regards
Uffe
Arend Van Spriel July 15, 2022, 1:19 p.m. UTC | #5
On 7/14/2022 4:39 PM, Ulf Hansson wrote:
> On Thu, 14 Jul 2022 at 14:49, Arend Van Spriel <aspriel@gmail.com> wrote:
>>
>> On 7/14/2022 12:04 PM, Ulf Hansson wrote:
>>> On Tue, 12 Jul 2022 at 01:22, Danny van Heumen <danny@dannyvanheumen.nl> wrote:
>>>>
>>>> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
>>>> reset and subsequent probing an early failure occurs, a memory region is
>>>> accidentally double-freed. With hardened memory allocation enabled, this error
>>>> will be detected.
>>>>
>>>> - return early where appropriate to skip unnecessary clean-up.
>>>> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>>>>     other circumstances and to re-align result under various different
>>>>     behaviors of memory allocation freeing.
>>>> - correctly claim host on func1 for disabling func2.
>>>> - after reset, do not initiate probing immediately, but rely on events.
>>>>
>>>> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
>>>> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
>>>> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
>>>> early, which includes calling 'brcmf_sdiod_remove'. In both cases
>>>> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
>>>> has not yet been re-allocated the second time.
>>>>
>>>> Stacktrace of (failing) hardware reset after firmware-crash:
>>>>
>>>> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>>>>    ret_from_fork+0x10/0x20
>>>>    kthread+0x154/0x160
>>>>    worker_thread+0x188/0x504
>>>>    process_one_work+0x1f4/0x490
>>>>    brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>>>>    brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>>>>    brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>>>>    brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>>>>    kfree+0x210/0x220
>>>>    __slab_free+0x58/0x40c
>>>> Call trace:
>>>> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
>>>> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
>>>> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
>>>> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
>>>> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
>>>> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
>>>> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
>>>> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
>>>> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
>>>> sp : ffff80000a22bbf0
>>>> lr : kfree+0x210/0x220
>>>> pc : __slab_free+0x58/0x40c
>>>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> Workqueue: events brcmf_core_bus_reset [brcmfmac]
>>>> Hardware name: Pine64 Pinebook Pro (DT)
>>>> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>>>>    nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
>>>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
>>>> Internal error: Oops - BUG: 0 [#1] SMP
>>>> kernel BUG at mm/slub.c:379!
>>>>
>>>> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
>>>
>>> I have to admit that, to me, it looks a bit weird to have one driver
>>> instance managing two different SDIO func devices. On the other hand,
>>> I don't know the HW so there might be good reasons for why.
>>>
>>> In any case, I want to point out a commit [1] for the mwifiex driver,
>>> which could serve as a good inspiration of how to make use of the
>>> mmc_hw_reset(). For example, one may look at the return code from
>>> mmc_hw_reset() to understand whether the reset will be done
>>> synchronous or asynchronous (via device re-registration).
>>
>> Thanks, Uffe
>>
>> Could the API be extended so the caller can request the reset to be
>> asynchronous.
> 
> Yes, that should be fine. The current behaviour is that it will be
> asynchronous if there are more than one SDIO func device bound to a
> driver.

I see. So for brcmfmac we fall into that category. Thanks for the info.

Regards,
Arend
Danny van Heumen July 18, 2022, 5:17 p.m. UTC | #6
------- Original Message -------
On Thursday, July 14th, 2022 at 12:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:

> [..]
>
> That said, I think the $subject patch looks reasonable to me. So feel
> free to add:
>
> Reviewed-by: Ulf Hansson ulf.hansson@linaro.org

I am a first-time contributor. Is this your way of saying that I should submit
the patch somewhere other than 'linux-wireless@...'? I suppose this fix is not
urgent enough for 'stable@...', or is it?

I would appreciate information on who will and/or how to follow-up.

Regards,
Danny
Arend Van Spriel July 18, 2022, 8:29 p.m. UTC | #7
On 7/18/2022 7:17 PM, Danny van Heumen wrote:
> ------- Original Message -------
> On Thursday, July 14th, 2022 at 12:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> 
>> [..]
>>
>> That said, I think the $subject patch looks reasonable to me. So feel
>> free to add:
>>
>> Reviewed-by: Ulf Hansson ulf.hansson@linaro.org
> 
> I am a first-time contributor. Is this your way of saying that I should submit
> the patch somewhere other than 'linux-wireless@...'? I suppose this fix is not
> urgent enough for 'stable@...', or is it?
> 
> I would appreciate information on who will and/or how to follow-up.

Hi Danny,

The above means that Uffe is okay with the patch being included. Kalle 
Valo is the linux-wireless maintainer and he will apply the patch to the 
linux-wireless repo. The status of your patch can be monitored through 
patchwork [1].

By the way, you can add the Fixes: tag referring to the commit that 
introduced the issue. Not sure which one should be considered here, but 
it is either this one ...

Fixes: 9982464379e8 ("brcmfmac: make sdio suspend wait for threads to 
freeze")

... or ...

Fixes: 7836102a750a ("brcmfmac: reset SDIO bus on a firmware crash")

How to create the tag is described here [2]. With git configured 
properly you can simply do:

$ git log --pretty=fixes -1 <commit_sha1>

Stable patches have their additional rules [3]. The bus reset was added 
in kernel v5.9. Hope this helps.

Regards,
Arend

[1] 
https://patchwork.kernel.org/project/linux-wireless/patch/id1HN6qCMAirApBzTA6fT7ZFWBBGCJhULpflxQ7NT6cgCboVnn3RHpiOFjA9SbRqzBRFLk9ES0C4FNvO6fUQsNg7pqF6ZSNAYUo99nHy8PY=@dannyvanheumen.nl/
[2] 
https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html#describe-your-changes
[3] 
https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#procedure-for-submitting-patches-to-the-stable-tree
Kalle Valo July 28, 2022, 9:59 a.m. UTC | #8
Danny van Heumen <danny@dannyvanheumen.nl> wrote:

> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> reset and subsequent probing an early failure occurs, a memory region is
> accidentally double-freed. With hardened memory allocation enabled, this error
> will be detected.
> 
> - return early where appropriate to skip unnecessary clean-up.
> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>   other circumstances and to re-align result under various different
>   behaviors of memory allocation freeing.
> - correctly claim host on func1 for disabling func2.
> - after reset, do not initiate probing immediately, but rely on events.
> 
> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> has not yet been re-allocated the second time.
> 
> Stacktrace of (failing) hardware reset after firmware-crash:
> 
> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>  ret_from_fork+0x10/0x20
>  kthread+0x154/0x160
>  worker_thread+0x188/0x504
>  process_one_work+0x1f4/0x490
>  brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>  brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>  brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>  brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>  kfree+0x210/0x220
>  __slab_free+0x58/0x40c
> Call trace:
> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
> sp : ffff80000a22bbf0
> lr : kfree+0x210/0x220
> pc : __slab_free+0x58/0x40c
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> Workqueue: events brcmf_core_bus_reset [brcmfmac]
> Hardware name: Pine64 Pinebook Pro (DT)
> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>  nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
> Internal error: Oops - BUG: 0 [#1] SMP
> kernel BUG at mm/slub.c:379!
> 
> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
> Reviewed-by: Arend van Spriel <aspriel.gmail.com>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Patch applied to wireless-next.git, thanks.

cb774bd35318 wifi: brcmfmac: prevent double-free on hardware-reset
diff mbox series

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index ac02244a6fdf..414ee21f42e3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -802,6 +802,7 @@  static void brcmf_sdiod_freezer_detach(struct brcmf_sdio_dev *sdiodev)
 	if (sdiodev->freezer) {
 		WARN_ON(atomic_read(&sdiodev->freezer->freezing));
 		kfree(sdiodev->freezer);
+		sdiodev->freezer = NULL;
 	}
 }

@@ -875,13 +876,9 @@  int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev)

 	brcmf_sdiod_freezer_detach(sdiodev);

-	/* Disable Function 2 */
-	sdio_claim_host(sdiodev->func2);
-	sdio_disable_func(sdiodev->func2);
-	sdio_release_host(sdiodev->func2);
-
-	/* Disable Function 1 */
+	/* Disable functions 2 then 1. */
 	sdio_claim_host(sdiodev->func1);
+	sdio_disable_func(sdiodev->func2);
 	sdio_disable_func(sdiodev->func1);
 	sdio_release_host(sdiodev->func1);

@@ -911,7 +908,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F1 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	}
 	switch (sdiodev->func2->device) {
 	case SDIO_DEVICE_ID_BROADCOM_CYPRESS_4373:
@@ -933,7 +930,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F2 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	} else {
 		brcmf_dbg(SDIO, "set F2 blocksize to %d\n", f2_blksz);
 	}
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 212fbbe1cd7e..2ed70f809097 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4152,7 +4152,6 @@  int brcmf_sdio_get_fwname(struct device *dev, const char *ext, u8 *fw_name)

 static int brcmf_sdio_bus_reset(struct device *dev)
 {
-	int ret = 0;
 	struct brcmf_bus *bus_if = dev_get_drvdata(dev);
 	struct brcmf_sdio_dev *sdiodev = bus_if->bus_priv.sdio;

@@ -4169,14 +4168,7 @@  static int brcmf_sdio_bus_reset(struct device *dev)
 	sdio_release_host(sdiodev->func1);

 	brcmf_bus_change_state(sdiodev->bus_if, BRCMF_BUS_DOWN);
-
-	ret = brcmf_sdiod_probe(sdiodev);
-	if (ret) {
-		brcmf_err("Failed to probe after sdio device reset: ret %d\n",
-			  ret);
-	}
-
-	return ret;
+	return 0;
 }

 static const struct brcmf_bus_ops brcmf_sdio_bus_ops = {