diff mbox series

[v4] brcmfmac: prevent double-free on hardware-reset

Message ID g_Py6bM1lfcJOWWmHwKU8x4tCFrTRdgFtoM13qYHeN441F392j_6etJnEJ8gHJMRZ6OEKxpJYuP45x3iziHqY6HNXnVwIiyvJLYjvzxT0Xk=@dannyvanheumen.nl (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series [v4] brcmfmac: prevent double-free on hardware-reset | expand

Commit Message

Danny van Heumen July 2, 2022, 5:35 p.m. UTC
In case of buggy firmware, brcmfmac may perform a hardware reset. If during
reset and subsequent probing an early failure occurs, a memory region is
accidentally double-freed. With hardened memory allocation enabled, this error
will be detected.

- return early where appropriate to skip unnecessary clean-up.
- set '.freezer' pointer to NULL to prevent double-freeing under possible
  other circumstances and to re-align result under various different
  behaviors of memory allocation freeing.
- correctly claim host on func1 for disabling func2.
- after reset, do not initiate probing immediately, but rely on events.

Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
early, which includes calling 'brcmf_sdiod_remove'. In both cases
'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
has not yet been re-allocated the second time.

Stacktrace of (failing) hardware reset after firmware-crash:

Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
 ret_from_fork+0x10/0x20
 kthread+0x154/0x160
 worker_thread+0x188/0x504
 process_one_work+0x1f4/0x490
 brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
 brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
 brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
 brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
 kfree+0x210/0x220
 __slab_free+0x58/0x40c
Call trace:
x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
sp : ffff80000a22bbf0
lr : kfree+0x210/0x220
pc : __slab_free+0x58/0x40c
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Workqueue: events brcmf_core_bus_reset [brcmfmac]
Hardware name: Pine64 Pinebook Pro (DT)
CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
 nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
Internal error: Oops - BUG: 0 [#1] SMP
kernel BUG at mm/slub.c:379!

Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
---
 .../broadcom/brcm80211/brcmfmac/bcmsdh.c      | 31 ++++++++++++-------
 .../broadcom/brcm80211/brcmfmac/sdio.c        | 10 +-----
 2 files changed, 21 insertions(+), 20 deletions(-)

--
2.34.1

Comments

Arend Van Spriel July 4, 2022, 9:43 a.m. UTC | #1
On 7/2/2022 7:35 PM, Danny van Heumen wrote:
> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> reset and subsequent probing an early failure occurs, a memory region is
> accidentally double-freed. With hardened memory allocation enabled, this error
> will be detected.
> 
> - return early where appropriate to skip unnecessary clean-up.
> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>    other circumstances and to re-align result under various different
>    behaviors of memory allocation freeing.
> - correctly claim host on func1 for disabling func2.
> - after reset, do not initiate probing immediately, but rely on events.
> 
> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> has not yet been re-allocated the second time.
> 
> Stacktrace of (failing) hardware reset after firmware-crash:
> 
> Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
>   ret_from_fork+0x10/0x20
>   kthread+0x154/0x160
>   worker_thread+0x188/0x504
>   process_one_work+0x1f4/0x490
>   brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
>   brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
>   brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
>   brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
>   kfree+0x210/0x220
>   __slab_free+0x58/0x40c
> Call trace:
> x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
> x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
> x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
> x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
> x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
> x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
> x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
> x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
> x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
> sp : ffff80000a22bbf0
> lr : kfree+0x210/0x220
> pc : __slab_free+0x58/0x40c
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> Workqueue: events brcmf_core_bus_reset [brcmfmac]
> Hardware name: Pine64 Pinebook Pro (DT)
> CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
>   nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
> Internal error: Oops - BUG: 0 [#1] SMP
> kernel BUG at mm/slub.c:379!
> 
> Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
> ---
It is good practice to throw in a changelog here so people know what has 
changed since earlier version of the patch.
---
>   .../broadcom/brcm80211/brcmfmac/bcmsdh.c      | 31 ++++++++++++-------
>   .../broadcom/brcm80211/brcmfmac/sdio.c        | 10 +-----
>   2 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> index ac02244a6fdf..dd634edaa0b3 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c

[...]

> @@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
>   	if (bus_if) {
>   		sdiodev = bus_if->bus_priv.sdio;
> 
> +		if (func->num != 1) {
> +			/* Satisfy kernel expectation that the irq is released once the
> +			 * '.remove' callback has executed, while respecting the design
> +			 * that removal is executed for 'sdiodev', instead of individual
> +			 * function.
> +			 */
> +			brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
> +			sdio_claim_host(sdiodev->func1);
> +			sdio_release_irq(func);
> +			sdio_release_host(sdiodev->func1);
> +			return;
> +		}
> +
> +		/* func 1: so do full clean-up and removal */
> +

The problem is that upon driver unload we get remove for function 2 and 
then for function 1. Upon mmc_hw_reset() we get a remove for function 1 
and then for function 2. So in the scenario of mmc_hw_reset() we free 
sdiodev upon func 1 removal and then for func 2 removal we have a 
use-after-free of sdiodev. The code currently relies on the order in 
which remove callback is done. To make it more robust we could throw in 
a refcount for sdiodev and only do the full clean-up when refcount hits 
zero.

Regards,
Arend
Danny van Heumen July 4, 2022, 2:49 p.m. UTC | #2
Hi Arend,

------- Original Message -------
On Monday, July 4th, 2022 at 11:43, Arend Van Spriel <aspriel@gmail.com> wrote:


> [..]
>
> It is good practice to throw in a changelog here so people know what has
> changed since earlier version of the patch.

That's fair enough. The commit message is updated.
Changes compared to v3:

- brcmf_sdiod_remove(..) disables functions in reverse order. It also claims
  'func2' when disabling 'func2'. However, operations on 'func2' are always
  performed after claiming 'func1'. So this corrects mistake that deviates
  from convention.
- furthermore, following feedback from the kernel, irq is released for each
  individual function, but only func1 performs removal operations. This
  prevents the ordering issue from occurring.

> ---
>
> > .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 31 ++++++++++++-------
> > .../broadcom/brcm80211/brcmfmac/sdio.c | 10 +-----
> > 2 files changed, 21 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> > index ac02244a6fdf..dd634edaa0b3 100644
> > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>
>
> [...]
>
> > @@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
> > if (bus_if) {
> > sdiodev = bus_if->bus_priv.sdio;
> >
> > + if (func->num != 1) {
> > + /* Satisfy kernel expectation that the irq is released once the
> > + * '.remove' callback has executed, while respecting the design
> > + * that removal is executed for 'sdiodev', instead of individual
> > + * function.
> > + /
> > + brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
> > + sdio_claim_host(sdiodev->func1);
> > + sdio_release_irq(func);
> > + sdio_release_host(sdiodev->func1);
> > + return;
> > + }
> > +
> > + / func 1: so do full clean-up and removal */
> > +
>
>
> The problem is that upon driver unload we get remove for function 2 and
> then for function 1. Upon mmc_hw_reset() we get a remove for function 1
> and then for function 2. So in the scenario of mmc_hw_reset() we free
> sdiodev upon func 1 removal and then for func 2 removal we have a
> use-after-free of sdiodev.

I understood this. I recognize the different orders. However, there is a
false assumption regarding double-freeing. The removal logic in
'brcmf_ops_sdio_remove' is conditional on function number. Little is done
for any function that is not `func->num == 1`. The proposed patch V4 fine-
tunes this behavior slightly. In this fine-tuning it mostly (completely)
negates order differences.

> The code currently relies on the order in
> which remove callback is done. To make it more robust we could throw in
> a refcount for sdiodev and only do the full clean-up when refcount hits
> zero.

Am I missing something else, maybe? If not, I think I have your concerns
covered.

Note that I have (and am) testing this patch on my own system. So far
without issue. I have run a script that resets the mmc device every 90
seconds to see if repeated use would cause issues. So far without issues.

Regards,
Danny
Arend Van Spriel July 4, 2022, 6:52 p.m. UTC | #3
On 7/4/2022 4:49 PM, Danny van Heumen wrote:
> Hi Arend,
> 
> ------- Original Message -------
> On Monday, July 4th, 2022 at 11:43, Arend Van Spriel <aspriel@gmail.com> wrote:
> 
> 
>> [..]
>>
>> It is good practice to throw in a changelog here so people know what has
>> changed since earlier version of the patch.
> 
> That's fair enough. The commit message is updated.
> Changes compared to v3:
> 
> - brcmf_sdiod_remove(..) disables functions in reverse order. It also claims
>    'func2' when disabling 'func2'. However, operations on 'func2' are always
>    performed after claiming 'func1'. So this corrects mistake that deviates
>    from convention.
> - furthermore, following feedback from the kernel, irq is released for each
>    individual function, but only func1 performs removal operations. This
>    prevents the ordering issue from occurring.
> 
>> ---
>>
>>> .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 31 ++++++++++++-------
>>> .../broadcom/brcm80211/brcmfmac/sdio.c | 10 +-----
>>> 2 files changed, 21 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>> index ac02244a6fdf..dd634edaa0b3 100644
>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>
>>
>> [...]
>>
>>> @@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
>>> if (bus_if) {
>>> sdiodev = bus_if->bus_priv.sdio;
>>>
>>> + if (func->num != 1) {
>>> + /* Satisfy kernel expectation that the irq is released once the
>>> + * '.remove' callback has executed, while respecting the design
>>> + * that removal is executed for 'sdiodev', instead of individual
>>> + * function.
>>> + /
>>> + brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
>>> + sdio_claim_host(sdiodev->func1);
>>> + sdio_release_irq(func);
>>> + sdio_release_host(sdiodev->func1);
>>> + return;

Actually this is wrong. Before the function 
brcmf_sdiod_intr_unregister() was called for every sdio function 
instance. That function does exactly the same as the above and more. On 
some platforms the device does not used the SDIO interrupt, but instead 
it uses what we call an OOB interrupt (out-of-band). So your change does 
not add anything for devices/platforms employing the SDIO interrupt, but 
it does break those using OOB interrupt.

>>> + }
>>> +
>>> + / func 1: so do full clean-up and removal */
>>> +
>>
>>
>> The problem is that upon driver unload we get remove for function 2 and
>> then for function 1. Upon mmc_hw_reset() we get a remove for function 1
>> and then for function 2. So in the scenario of mmc_hw_reset() we free
>> sdiodev upon func 1 removal and then for func 2 removal we have a
>> use-after-free of sdiodev.
> 
> I understood this. I recognize the different orders. However, there is a
> false assumption regarding double-freeing. The removal logic in
> 'brcmf_ops_sdio_remove' is conditional on function number. Little is done
> for any function that is not `func->num == 1`. The proposed patch V4 fine-
> tunes this behavior slightly. In this fine-tuning it mostly (completely)
> negates order differences.
> 
>> The code currently relies on the order in
>> which remove callback is done. To make it more robust we could throw in
>> a refcount for sdiodev and only do the full clean-up when refcount hits
>> zero.
> 
> Am I missing something else, maybe? If not, I think I have your concerns
> covered.

I think you are. The function brcmf_sdiod_remove() does end-up freeing 
the sdiodev instance. brcmf_sdiod_remove() for func 1, but in 
brcmf_ops_sdio_remove() we do dereference sdiodev instance for all 
functions.

In the portion above you have:

sdio_claim_host(sdiodev->func1);

When brcmf_ops_sdio_remove is called for func 2 (and even func 3) after 
func 1 has been removed sdiodev points to memory already free.

Regards,
Arend
Danny van Heumen July 4, 2022, 8:43 p.m. UTC | #4
Hi,

I'll respond to your comments one-by-one. I will include some of my
reasoning to help clarify.

------- Original Message -------
On Monday, July 4th, 2022 at 20:52, Arend Van Spriel <aspriel@gmail.com> wrote:

> On 7/4/2022 4:49 PM, Danny van Heumen wrote:
>
> > Hi Arend,
> >
> > ------- Original Message -------
> > On Monday, July 4th, 2022 at 11:43, Arend Van Spriel aspriel@gmail.com wrote:
> >
> > > [..]
> > >
> > > It is good practice to throw in a changelog here so people know what has
> > > changed since earlier version of the patch.
> >
> > That's fair enough. The commit message is updated.
> > Changes compared to v3:
> >
> > - brcmf_sdiod_remove(..) disables functions in reverse order. It also claims
> > 'func2' when disabling 'func2'. However, operations on 'func2' are always
> > performed after claiming 'func1'. So this corrects mistake that deviates
> > from convention.
> > - furthermore, following feedback from the kernel, irq is released for each
> > individual function, but only func1 performs removal operations. This
> > prevents the ordering issue from occurring.
> >
> > > ---
> > >
> > > > .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 31 ++++++++++++-------
> > > > .../broadcom/brcm80211/brcmfmac/sdio.c | 10 +-----
> > > > 2 files changed, 21 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> > > > index ac02244a6fdf..dd634edaa0b3 100644
> > > > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> > > > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> > >
> > > [...]
> > >
> > > > @@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
> > > > if (bus_if) {
> > > > sdiodev = bus_if->bus_priv.sdio;
> > > >
> > > > + if (func->num != 1) {
> > > > + /* Satisfy kernel expectation that the irq is released once the
> > > > + * '.remove' callback has executed, while respecting the design
> > > > + * that removal is executed for 'sdiodev', instead of individual
> > > > + * function.
> > > > + /
> > > > + brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
> > > > + sdio_claim_host(sdiodev->func1);
> > > > + sdio_release_irq(func);
> > > > + sdio_release_host(sdiodev->func1);
> > > > + return;
>
>
> Actually this is wrong. Before the function
> brcmf_sdiod_intr_unregister() was called for every sdio function
> instance.


We are in agreement here.


> That function does exactly the same as the above and more. On
> some platforms the device does not used the SDIO interrupt, but instead
> it uses what we call an OOB interrupt (out-of-band).

The SDIODEV struct is hard-coded for 2 functions. The function
`brcmf_sdiod_intr_unregister` unregisters for the whole devices, i.e.
two functions simultaneously. The OOB interrupt handling is
function-independent. It takes a sdiodev pointer to work with. In
addition, the code facilitates the request of a single OOB interrupt,
i.e. a single one for the device as a whole.

Regarding the sdio-irq-claim/release, these are function-bound.
However, as mentioned before, `brcmf_sdiod_intr_unregister` handles
both functions at once. This code does not handle `func` for the function
currently being iterated on. Only a whole device.

From how I read the code, this logic is scoped to SDIO-based devices.
Plz correct if this is interpretation is wrong.


> So your change does
> not add anything for devices/platforms employing the SDIO interrupt, but
> it does break those using OOB interrupt.

It adds for SDIO-based interrupt handling that the interrupt gets released
for the function that is being iterated on for removal. Therefore, it
satisfies the expectations of the SDIO subsystem which otherwise emits a
warning about an irq not having been freed.

AFAICT OOB handling does not change: it still executes once. After that,
the flag `oob_irq_requested` is false. The benefit we create, is that
`brcmf_sdiod_intr_unregister` now only executes for func1, instead of
either func1 or func2 depending on iteration order.

>
> > > > + }
> > > > +
> > > > + / func 1: so do full clean-up and removal */
> > > > +
> > >
> > > The problem is that upon driver unload we get remove for function 2 and
> > > then for function 1. Upon mmc_hw_reset() we get a remove for function 1
> > > and then for function 2. So in the scenario of mmc_hw_reset() we free
> > > sdiodev upon func 1 removal and then for func 2 removal we have a
> > > use-after-free of sdiodev.
> >
> > I understood this. I recognize the different orders. However, there is a
> > false assumption regarding double-freeing. The removal logic in
> > 'brcmf_ops_sdio_remove' is conditional on function number. Little is done
> > for any function that is not `func->num == 1`. The proposed patch V4 fine-
> > tunes this behavior slightly. In this fine-tuning it mostly (completely)
> > negates order differences.
> >
> > > The code currently relies on the order in
> > > which remove callback is done. To make it more robust we could throw in
> > > a refcount for sdiodev and only do the full clean-up when refcount hits
> > > zero.
> >
> > Am I missing something else, maybe? If not, I think I have your concerns
> > covered.
>
>
> I think you are. The function brcmf_sdiod_remove() does end-up freeing
> the sdiodev instance. brcmf_sdiod_remove() for func 1, but in
> brcmf_ops_sdio_remove() we do dereference sdiodev instance for all
> functions.

That is correct, but in `brcmf_ops_sdio_remove`:

```
dev_set_drvdata(&sdiodev->func1->dev, NULL);
dev_set_drvdata(&sdiodev->func2->dev, NULL);
```

set those pointers to NULL. So, in the use case where we process functions
in increasing order, `func1` will result in full device clean-up. `func2`
will result in `bus_if` (func->dev) being NULL, therefore exits immediately.

So, I see two use cases:

1.) we iterate in decreasing order: release irq for func2 first, then do
    full clean-up in func1.
2.) we iterate in increasing order: do full clean-up in func1, then skip
    clean-up for func2 due to NULL bus_if.


> In the portion above you have:
>
> sdio_claim_host(sdiodev->func1);
>
> When brcmf_ops_sdio_remove is called for func 2 (and even func 3) [..]

`func3` does not exist from what I read in the code. Can you point me to
the logic that I have missed?


> [..] after func 1 has been removed sdiodev points to memory already free.

Not sure if we are considering the same use case, but if called from
`brcmf_sdiod_remove` within `brcmf_ops_sdio_remove`: this is called before
NULLing and '..._disable' does not clear. And then, is only called once.

Regards,
Danny
Arend van Spriel July 5, 2022, 8:50 a.m. UTC | #5
On 7/4/2022 10:43 PM, Danny van Heumen wrote:
> Hi,
> 
> I'll respond to your comments one-by-one. I will include some of my
> reasoning to help clarify.
> 
> ------- Original Message -------
> On Monday, July 4th, 2022 at 20:52, Arend Van Spriel <aspriel@gmail.com> wrote:
> 
>> On 7/4/2022 4:49 PM, Danny van Heumen wrote:
>>
>>> Hi Arend,
>>>
>>> ------- Original Message -------
>>> On Monday, July 4th, 2022 at 11:43, Arend Van Spriel aspriel@gmail.com wrote:
>>>
>>>> [..]
>>>>
>>>> It is good practice to throw in a changelog here so people know what has
>>>> changed since earlier version of the patch.
>>>
>>> That's fair enough. The commit message is updated.
>>> Changes compared to v3:
>>>
>>> - brcmf_sdiod_remove(..) disables functions in reverse order. It also claims
>>> 'func2' when disabling 'func2'. However, operations on 'func2' are always
>>> performed after claiming 'func1'. So this corrects mistake that deviates
>>> from convention.
>>> - furthermore, following feedback from the kernel, irq is released for each
>>> individual function, but only func1 performs removal operations. This
>>> prevents the ordering issue from occurring.
>>>
>>>> ---
>>>>
>>>>> .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 31 ++++++++++++-------
>>>>> .../broadcom/brcm80211/brcmfmac/sdio.c | 10 +-----
>>>>> 2 files changed, 21 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>>>> index ac02244a6fdf..dd634edaa0b3 100644
>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>>>>
>>>> [...]
>>>>
>>>>> @@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
>>>>> if (bus_if) {
>>>>> sdiodev = bus_if->bus_priv.sdio;
>>>>>
>>>>> + if (func->num != 1) {
>>>>> + /* Satisfy kernel expectation that the irq is released once the
>>>>> + * '.remove' callback has executed, while respecting the design
>>>>> + * that removal is executed for 'sdiodev', instead of individual
>>>>> + * function.
>>>>> + /
>>>>> + brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
>>>>> + sdio_claim_host(sdiodev->func1);
>>>>> + sdio_release_irq(func);
>>>>> + sdio_release_host(sdiodev->func1);
>>>>> + return;
>>
>>
>> Actually this is wrong. Before the function
>> brcmf_sdiod_intr_unregister() was called for every sdio function
>> instance.
> 
> 
> We are in agreement here.
> 
> 
>> That function does exactly the same as the above and more. On
>> some platforms the device does not used the SDIO interrupt, but instead
>> it uses what we call an OOB interrupt (out-of-band).
> 
> The SDIODEV struct is hard-coded for 2 functions. The function
> `brcmf_sdiod_intr_unregister` unregisters for the whole devices, i.e.
> two functions simultaneously. The OOB interrupt handling is
> function-independent. It takes a sdiodev pointer to work with. In
> addition, the code facilitates the request of a single OOB interrupt,
> i.e. a single one for the device as a whole.

Okay. Forgot the internals of brcmf_sdiod_intr_unregister(). My elephant 
brain is failing ;-)

> Regarding the sdio-irq-claim/release, these are function-bound.
> However, as mentioned before, `brcmf_sdiod_intr_unregister` handles
> both functions at once. This code does not handle `func` for the function
> currently being iterated on. Only a whole device.
> 
>  From how I read the code, this logic is scoped to SDIO-based devices.
> Plz correct if this is interpretation is wrong.

I stand corrected in this.

>> So your change does
>> not add anything for devices/platforms employing the SDIO interrupt, but
>> it does break those using OOB interrupt.
> 
> It adds for SDIO-based interrupt handling that the interrupt gets released
> for the function that is being iterated on for removal. Therefore, it
> satisfies the expectations of the SDIO subsystem which otherwise emits a
> warning about an irq not having been freed.
> 
> AFAICT OOB handling does not change: it still executes once. After that,
> the flag `oob_irq_requested` is false. The benefit we create, is that
> `brcmf_sdiod_intr_unregister` now only executes for func1, instead of
> either func1 or func2 depending on iteration order.

But does that matter. brcmf_sdiod_intr_unregister() will do any real 
stuff only once, right? Regardless which func comes first it will 
release both func1 and func2 irq. Does it matter? I don't see enough 
benefit to add code for it.

>>>>> + }
>>>>> +
>>>>> + / func 1: so do full clean-up and removal */
>>>>> +
>>>>
>>>> The problem is that upon driver unload we get remove for function 2 and
>>>> then for function 1. Upon mmc_hw_reset() we get a remove for function 1
>>>> and then for function 2. So in the scenario of mmc_hw_reset() we free
>>>> sdiodev upon func 1 removal and then for func 2 removal we have a
>>>> use-after-free of sdiodev.
>>>
>>> I understood this. I recognize the different orders. However, there is a
>>> false assumption regarding double-freeing. The removal logic in
>>> 'brcmf_ops_sdio_remove' is conditional on function number. Little is done
>>> for any function that is not `func->num == 1`. The proposed patch V4 fine-
>>> tunes this behavior slightly. In this fine-tuning it mostly (completely)
>>> negates order differences.
>>>
>>>> The code currently relies on the order in
>>>> which remove callback is done. To make it more robust we could throw in
>>>> a refcount for sdiodev and only do the full clean-up when refcount hits
>>>> zero.
>>>
>>> Am I missing something else, maybe? If not, I think I have your concerns
>>> covered.
>>
>>
>> I think you are. The function brcmf_sdiod_remove() does end-up freeing
>> the sdiodev instance. brcmf_sdiod_remove() for func 1, but in
>> brcmf_ops_sdio_remove() we do dereference sdiodev instance for all
>> functions.
> 
> That is correct, but in `brcmf_ops_sdio_remove`:
> 
> ```
> dev_set_drvdata(&sdiodev->func1->dev, NULL);
> dev_set_drvdata(&sdiodev->func2->dev, NULL);
> ```
> 
> set those pointers to NULL. So, in the use case where we process functions
> in increasing order, `func1` will result in full device clean-up. `func2`
> will result in `bus_if` (func->dev) being NULL, therefore exits immediately.

That's a relieve.

> So, I see two use cases:
> 
> 1.) we iterate in decreasing order: release irq for func2 first, then do
>      full clean-up in func1.
> 2.) we iterate in increasing order: do full clean-up in func1, then skip
>      clean-up for func2 due to NULL bus_if.
> 
> 
>> In the portion above you have:
>>
>> sdio_claim_host(sdiodev->func1);
>>
>> When brcmf_ops_sdio_remove is called for func 2 (and even func 3) [..]
> 
> `func3` does not exist from what I read in the code. Can you point me to
> the logic that I have missed?

We don't use/claim func3, but some devices report a func3 and the 
mmc/sdio subsystem will probe the driver with a func3. However, as we 
never claim it we will not be called for func3 so please ignore my 
blabbering about that.

Regards,
Arend
Danny van Heumen July 5, 2022, 1:12 p.m. UTC | #6
Hi,

------- Original Message -------
On Tuesday, July 5th, 2022 at 10:50, Arend van Spriel <arend.vanspriel@broadcom.com> wrote:

> [..]
> >
> > The SDIODEV struct is hard-coded for 2 functions. The function
> > `brcmf_sdiod_intr_unregister` unregisters for the whole devices, i.e.
> > two functions simultaneously. The OOB interrupt handling is
> > function-independent. It takes a sdiodev pointer to work with. In
> > addition, the code facilitates the request of a single OOB interrupt,
> > i.e. a single one for the device as a whole.
>
> Okay. Forgot the internals of brcmf_sdiod_intr_unregister(). My elephant
> brain is failing ;-)

Okay.


> > Regarding the sdio-irq-claim/release, these are function-bound.
> > However, as mentioned before, `brcmf_sdiod_intr_unregister` handles
> > both functions at once. This code does not handle `func` for the function
> > currently being iterated on. Only a whole device.
> >
> > From how I read the code, this logic is scoped to SDIO-based devices.
> > Plz correct if this is interpretation is wrong.
>
>
> I stand corrected in this.

Okay.


> > > So your change does
> > > not add anything for devices/platforms employing the SDIO interrupt, but
> > > it does break those using OOB interrupt.
> >
> > It adds for SDIO-based interrupt handling that the interrupt gets released
> > for the function that is being iterated on for removal. Therefore, it
> > satisfies the expectations of the SDIO subsystem which otherwise emits a
> > warning about an irq not having been freed.
> >
> > AFAICT OOB handling does not change: it still executes once. After that,
> > the flag `oob_irq_requested` is false. The benefit we create, is that
> > `brcmf_sdiod_intr_unregister` now only executes for func1, instead of
> > either func1 or func2 depending on iteration order.
>
> But does that matter. brcmf_sdiod_intr_unregister() will do any real
> stuff only once, right? Regardless which func comes first it will
> release both func1 and func2 irq. Does it matter? I don't see enough
> benefit to add code for it.

Does it? You can probably judge this better than I can.

What I read, is that the interrupt release logic is processed twice. And
you have recently put a lot of emphasis on potential ordering problems.
So, this is my proposal to somewhat stream-line the logic to per-func
behavior, instead of doing everything at func1 and func2 is or isn't
processed at all. I do not have a strong opinion.

> > [..]
> >
> > That is correct, but in `brcmf_ops_sdio_remove`:
> >
> > `dev_set_drvdata(&sdiodev->func1->dev, NULL); dev_set_drvdata(&sdiodev->func2->dev, NULL);`
> >
> > set those pointers to NULL. So, in the use case where we process functions
> > in increasing order, `func1` will result in full device clean-up. `func2`
> > will result in `bus_if` (func->dev) being NULL, therefore exits immediately.
>
> That's a relieve.
>
> > So, I see two use cases:
> >
> > 1.) we iterate in decreasing order: release irq for func2 first, then do
> > full clean-up in func1.
> > 2.) we iterate in increasing order: do full clean-up in func1, then skip
> > clean-up for func2 due to NULL bus_if.
> >
> > > In the portion above you have:
> > >
> > > sdio_claim_host(sdiodev->func1);
> > >
> > > When brcmf_ops_sdio_remove is called for func 2 (and even func 3) [..]
> >
> > `func3` does not exist from what I read in the code. Can you point me to
> > the logic that I have missed?
>
> We don't use/claim func3, but some devices report a func3 and the
> mmc/sdio subsystem will probe the driver with a func3. However, as we
> never claim it we will not be called for func3 so please ignore my
> blabbering about that.

Okay, so what I understand from the response right now, is that we mostly
agree on the implementation except maybe for the interrupt clean up.

For the interrupt clean-up, I guess it may be determined by how strongly you
want to tie the clean-up logic to func1 vs. the first func to be iterated on.

Regards,
Danny
Arend Van Spriel July 7, 2022, 8:25 p.m. UTC | #7
On 7/5/2022 3:12 PM, Danny van Heumen wrote:
> Hi,
> 
> ------- Original Message -------
> On Tuesday, July 5th, 2022 at 10:50, Arend van Spriel <arend.vanspriel@broadcom.com> wrote:
> 
>> [..]
>>>
>>> The SDIODEV struct is hard-coded for 2 functions. The function
>>> `brcmf_sdiod_intr_unregister` unregisters for the whole devices, i.e.
>>> two functions simultaneously. The OOB interrupt handling is
>>> function-independent. It takes a sdiodev pointer to work with. In
>>> addition, the code facilitates the request of a single OOB interrupt,
>>> i.e. a single one for the device as a whole.
>>
>> Okay. Forgot the internals of brcmf_sdiod_intr_unregister(). My elephant
>> brain is failing ;-)
> 
> Okay.
> 
> 
>>> Regarding the sdio-irq-claim/release, these are function-bound.
>>> However, as mentioned before, `brcmf_sdiod_intr_unregister` handles
>>> both functions at once. This code does not handle `func` for the function
>>> currently being iterated on. Only a whole device.
>>>
>>>  From how I read the code, this logic is scoped to SDIO-based devices.
>>> Plz correct if this is interpretation is wrong.
>>
>>
>> I stand corrected in this.
> 
> Okay.
> 
> 
>>>> So your change does
>>>> not add anything for devices/platforms employing the SDIO interrupt, but
>>>> it does break those using OOB interrupt.
>>>
>>> It adds for SDIO-based interrupt handling that the interrupt gets released
>>> for the function that is being iterated on for removal. Therefore, it
>>> satisfies the expectations of the SDIO subsystem which otherwise emits a
>>> warning about an irq not having been freed.
>>>
>>> AFAICT OOB handling does not change: it still executes once. After that,
>>> the flag `oob_irq_requested` is false. The benefit we create, is that
>>> `brcmf_sdiod_intr_unregister` now only executes for func1, instead of
>>> either func1 or func2 depending on iteration order.
>>
>> But does that matter. brcmf_sdiod_intr_unregister() will do any real
>> stuff only once, right? Regardless which func comes first it will
>> release both func1 and func2 irq. Does it matter? I don't see enough
>> benefit to add code for it.
> 
> Does it? You can probably judge this better than I can.
> 
> What I read, is that the interrupt release logic is processed twice. And
> you have recently put a lot of emphasis on potential ordering problems.
> So, this is my proposal to somewhat stream-line the logic to per-func
> behavior, instead of doing everything at func1 and func2 is or isn't
> processed at all. I do not have a strong opinion.
> 
>>> [..]
>>>
>>> That is correct, but in `brcmf_ops_sdio_remove`:
>>>
>>> `dev_set_drvdata(&sdiodev->func1->dev, NULL); dev_set_drvdata(&sdiodev->func2->dev, NULL);`
>>>
>>> set those pointers to NULL. So, in the use case where we process functions
>>> in increasing order, `func1` will result in full device clean-up. `func2`
>>> will result in `bus_if` (func->dev) being NULL, therefore exits immediately.
>>
>> That's a relieve.
>>
>>> So, I see two use cases:
>>>
>>> 1.) we iterate in decreasing order: release irq for func2 first, then do
>>> full clean-up in func1.
>>> 2.) we iterate in increasing order: do full clean-up in func1, then skip
>>> clean-up for func2 due to NULL bus_if.
>>>
>>>> In the portion above you have:
>>>>
>>>> sdio_claim_host(sdiodev->func1);
>>>>
>>>> When brcmf_ops_sdio_remove is called for func 2 (and even func 3) [..]
>>>
>>> `func3` does not exist from what I read in the code. Can you point me to
>>> the logic that I have missed?
>>
>> We don't use/claim func3, but some devices report a func3 and the
>> mmc/sdio subsystem will probe the driver with a func3. However, as we
>> never claim it we will not be called for func3 so please ignore my
>> blabbering about that.
> 
> Okay, so what I understand from the response right now, is that we mostly
> agree on the implementation except maybe for the interrupt clean up.
> 
> For the interrupt clean-up, I guess it may be determined by how strongly you
> want to tie the clean-up logic to func1 vs. the first func to be iterated on.

Right. I prefer dropping the interrupt clean-up and rest of the patch is 
fine by me.

Thanks,
Arend
Danny van Heumen July 11, 2022, 11:24 p.m. UTC | #8
Hi,

------- Original Message -------
On Thursday, July 7th, 2022 at 22:25, Arend Van Spriel <aspriel@gmail.com> wrote:

> [..]
>
> Right. I prefer dropping the interrupt clean-up and rest of the patch is
> fine by me.

Please find patch v5 submitted. Subject: [PATCH v5] brcmfmac: prevent double-free on hardware-reset

I have reverted the split, in-order freeing of func irqs, as discussed. No further changes.

During my time running with the various patches, I have not had any issues. As mentioned before, repeated resets did not result in any issues. (I didn't count exactly, maybe 30+ resets in a scripted run.)

IIUC this should be ready for integration.

Regards,
Danny
diff mbox series

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index ac02244a6fdf..dd634edaa0b3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -802,6 +802,7 @@  static void brcmf_sdiod_freezer_detach(struct brcmf_sdio_dev *sdiodev)
 	if (sdiodev->freezer) {
 		WARN_ON(atomic_read(&sdiodev->freezer->freezing));
 		kfree(sdiodev->freezer);
+		sdiodev->freezer = NULL;
 	}
 }

@@ -875,13 +876,9 @@  int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev)

 	brcmf_sdiod_freezer_detach(sdiodev);

-	/* Disable Function 2 */
-	sdio_claim_host(sdiodev->func2);
-	sdio_disable_func(sdiodev->func2);
-	sdio_release_host(sdiodev->func2);
-
-	/* Disable Function 1 */
+	/* Disable functions 2 then 1. */
 	sdio_claim_host(sdiodev->func1);
+	sdio_disable_func(sdiodev->func2);
 	sdio_disable_func(sdiodev->func1);
 	sdio_release_host(sdiodev->func1);

@@ -911,7 +908,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F1 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	}
 	switch (sdiodev->func2->device) {
 	case SDIO_DEVICE_ID_BROADCOM_CYPRESS_4373:
@@ -933,7 +930,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F2 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	} else {
 		brcmf_dbg(SDIO, "set F2 blocksize to %d\n", f2_blksz);
 	}
@@ -1096,12 +1093,24 @@  static void brcmf_ops_sdio_remove(struct sdio_func *func)
 	if (bus_if) {
 		sdiodev = bus_if->bus_priv.sdio;

+		if (func->num != 1) {
+			/* Satisfy kernel expectation that the irq is released once the
+			 * '.remove' callback has executed, while respecting the design
+			 * that removal is executed for 'sdiodev', instead of individual
+			 * function.
+			 */
+			brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
+			sdio_claim_host(sdiodev->func1);
+			sdio_release_irq(func);
+			sdio_release_host(sdiodev->func1);
+			return;
+		}
+
+		/* func 1: so do full clean-up and removal */
+
 		/* start by unregistering irqs */
 		brcmf_sdiod_intr_unregister(sdiodev);

-		if (func->num != 1)
-			return;
-
 		/* only proceed with rest of cleanup if func 1 */
 		brcmf_sdiod_remove(sdiodev);

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 212fbbe1cd7e..2ed70f809097 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4152,7 +4152,6 @@  int brcmf_sdio_get_fwname(struct device *dev, const char *ext, u8 *fw_name)

 static int brcmf_sdio_bus_reset(struct device *dev)
 {
-	int ret = 0;
 	struct brcmf_bus *bus_if = dev_get_drvdata(dev);
 	struct brcmf_sdio_dev *sdiodev = bus_if->bus_priv.sdio;

@@ -4169,14 +4168,7 @@  static int brcmf_sdio_bus_reset(struct device *dev)
 	sdio_release_host(sdiodev->func1);

 	brcmf_bus_change_state(sdiodev->bus_if, BRCMF_BUS_DOWN);
-
-	ret = brcmf_sdiod_probe(sdiodev);
-	if (ret) {
-		brcmf_err("Failed to probe after sdio device reset: ret %d\n",
-			  ret);
-	}
-
-	return ret;
+	return 0;
 }

 static const struct brcmf_bus_ops brcmf_sdio_bus_ops = {