diff mbox series

[v3] brcmfmac: prevent double-free on hardware-reset.

Message ID ThT2jwvySn9tFQm9FxrlPNMQkiGUnx_87cOhmmeexoVOFZqOkpjmAntCWG-kIBMj2830LHZaOULlJxQKiRXkVtGYrrT8rBaB7R65qjIinYo=@dannyvanheumen.nl (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series [v3] brcmfmac: prevent double-free on hardware-reset. | expand

Commit Message

Danny van Heumen June 17, 2022, 2:24 p.m. UTC
From f1fcceb65d4a44c078cd684ea25a2f2c7f53deb2 Mon Sep 17 00:00:00 2001
From: Danny van Heumen <danny@dannyvanheumen.nl>
Date: Tue, 24 May 2022 18:30:50 +0200
Subject: [PATCH v3] brcmfmac: prevent double-free on hardware-reset.

In case of buggy firmware, brcmfmac may perform a hardware reset. If during
reset and subsequent probing an early failure occurs, a memory region is
accidentally double-freed. With hardened memory allocation enabled, this error
will be detected.

- return early where appropriate to skip unnecessary clean-up.
- set '.freezer' pointer to NULL to prevent double-freeing under possible
  other circumstances and to re-align result under various different
  behaviors of memory allocation freeing.

Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
early, which includes calling 'brcmf_sdiod_remove'. In both cases
'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
has not yet been re-allocated the second time.

Stacktrace of (failing) hardware reset after firmware-crash:

Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
 ret_from_fork+0x10/0x20
 kthread+0x154/0x160
 worker_thread+0x188/0x504
 process_one_work+0x1f4/0x490
 brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
 brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
 brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
 brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
 kfree+0x210/0x220
 __slab_free+0x58/0x40c
Call trace:
x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
sp : ffff80000a22bbf0
lr : kfree+0x210/0x220
pc : __slab_free+0x58/0x40c
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Workqueue: events brcmf_core_bus_reset [brcmfmac]
Hardware name: Pine64 Pinebook Pro (DT)
CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
 nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
Internal error: Oops - BUG: 0 [#1] SMP
kernel BUG at mm/slub.c:379!

Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--
2.34.1

Comments

Arend van Spriel June 22, 2022, 12:36 p.m. UTC | #1
+ Uffe

On 6/17/2022 4:24 PM, Danny van Heumen wrote:
>  From f1fcceb65d4a44c078cd684ea25a2f2c7f53deb2 Mon Sep 17 00:00:00 2001
> From: Danny van Heumen <danny@dannyvanheumen.nl>
> Date: Tue, 24 May 2022 18:30:50 +0200
> Subject: [PATCH v3] brcmfmac: prevent double-free on hardware-reset.
> 
> In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> reset and subsequent probing an early failure occurs, a memory region is
> accidentally double-freed. With hardened memory allocation enabled, this error
> will be detected.
> 
> - return early where appropriate to skip unnecessary clean-up.
> - set '.freezer' pointer to NULL to prevent double-freeing under possible
>    other circumstances and to re-align result under various different
>    behaviors of memory allocation freeing.
> 
> Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> early, which includes calling 'brcmf_sdiod_remove'. In both cases
> 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> has not yet been re-allocated the second time.

For testing you can also trigger the brcmf_sdio_bus_reset() through debugfs:

# cd /sys/kernel/debug/ieee80211/phyX
# echo 1 > reset

So I did that without your patch and observed following:

[  531.481045] brcmfmac: brcmf_sdiod_probe: Failed to set F1 blocksize 

[  531.487314] brcmfmac: brcmf_sdio_bus_reset: Failed to probe after 
sdio device
  reset: ret -123 

[  531.495893] mmc0: card 0001 removed 

[  531.550567] mmc0: new high speed SDIO card at address 0001 

[  531.556561] brcmfmac: F1 signature read @0x18000000=0x16044330

So I looked in brcmf_sdio_bus_reset() and noticed that this function 
actually does a call to mmc_hw_reset() which is what I suggested 
earlier. Looking at this log it seems the actual remove and subsequent 
rescan is a deferred work and it will take care of probing the driver anew.

So I changed debug message level for the sdio ops and then I see:

[ 1327.686828] brcmfmac: brcmf_sdiod_probe: Failed to set F1 blocksize 

[ 1327.693091] brcmfmac: brcmf_sdio_bus_reset: Failed to probe after 
sdio device
  reset: ret -123 

[ 1327.701626] brcmfmac: brcmf_ops_sdio_remove Enter 

[ 1327.706333] brcmfmac: brcmf_ops_sdio_remove Function: 1 

[ 1327.711557] brcmfmac: brcmf_ops_sdio_remove Exit 

[ 1327.716203] brcmfmac: brcmf_ops_sdio_remove Enter 

[ 1327.720911] brcmfmac: brcmf_ops_sdio_remove Function: 2 

[ 1327.726135] brcmfmac: brcmf_ops_sdio_remove Exit 

[ 1327.730768] mmc0: card 0001 removed 

[ 1327.785458] mmc0: new high speed SDIO card at address 0001 

[ 1327.791068] brcmfmac: brcmf_ops_sdio_probe Enter 

[ 1327.795687] brcmfmac: brcmf_ops_sdio_probe Function#: 1 

[ 1327.800988] brcmfmac: brcmf_ops_sdio_probe Enter 

[ 1327.805610] brcmfmac: brcmf_ops_sdio_probe Function#: 2 

[ 1327.811317] brcmfmac: F1 signature read @0x18000000=0x16044330

So to me it seems mmc_hw_reset() is doing everything we need and we can 
make brcmf_sdio_bus_reset() a bit simpler. The only scary part here is 
that brcmf_ops_sdio_remove() is called first for func1 and then for 
func2. Upon rmmod this is different, ie. first func2 and then func1. 
Looking at the implementation in brcmf_ops_sdio_remove() it means we run 
into use-after-free upon mmc_hw_reset().

Regards,
Arend

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 212fbbe1cd7e..21e15f571eea 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4158,24 +4158,15 @@ static int brcmf_sdio_bus_reset(struct device *dev)

  	brcmf_dbg(SDIO, "Enter\n");

-	/* start by unregistering irqs */
  	brcmf_sdiod_intr_unregister(sdiodev);
-
  	brcmf_sdiod_remove(sdiodev);
+	brcmf_bus_change_state(sdiodev->bus_if, BRCMF_BUS_DOWN);

-	/* reset the adapter */
+	/* reset the card */
  	sdio_claim_host(sdiodev->func1);
  	mmc_hw_reset(sdiodev->func1->card);
  	sdio_release_host(sdiodev->func1);

-	brcmf_bus_change_state(sdiodev->bus_if, BRCMF_BUS_DOWN);
-
-	ret = brcmf_sdiod_probe(sdiodev);
-	if (ret) {
-		brcmf_err("Failed to probe after sdio device reset: ret %d\n",
-			  ret);
-	}
-
  	return ret;
  }
Danny van Heumen July 2, 2022, 5:49 p.m. UTC | #2
Hi all,

I missed the response for a while. Following up in-line.

------- Original Message -------
On Wednesday, June 22nd, 2022 at 14:36, Arend van Spriel <arend.vanspriel@broadcom.com> wrote:

> + Uffe
>
> On 6/17/2022 4:24 PM, Danny van Heumen wrote:
>
> > From f1fcceb65d4a44c078cd684ea25a2f2c7f53deb2 Mon Sep 17 00:00:00 2001
> > From: Danny van Heumen danny@dannyvanheumen.nl
> > Date: Tue, 24 May 2022 18:30:50 +0200
> > Subject: [PATCH v3] brcmfmac: prevent double-free on hardware-reset.
> >
> > In case of buggy firmware, brcmfmac may perform a hardware reset. If during
> > reset and subsequent probing an early failure occurs, a memory region is
> > accidentally double-freed. With hardened memory allocation enabled, this error
> > will be detected.
> >
> > - return early where appropriate to skip unnecessary clean-up.
> > - set '.freezer' pointer to NULL to prevent double-freeing under possible
> > other circumstances and to re-align result under various different
> > behaviors of memory allocation freeing.
> >
> > Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
> > 'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
> > the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
> > early, which includes calling 'brcmf_sdiod_remove'. In both cases
> > 'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
> > has not yet been re-allocated the second time.
>
>
> For testing you can also trigger the brcmf_sdio_bus_reset() through debugfs:
>
> # cd /sys/kernel/debug/ieee80211/phyX
> # echo 1 > reset

This works. I have done a few resets in a loop to see if the hardware keeps
recovering and no other issues occur. Everything seems to be fine.


>
>
> So I did that without your patch and observed following:
>
> [ 531.481045] brcmfmac: brcmf_sdiod_probe: Failed to set F1 blocksize
>
> [ 531.487314] brcmfmac: brcmf_sdio_bus_reset: Failed to probe after
> sdio device
> reset: ret -123
>
> [ 531.495893] mmc0: card 0001 removed
>
> [ 531.550567] mmc0: new high speed SDIO card at address 0001
>
> [ 531.556561] brcmfmac: F1 signature read @0x18000000=0x16044330
>
> So I looked in brcmf_sdio_bus_reset() and noticed that this function
> actually does a call to mmc_hw_reset() which is what I suggested
> earlier. Looking at this log it seems the actual remove and subsequent
> rescan is a deferred work and it will take care of probing the driver anew.

I can confirm that this works. This is helpful.


>
> So I changed debug message level for the sdio ops and then I see:
>
> [ 1327.686828] brcmfmac: brcmf_sdiod_probe: Failed to set F1 blocksize
>
> [ 1327.693091] brcmfmac: brcmf_sdio_bus_reset: Failed to probe after
> sdio device
> reset: ret -123
>
> [ 1327.701626] brcmfmac: brcmf_ops_sdio_remove Enter
>
> [ 1327.706333] brcmfmac: brcmf_ops_sdio_remove Function: 1
>
> [ 1327.711557] brcmfmac: brcmf_ops_sdio_remove Exit
>
> [ 1327.716203] brcmfmac: brcmf_ops_sdio_remove Enter
>
> [ 1327.720911] brcmfmac: brcmf_ops_sdio_remove Function: 2
>
> [ 1327.726135] brcmfmac: brcmf_ops_sdio_remove Exit
>
> [ 1327.730768] mmc0: card 0001 removed
>
> [ 1327.785458] mmc0: new high speed SDIO card at address 0001
>
> [ 1327.791068] brcmfmac: brcmf_ops_sdio_probe Enter
>
> [ 1327.795687] brcmfmac: brcmf_ops_sdio_probe Function#: 1
>
> [ 1327.800988] brcmfmac: brcmf_ops_sdio_probe Enter
>
> [ 1327.805610] brcmfmac: brcmf_ops_sdio_probe Function#: 2
>
> [ 1327.811317] brcmfmac: F1 signature read @0x18000000=0x16044330
>
> So to me it seems mmc_hw_reset() is doing everything we need and we can
> make brcmf_sdio_bus_reset() a bit simpler. The only scary part here is
> that brcmf_ops_sdio_remove() is called first for func1 and then for
> func2. Upon rmmod this is different, ie. first func2 and then func1.
> Looking at the implementation in brcmf_ops_sdio_remove() it means we run
> into use-after-free upon mmc_hw_reset().

I have looked into this. The logic itself protects from duplicate behavior
(at least partially) by checking which func is being removed. However,
irq-releases are duplicated because the check only happens after that logic.
This is also necessary, because the kernel warns if the irq isn't released
after executing the callback.

In the next patch (prefixed "PATCH v4", already sent), I have slightly
adjusted the control flow such that irq-release happens for each func
-- as per expectation -- while the removal logic remains rooted at func1,
but executed based on the full device (`sdiodev` as function arg). This
should, IIUC, be correct regardless of function order.

This solution isn't a perfect redesign, but should respect both the design
itself and also sdio driver expectations.

Please find the patch at: g_Py6bM1lfcJOWWmHwKU8x4tCFrTRdgFtoM13qYHeN441F392j_6etJnEJ8gHJMRZ6OEKxpJYuP45x3iziHqY6HNXnVwIiyvJLYjvzxT0Xk=%20()%20dannyvanheumen%20!%20nl

Regards,
Danny
diff mbox series

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index 9c598ea97499..40f40ac4ef73 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -802,6 +802,7 @@  static void brcmf_sdiod_freezer_detach(struct brcmf_sdio_dev *sdiodev)
 	if (sdiodev->freezer) {
 		WARN_ON(atomic_read(&sdiodev->freezer->freezing));
 		kfree(sdiodev->freezer);
+		sdiodev->freezer = NULL;
 	}
 }

@@ -911,7 +912,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F1 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	}
 	switch (sdiodev->func2->device) {
 	case SDIO_DEVICE_ID_BROADCOM_CYPRESS_4373:
@@ -933,7 +934,7 @@  int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev)
 	if (ret) {
 		brcmf_err("Failed to set F2 blocksize\n");
 		sdio_release_host(sdiodev->func1);
-		goto out;
+		return ret;
 	} else {
 		brcmf_dbg(SDIO, "set F2 blocksize to %d\n", f2_blksz);
 	}