mt76: mt7915: fix msta->wcid use-after-free in mt76_tx_status_check()

Message ID 20220420031451.6770-1-bo.jiao@mediatek.com (mailing list archive)
State New, archived

Commit Message

Bo Jiao April 20, 2022, 3:14 a.m. UTC
From: Bo Jiao <Bo.Jiao@mediatek.com>

Fix a use-after-free of msta->wcid in mt76_tx_status_check() when the sta
has been removed.

Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com>
---
 drivers/net/wireless/mediatek/mt76/mt7915/main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Felix Fietkau April 20, 2022, 10:40 a.m. UTC | #1
On 20.04.22 05:14, Bo Jiao wrote:
> From: Bo Jiao <Bo.Jiao@mediatek.com>
> 
> fix msta->wcid use-after-free in mt76_tx_status_check when the sta
> has been removed.
> 
> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com>
> ---
>   drivers/net/wireless/mediatek/mt76/mt7915/main.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/main.c b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> index 800f720..160d80e 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> @@ -701,6 +701,11 @@ void mt7915_mac_sta_remove(struct mt76_dev *mdev, struct ieee80211_vif *vif,
>   	if (!list_empty(&msta->rc_list))
>   		list_del_init(&msta->rc_list);
>   	spin_unlock_bh(&dev->sta_poll_lock);
> +
> +	spin_lock_bh(&mdev->status_lock);
> +	if (!list_empty(&msta->wcid.list))
> +		list_del_init(&msta->wcid.list);
> +	spin_unlock_bh(&mdev->status_lock);

I'm trying to figure out where this use-after-free bug is coming from,
and I can't seem to find the cause of it.

Some context:
mt7915_mac_sta_remove is called by __mt76_sta_remove, which also calls
mt76_packet_id_flush afterwards.
mt76_packet_id_flush calls mt76_tx_status_skb_get in a way that makes it
iterate over all pending tx status packets and clear them from the
idr.
If the idr is empty afterwards, it calls list_del_init(&wcid->list).
The only way I can see your patch making a difference would be if
clearing the idr fails. That could happen if for some unknown reason,
cb->pktid is out of sync with the id that was used to add the packet to
the idr.
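
To illustrate why the unlink has to happen before the station memory goes away, here is a minimal userspace sketch. It is not mt76 code: the list helpers are a tiny stand-in for the kernel's list_head API, struct fake_wcid, sta_remove() and status_check_count() are hypothetical names, and the status_lock locking is left out entirely.

```c
/* Userspace sketch only; none of these names come from the mt76 driver. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Hypothetical stand-in for a per-station entry linked into a device list. */
struct fake_wcid {
	int idx;
	struct list_head list;
};

/* Walk the list the way a periodic status check would and count entries. */
static int status_check_count(struct list_head *head)
{
	int n = 0;

	for (struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Remove a station: unlink it first (if still linked), then free it.
 * Freeing without the unlink would leave a dangling node on the list. */
static void sta_remove(struct fake_wcid *w)
{
	if (!list_empty(&w->list))
		list_del_init(&w->list);
	free(w);
}

/* Link two entries, remove one, and report how many the walker still sees. */
static int demo_entries_after_remove(void)
{
	struct list_head wcid_list;
	struct fake_wcid *a = malloc(sizeof(*a));
	struct fake_wcid *b = malloc(sizeof(*b));
	int n;

	assert(a && b);
	init_list_head(&wcid_list);
	a->idx = 1;
	init_list_head(&a->list);
	b->idx = 2;
	init_list_head(&b->list);
	list_add_tail(&a->list, &wcid_list);
	list_add_tail(&b->list, &wcid_list);

	sta_remove(a);
	n = status_check_count(&wcid_list);
	sta_remove(b);
	return n;
}
```

In this sketch the walker only ever visits live entries because every free is preceded by an unlink. If the unlink were skipped (for example because clearing the idr failed and the list_del_init was never reached), the walker would follow pointers into freed memory.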

Can you please try the patch below and see if it avoids use-after-free
issues and if it also shows the warning I added?

Thanks,

- Felix


---
--- a/drivers/net/wireless/mediatek/mt76/tx.c
+++ b/drivers/net/wireless/mediatek/mt76/tx.c
@@ -181,7 +181,8 @@ mt76_tx_status_skb_get(struct mt76_dev *dev, struct mt76_wcid *wcid, int pktid,
  		/* It has been too long since DMA_DONE, time out this packet
  		 * and stop waiting for TXS callback.
  		 */
-		idr_remove(&wcid->pktid, cb->pktid);
+		WARN(id != cb->pktid, "Packet id %d does not match idr id %d\n", cb->pktid, id);
+		idr_remove(&wcid->pktid, id);
  		__mt76_tx_status_skb_done(dev, skb, MT_TX_CB_TXS_FAILED |
  						    MT_TX_CB_TXS_DONE, list);
  	}
Bo Jiao April 21, 2022, 2:14 a.m. UTC | #2
Hi Felix,

We found this crash call trace:
[2022-03-26 10:12:33.755] [48338.807322] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000003
[2022-03-26 10:12:34.104] [48338.816123] Mem abort info:
[2022-03-26 10:12:34.104] [48338.818908]   ESR = 0x96000006
[2022-03-26 10:12:34.104] [48338.821983]   EC = 0x25: DABT (current EL), IL = 32 bits
[2022-03-26 10:12:34.104] [48338.827298]   SET = 0, FnV = 0
[2022-03-26 10:12:34.104] [48338.830036] br-lan: port 6(ra0) entered blocking state
[2022-03-26 10:12:34.104] [48338.830338]   EA = 0, S1PTW = 0
[2022-03-26 10:12:34.104] [48338.830341] Data abort info:
[2022-03-26 10:12:34.104] [48338.835489] br-lan: port 6(ra0) entered disabled state
[2022-03-26 10:12:34.104] [48338.838609]   ISV = 0, ISS = 0x00000006
[2022-03-26 10:12:34.104] [48338.841709] device ra0 entered promiscuous mode
[2022-03-26 10:12:34.104] [48338.846636]   CM = 0, WnR = 0
[2022-03-26 10:12:34.104] [48338.846642] user pgtable: 4k pages, 39-bit VAs, pgdp=000000005a94d000
[2022-03-26 10:12:34.104] [48338.846647] [0000000000000003] pgd=000000005a88b003, pud=000000005a88b003, pmd=0000000000000000
[2022-03-26 10:12:34.104] [48338.850605] br-lan: port 6(ra0) entered blocking state
[2022-03-26 10:12:34.104] [48338.855016] Internal error: Oops: 96000006 [#1] SMP
[2022-03-26 10:12:34.104] [48338.857981] br-lan: port 6(ra0) entered forwarding state
[2022-03-26 10:12:34.104] [48338.864382] Modules linked in: ksmbd pppoe ....
[2022-03-26 10:12:34.124] [48339.002070] CPU: 2 PID: 8122 Comm: kworker/u8:4 Not tainted 5.4.182 #0
[2022-03-26 10:12:34.124] [48339.008575] Hardware name: MediaTek MT7986b RFB (DT)
[2022-03-26 10:12:34.124] [48339.013533] Workqueue: phy1 mt7915_mac_work [mt7915e]
[2022-03-26 10:12:34.124] [48339.018568] pstate: 80000005 (Nzcv daif -PAN -UAO)
[2022-03-26 10:12:34.124] [48339.023344] pc : mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.124] [48339.028464] lr : mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.124] [48339.033581] sp : ffffffc01adf3d10
[2022-03-26 10:12:34.124] [48339.036879] x29: ffffffc01adf3d10 x28: 0000000000000000
[2022-03-26 10:12:34.124] [48339.042171] x27: ffffff801b27b738 x26: ffffffc0108a07e0
[2022-03-26 10:12:34.124] [48339.047463] x25: 0000000000000002 x24: ffffff801b302ba8
[2022-03-26 10:12:34.124] [48339.052756] x23: ffffff801bd8df78 x22: 0000000000000000
[2022-03-26 10:12:34.124] [48339.058048] x21: ffffffc01adf3d58 x20: ffffff801bd8a840
[2022-03-26 10:12:34.124] [48339.063340] x19: fffffffffffffee3 x18: 0000000059479c00
[2022-03-26 10:12:34.124] [48339.068632] x17: 00000000ffffffff x16: 0000000000000000
[2022-03-26 10:12:34.124] [48339.073924] x15: 0000000000000d80 x14: ffffffc010b95000
[2022-03-26 10:12:34.133] [48339.079216] x13: 00000000000006c0 x12: 0000000000000040
[2022-03-26 10:12:34.133] [48339.084508] x11: 0000000000000228 x10: 0000000000000000
[2022-03-26 10:12:34.133] [48339.089800] x9 : 0000000000000000 x8 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.095092] x7 : 0000000000000001 x6 : 0000009259428972
[2022-03-26 10:12:34.133] [48339.100384] x5 : 0000000000000000 x4 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.105676] x3 : ffffff801b34ccf0 x2 : 000000007fffffff
[2022-03-26 10:12:34.133] [48339.110968] x1 : 000000001b34ccf1 x0 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.116261] Call trace:
[2022-03-26 10:12:34.133] [48339.118696]  mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.133] [48339.123470]  mt7915_mac_work+0x60/0x90 [mt7915e]
[2022-03-26 10:12:34.133] [48339.128073]  process_one_work+0x1fc/0x390
[2022-03-26 10:12:34.133] [48339.132066]  worker_thread+0x48/0x4d0
[2022-03-26 10:12:34.133] [48339.135712]  kthread+0x120/0x128
[2022-03-26 10:12:34.133] [48339.138926]  ret_from_fork+0x10/0x1c


void
mt76_tx_status_check(struct mt76_dev *dev, bool flush)
{
	struct mt76_wcid *wcid, *tmp;
	struct sk_buff_head list;

	mt76_tx_status_lock(dev, &list);
	list_for_each_entry_safe(wcid, tmp, &dev->wcid_list, list)
		mt76_tx_status_skb_get(dev, wcid, flush ? -1 : 0, &list);
	mt76_tx_status_unlock(dev, &list);
}

The crash happens in list_for_each_entry_safe(wcid, tmp, &dev->wcid_list, list),
where wcid is taken from dev->wcid_list.
x19 (fffffffffffffee3) is wcid.
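
As a rough cross-check of the register dump, the sketch below subtracts x19 from the faulting virtual address. It assumes, which the thread does not confirm, that the faulting load was a read of a field at wcid + offset (for example wcid->list.next while advancing the iterator). The helper name implied_offset() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the oops log above. */
#define FAULT_ADDR 0x0000000000000003ULL
#define X19_WCID   0xfffffffffffffee3ULL

/* Byte offset implied by "fault address = base + offset".
 * ASSUMPTION: the faulting access really was base + offset. */
static uint64_t implied_offset(uint64_t fault_addr, uint64_t base)
{
	return fault_addr - base; /* unsigned wrap-around is well defined */
}

static void print_implied_offset(void)
{
	printf("implied offset: 0x%llx\n",
	       (unsigned long long)implied_offset(FAULT_ADDR, X19_WCID));
}
```

Under that assumption the implied offset is 0x120, meaning wcid would have been computed from a bogus list pointer of 0x3 rather than from a valid entry. Whether 0x120 matches the offset of the list member in struct mt76_wcid on this build would need to be checked against the actual binary.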

Our test steps:
1. Configure the APUT with 3 BSSIDs on the 2G band / WPA2-PSK
AES / NGHT / Channel 11 / HT40 / group key rotation interval of 5 mins
on card0.
2. Configure the APUT with 3 BSSIDs on the 5G band / WPA3-PSK AES
/ HE_5G / Channel 149 / HE160 / group key rotation interval of 5
mins on card1.
3. Bring the interface down.
4. Restart wifi.
5. Repeat steps 3 and 4 about 500 times.
6. After step 5, check that there are no errors or crashes.
7. After step 5, check the APUT memory usage for memory leaks.

The crash disappeared after applying my patch.

Thanks.

Patch

diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/main.c b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
index 800f720..160d80e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
@@ -701,6 +701,11 @@  void mt7915_mac_sta_remove(struct mt76_dev *mdev, struct ieee80211_vif *vif,
 	if (!list_empty(&msta->rc_list))
 		list_del_init(&msta->rc_list);
 	spin_unlock_bh(&dev->sta_poll_lock);
+
+	spin_lock_bh(&mdev->status_lock);
+	if (!list_empty(&msta->wcid.list))
+		list_del_init(&msta->wcid.list);
+	spin_unlock_bh(&mdev->status_lock);
 }
 
 static void mt7915_tx(struct ieee80211_hw *hw,