Message ID | 20220420031451.6770-1-bo.jiao@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Series | mt76: mt7915: fix msta->wcid use-after-free in mt76_tx_status_check() |

On 20.04.22 05:14, Bo Jiao wrote:
> From: Bo Jiao <Bo.Jiao@mediatek.com>
>
> fix msta->wcid use-after-free in mt76_tx_status_check when the sta
> has been removed.
>
> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com>
> ---
>  drivers/net/wireless/mediatek/mt76/mt7915/main.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/main.c b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> index 800f720..160d80e 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> @@ -701,6 +701,11 @@ void mt7915_mac_sta_remove(struct mt76_dev *mdev, struct ieee80211_vif *vif,
>  	if (!list_empty(&msta->rc_list))
>  		list_del_init(&msta->rc_list);
>  	spin_unlock_bh(&dev->sta_poll_lock);
> +
> +	spin_lock_bh(&mdev->status_lock);
> +	if (!list_empty(&msta->wcid.list))
> +		list_del_init(&msta->wcid.list);
> +	spin_unlock_bh(&mdev->status_lock);

I'm trying to figure out where this use-after-free bug is coming from,
and I can't seem to find the cause of it.

Some context:
mt7915_mac_sta_remove is called by __mt76_sta_remove, which also calls
mt76_packet_id_flush afterwards.
mt76_packet_id_flush calls mt76_tx_status_skb_get in a way that makes it
iterate over all pending tx status packets and clearing them from the idr.
If the idr is empty afterwards, it calls list_del_init(&wcid->list).

The only way I can see your patch making a difference would be if
clearing the idr fails. That could happen if for some unknown reason,
cb->pktid is out of sync with the id that was used to add the packet to
the idr.

Can you please try the patch below and see if it avoids use-after-free
issues and if it also shows the warning I added?

Thanks,

- Felix

---
--- a/drivers/net/wireless/mediatek/mt76/tx.c
+++ b/drivers/net/wireless/mediatek/mt76/tx.c
@@ -181,7 +181,8 @@ mt76_tx_status_skb_get(struct mt76_dev *dev, struct mt76_wcid *wcid, int pktid,
 		/* It has been too long since DMA_DONE, time out this packet
 		 * and stop waiting for TXS callback.
 		 */
-		idr_remove(&wcid->pktid, cb->pktid);
+		WARN(id != cb->pktid, "Packet id %d does not match idr id %d\n", cb->pktid, id);
+		idr_remove(&wcid->pktid, id);
 		__mt76_tx_status_skb_done(dev, skb, MT_TX_CB_TXS_FAILED |
 						    MT_TX_CB_TXS_DONE, list);
 	}
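
The hypothesis in the message above can be modelled in userspace. This is a minimal sketch, not mt76 code: struct pkt, struct station, table_empty() and flush() are hypothetical stand-ins. The slot index plays the role of the idr id, cached_id plays the role of cb->pktid, and on_status_list plays the role of wcid->list being linked into dev->wcid_list. It only illustrates why removing by the cached id could leave an entry behind, which would in turn skip the unlink that is conditional on the table being empty.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 8

/* Hypothetical stand-in for one pending tx status packet. */
struct pkt {
	int cached_id;		/* plays the role of cb->pktid */
};

/* Hypothetical stand-in for a station (wcid) and its packet-id table. */
struct station {
	struct pkt *slot[SLOTS];	/* slot index plays the role of the idr id */
	bool on_status_list;		/* true while "linked" on the status list */
};

static bool table_empty(const struct station *sta)
{
	for (int id = 0; id < SLOTS; id++)
		if (sta->slot[id])
			return false;
	return true;
}

/*
 * Walk all pending entries and remove each one, either by the id it was
 * found under (use_iter_id == true) or by the id cached in the packet
 * (use_iter_id == false). The station is unlinked only if the table ends
 * up empty. Returns how many entries had a cached id that did not match.
 */
static int flush(struct station *sta, bool use_iter_id)
{
	int mismatches = 0;

	for (int id = 0; id < SLOTS; id++) {
		struct pkt *p = sta->slot[id];
		int victim;

		if (!p)
			continue;
		if (p->cached_id != id)
			mismatches++;

		victim = use_iter_id ? id : p->cached_id;
		if (victim >= 0 && victim < SLOTS)
			sta->slot[victim] = NULL;
	}

	if (table_empty(sta))
		sta->on_status_list = false;

	return mismatches;
}
```

With a packet stored under id 3 whose cached id is 5, flush(sta, false) leaves slot 3 populated and the station still marked as linked, while flush(sta, true) empties the table and unlinks it. Whether such a mismatch actually occurs in the driver is exactly what the WARN in the test patch is meant to find out.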

Hi Felix,

We found this crash call trace:

[2022-03-26 10:12:33.755] [48338.807322] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000003
[2022-03-26 10:12:34.104] [48338.816123] Mem abort info:
[2022-03-26 10:12:34.104] [48338.818908] ESR = 0x96000006
[2022-03-26 10:12:34.104] [48338.821983] EC = 0x25: DABT (current EL), IL = 32 bits
[2022-03-26 10:12:34.104] [48338.827298] SET = 0, FnV = 0
[2022-03-26 10:12:34.104] [48338.830036] br-lan: port 6(ra0) entered blocking state
[2022-03-26 10:12:34.104] [48338.830338] EA = 0, S1PTW = 0
[2022-03-26 10:12:34.104] [48338.830341] Data abort info:
[2022-03-26 10:12:34.104] [48338.835489] br-lan: port 6(ra0) entered disabled state
[2022-03-26 10:12:34.104] [48338.838609] ISV = 0, ISS = 0x00000006
[2022-03-26 10:12:34.104] [48338.841709] device ra0 entered promiscuous mode
[2022-03-26 10:12:34.104] [48338.846636] CM = 0, WnR = 0
[2022-03-26 10:12:34.104] [48338.846642] user pgtable: 4k pages, 39-bit VAs, pgdp=000000005a94d000
[2022-03-26 10:12:34.104] [48338.846647] [0000000000000003] pgd=000000005a88b003, pud=000000005a88b003, pmd=0000000000000000
[2022-03-26 10:12:34.104] [48338.850605] br-lan: port 6(ra0) entered blocking state
[2022-03-26 10:12:34.104] [48338.855016] Internal error: Oops: 96000006 [#1] SMP
[2022-03-26 10:12:34.104] [48338.857981] br-lan: port 6(ra0) entered forwarding state
[2022-03-26 10:12:34.104] [48338.864382] Modules linked in: ksmbd pppoe ....
[2022-03-26 10:12:34.124] [48339.002070] CPU: 2 PID: 8122 Comm: kworker/u8:4 Not tainted 5.4.182 #0
[2022-03-26 10:12:34.124] [48339.008575] Hardware name: MediaTek MT7986b RFB (DT)
[2022-03-26 10:12:34.124] [48339.013533] Workqueue: phy1 mt7915_mac_work [mt7915e]
[2022-03-26 10:12:34.124] [48339.018568] pstate: 80000005 (Nzcv daif -PAN -UAO)
[2022-03-26 10:12:34.124] [48339.023344] pc : mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.124] [48339.028464] lr : mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.124] [48339.033581] sp : ffffffc01adf3d10
[2022-03-26 10:12:34.124] [48339.036879] x29: ffffffc01adf3d10 x28: 0000000000000000
[2022-03-26 10:12:34.124] [48339.042171] x27: ffffff801b27b738 x26: ffffffc0108a07e0
[2022-03-26 10:12:34.124] [48339.047463] x25: 0000000000000002 x24: ffffff801b302ba8
[2022-03-26 10:12:34.124] [48339.052756] x23: ffffff801bd8df78 x22: 0000000000000000
[2022-03-26 10:12:34.124] [48339.058048] x21: ffffffc01adf3d58 x20: ffffff801bd8a840
[2022-03-26 10:12:34.124] [48339.063340] x19: fffffffffffffee3 x18: 0000000059479c00
[2022-03-26 10:12:34.124] [48339.068632] x17: 00000000ffffffff x16: 0000000000000000
[2022-03-26 10:12:34.124] [48339.073924] x15: 0000000000000d80 x14: ffffffc010b95000
[2022-03-26 10:12:34.133] [48339.079216] x13: 00000000000006c0 x12: 0000000000000040
[2022-03-26 10:12:34.133] [48339.084508] x11: 0000000000000228 x10: 0000000000000000
[2022-03-26 10:12:34.133] [48339.089800] x9 : 0000000000000000 x8 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.095092] x7 : 0000000000000001 x6 : 0000009259428972
[2022-03-26 10:12:34.133] [48339.100384] x5 : 0000000000000000 x4 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.105676] x3 : ffffff801b34ccf0 x2 : 000000007fffffff
[2022-03-26 10:12:34.133] [48339.110968] x1 : 000000001b34ccf1 x0 : 0000000000000000
[2022-03-26 10:12:34.133] [48339.116261] Call trace:
[2022-03-26 10:12:34.133] [48339.118696] mt76_tx_status_check+0x98/0xd8 [mt76]
[2022-03-26 10:12:34.133] [48339.123470] mt7915_mac_work+0x60/0x90 [mt7915e]
[2022-03-26 10:12:34.133] [48339.128073] process_one_work+0x1fc/0x390
[2022-03-26 10:12:34.133] [48339.132066] worker_thread+0x48/0x4d0
[2022-03-26 10:12:34.133] [48339.135712] kthread+0x120/0x128
[2022-03-26 10:12:34.133] [48339.138926] ret_from_fork+0x10/0x1c

void mt76_tx_status_check(struct mt76_dev *dev, bool flush)
{
	struct mt76_wcid *wcid, *tmp;
	struct sk_buff_head list;

	mt76_tx_status_lock(dev, &list);
	list_for_each_entry_safe(wcid, tmp, &dev->wcid_list, list)
		mt76_tx_status_skb_get(dev, wcid, flush ? -1 : 0, &list);
	mt76_tx_status_unlock(dev, &list);
}

The crash is on:
	list_for_each_entry_safe(wcid, tmp, &dev->wcid_list, list)
We get wcid from dev->wcid_list; x19: fffffffffffffee3 is wcid.

Our test steps:
1. Configure the APUT with 3 BSSIDs on the 2G band / WPA2-PSK AES / NGHT /
   Channel 11 / HT40 / group key rotation interval of 5 mins on card0.
2. Configure the APUT with 3 BSSIDs on the 5G band / WPA3-PSK AES / HE_5G /
   Channel 149 / HE160 / group key rotation interval of 5 mins on card1.
3. Interface down.
4. wifi restart.
5. Repeat step 3 to step 4 about 500 times.
6. After step 5, check that there is no error or crash.
7. After step 5, check the APUT memory usage for memory leaks.

The crash disappeared after applying my patch.

Thanks.

On Wed, 2022-04-20 at 12:40 +0200, Felix Fietkau wrote:
> On 20.04.22 05:14, Bo Jiao wrote:
> > From: Bo Jiao <Bo.Jiao@mediatek.com>
> >
> > fix msta->wcid use-after-free in mt76_tx_status_check when the sta
> > has been removed.
> >
> > Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com>
> > ---
> >  drivers/net/wireless/mediatek/mt76/mt7915/main.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/main.c b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> > index 800f720..160d80e 100644
> > --- a/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> > +++ b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
> > @@ -701,6 +701,11 @@ void mt7915_mac_sta_remove(struct mt76_dev *mdev, struct ieee80211_vif *vif,
> >  	if (!list_empty(&msta->rc_list))
> >  		list_del_init(&msta->rc_list);
> >  	spin_unlock_bh(&dev->sta_poll_lock);
> > +
> > +	spin_lock_bh(&mdev->status_lock);
> > +	if (!list_empty(&msta->wcid.list))
> > +		list_del_init(&msta->wcid.list);
> > +	spin_unlock_bh(&mdev->status_lock);
>
> I'm trying to figure out where this use-after-free bug is coming from,
> and I can't seem to find the cause of it.
>
> Some context:
> mt7915_mac_sta_remove is called by __mt76_sta_remove, which also calls
> mt76_packet_id_flush afterwards.
> mt76_packet_id_flush calls mt76_tx_status_skb_get in a way that makes it
> iterate over all pending tx status packets and clearing them from the idr.
> If the idr is empty afterwards, it calls list_del_init(&wcid->list).
>
> The only way I can see your patch making a difference would be if
> clearing the idr fails. That could happen if for some unknown reason,
> cb->pktid is out of sync with the id that was used to add the packet to
> the idr.
>
> Can you please try the patch below and see if it avoids use-after-free
> issues and if it also shows the warning I added?
>
> Thanks,
>
> - Felix
>
> ---
> --- a/drivers/net/wireless/mediatek/mt76/tx.c
> +++ b/drivers/net/wireless/mediatek/mt76/tx.c
> @@ -181,7 +181,8 @@ mt76_tx_status_skb_get(struct mt76_dev *dev, struct mt76_wcid *wcid, int pktid,
>  		/* It has been too long since DMA_DONE, time out this packet
>  		 * and stop waiting for TXS callback.
>  		 */
> -		idr_remove(&wcid->pktid, cb->pktid);
> +		WARN(id != cb->pktid, "Packet id %d does not match idr id %d\n", cb->pktid, id);
> +		idr_remove(&wcid->pktid, id);
>  		__mt76_tx_status_skb_done(dev, skb, MT_TX_CB_TXS_FAILED |
>  						    MT_TX_CB_TXS_DONE, list);
>  	}
>
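
The oops in the reply above reports a fault at virtual address 0x3, and x19 = fffffffffffffee3 is identified there as wcid. One reading that is consistent with those two numbers (an inference, not something stated in the thread) is that the list link being followed held the value 0x3, and that the list member sits 0x120 bytes into struct mt76_wcid on that build. The sketch below only checks that arithmetic; the 0x120 offset and the helper names container_addr() and member_addr() are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * container_of() amounts to "member address minus member offset".
 * Unsigned 64-bit arithmetic wraps modulo 2^64, so a small bogus link
 * value produces a container address close to the top of the address space.
 */
static uint64_t container_addr(uint64_t member_addr_val, uint64_t member_offset)
{
	return member_addr_val - member_offset;
}

/* The inverse: the address touched when the member is read back. */
static uint64_t member_addr(uint64_t container, uint64_t member_offset)
{
	return container + member_offset;
}
```

Under the assumed offset, container_addr(0x3, 0x120) gives 0xfffffffffffffee3 and member_addr(0xfffffffffffffee3, 0x120) gives 0x3, matching the register value and the faulting address in the trace. That would fit a list walk following a link pointer that no longer points at a live wcid, but the thread itself does not confirm the offset.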

diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/main.c b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
index 800f720..160d80e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/main.c
@@ -701,6 +701,11 @@ void mt7915_mac_sta_remove(struct mt76_dev *mdev, struct ieee80211_vif *vif,
 	if (!list_empty(&msta->rc_list))
 		list_del_init(&msta->rc_list);
 	spin_unlock_bh(&dev->sta_poll_lock);
+
+	spin_lock_bh(&mdev->status_lock);
+	if (!list_empty(&msta->wcid.list))
+		list_del_init(&msta->wcid.list);
+	spin_unlock_bh(&mdev->status_lock);
 }
 
 static void mt7915_tx(struct ieee80211_hw *hw,
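
The unlink pattern added by the patch can be tried outside the kernel. The following is a userspace sketch, not the kernel's list.h: struct list_head is re-declared locally, and init_list_head(), list_is_empty(), list_add_tail_node(), list_del_init_node(), station_unlink() and list_count() are hypothetical names chosen to avoid implying they are the kernel API. Locking (status_lock in the patch) is left out entirely.

```c
#include <stdbool.h>

/* Local re-declaration of a circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A node that points at itself is not linked into any list. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and leave it self-pointing, so it reads as "empty". */
static void list_del_init_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Mirrors the guarded unlink in the patch: only unlink if still linked. */
static void station_unlink(struct list_head *node)
{
	if (!list_is_empty(node))
		list_del_init_node(node);
}

static int list_count(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

After station_unlink() the node is no longer reachable from the list head, so a later walk over the list cannot visit it even if the memory holding the node is freed. Calling station_unlink() a second time on the same node is a no-op, which is why the removal path can run it unconditionally.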