wifi: mt76: fix oops on non-dbdc mt7986

Message ID 20240713130010.516037-1-bjorn@mork.no (mailing list archive)
State Accepted
Delegated to: Felix Fietkau
Series wifi: mt76: fix oops on non-dbdc mt7986

Commit Message

Bjørn Mork July 13, 2024, 1 p.m. UTC
mt7915_band_config() sets band_idx = 1 on the main phy for mt7986
with MT7975_ONE_ADIE or MT7976_ONE_ADIE.

Commit 0335c034e726 ("wifi: mt76: fix race condition related to
checking tx queue fill status") introduced a dereference of the
phys array indirectly indexed by band_idx via wcid->phy_idx in
mt76_wcid_cleanup(). This caused the following Oops on affected
mt7986 devices:

 Unable to handle kernel read from unreadable memory at virtual address 0000000000000024
 Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000005
   CM = 0, WnR = 0
 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000042545000
 [0000000000000024] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
 Internal error: Oops: 0000000096000005 [#1] SMP
 Modules linked in: ... mt7915e mt76_connac_lib mt76 mac80211 cfg80211 ...
 CPU: 2 PID: 1631 Comm: hostapd Not tainted 5.15.150 #0
 Hardware name: ZyXEL EX5700 (Telenor) (DT)
 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : mt76_wcid_cleanup+0x84/0x22c [mt76]
 lr : mt76_wcid_cleanup+0x64/0x22c [mt76]
 sp : ffffffc00a803700
 x29: ffffffc00a803700 x28: ffffff80008f7300 x27: ffffff80003f3c00
 x26: ffffff80000a7880 x25: ffffffc008c26e00 x24: 0000000000000001
 x23: ffffffc000a68114 x22: 0000000000000000 x21: ffffff8004172cc8
 x20: ffffffc00a803748 x19: ffffff8004152020 x18: 0000000000000000
 x17: 00000000000017c0 x16: ffffffc008ef5000 x15: 0000000000000be0
 x14: ffffff8004172e28 x13: ffffff8004172e28 x12: 0000000000000000
 x11: 0000000000000000 x10: ffffff8004172e30 x9 : ffffff8004172e28
 x8 : 0000000000000000 x7 : ffffff8004156020 x6 : 0000000000000000
 x5 : 0000000000000031 x4 : 0000000000000000 x3 : 0000000000000001
 x2 : 0000000000000000 x1 : ffffff80008f7300 x0 : 0000000000000024
 Call trace:
  mt76_wcid_cleanup+0x84/0x22c [mt76]
  __mt76_sta_remove+0x70/0xbc [mt76]
  mt76_sta_state+0x8c/0x1a4 [mt76]
  mt7915_eeprom_get_power_delta+0x11e4/0x23a0 [mt7915e]
  drv_sta_state+0x144/0x274 [mac80211]
  sta_info_move_state+0x1cc/0x2a4 [mac80211]
  sta_set_sinfo+0xaf8/0xc24 [mac80211]
  sta_info_destroy_addr_bss+0x4c/0x6c [mac80211]

  ieee80211_color_change_finish+0x1c08/0x1e70 [mac80211]
  cfg80211_check_station_change+0x1360/0x4710 [cfg80211]
  genl_family_rcv_msg_doit+0xb4/0x110
  genl_rcv_msg+0xd0/0x1bc
  netlink_rcv_skb+0x58/0x120
  genl_rcv+0x34/0x50
  netlink_unicast+0x1f0/0x2ec
  netlink_sendmsg+0x198/0x3d0
  ____sys_sendmsg+0x1b0/0x210
  ___sys_sendmsg+0x80/0xf0
  __sys_sendmsg+0x44/0xa0
  __arm64_sys_sendmsg+0x20/0x30
  invoke_syscall.constprop.0+0x4c/0xe0
  do_el0_svc+0x40/0xd0
  el0_svc+0x14/0x4c
  el0t_64_sync_handler+0x100/0x110
  el0t_64_sync+0x15c/0x160
 Code: d2800002 910092c0 52800023 f9800011 (885f7c01)
 ---[ end trace 7e42dd9a39ed2281 ]---

Fix by using mt76_dev_phy() which will map band_idx to the correct phy
for all hardware combinations.

Fixes: 0335c034e726 ("wifi: mt76: fix race condition related to checking tx queue fill status")
Link: https://github.com/openwrt/openwrt/issues/14548
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
 drivers/net/wireless/mediatek/mt76/mac80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kalle Valo July 31, 2024, 9:27 a.m. UTC | #1
Bjørn Mork <bjorn@mork.no> writes:

> mt7915_band_config() sets band_idx = 1 on the main phy for mt7986
> with MT7975_ONE_ADIE or MT7976_ONE_ADIE.
>
> Commit 0335c034e726 ("wifi: mt76: fix race condition related to
> checking tx queue fill status") introduced a dereference of the
> phys array indirectly indexed by band_idx via wcid->phy_idx in
> mt76_wcid_cleanup(). This caused the following Oops on affected
> mt7986 devices:
>
>  [...]
>
> Fix by using mt76_dev_phy() which will map band_idx to the correct phy
> for all hardware combinations.
>
> Fixes: 0335c034e726 ("wifi: mt76: fix race condition related to checking tx queue fill status")
> Link: https://github.com/openwrt/openwrt/issues/14548
> Signed-off-by: Bjørn Mork <bjorn@mork.no>

Should this go to wireless tree?
Bjørn Mork July 31, 2024, 10:26 a.m. UTC | #2
Kalle Valo <kvalo@kernel.org> writes:
> Bjørn Mork <bjorn@mork.no> writes:
>
>> mt7915_band_config() sets band_idx = 1 on the main phy for mt7986
>> with MT7975_ONE_ADIE or MT7976_ONE_ADIE.
>>
>> Commit 0335c034e726 ("wifi: mt76: fix race condition related to
>> checking tx queue fill status") introduced a dereference of the
>> phys array indirectly indexed by band_idx via wcid->phy_idx in
>> mt76_wcid_cleanup(). This caused the following Oops on affected
>> mt7986 devices:
>>
>>  [...]
>>
>> Fix by using mt76_dev_phy() which will map band_idx to the correct phy
>> for all hardware combinations.
>>
>> Fixes: 0335c034e726 ("wifi: mt76: fix race condition related to checking tx queue fill status")
>> Link: https://github.com/openwrt/openwrt/issues/14548
>> Signed-off-by: Bjørn Mork <bjorn@mork.no>
>
> Should this go to wireless tree?

I believe it should. It fixes a regression on the affected hardware,
introduced by commit 0335c034e726.

It should also go into any still maintained v6.7, v6.8, v6.9, v6.10
stable trees.  But I assume they will pick it up automatically based on
the Fixes tag.

Bjørn
Kalle Valo July 31, 2024, 11:27 a.m. UTC | #3
Bjørn Mork <bjorn@mork.no> writes:

> Kalle Valo <kvalo@kernel.org> writes:
>
>> Bjørn Mork <bjorn@mork.no> writes:
>>
>>> mt7915_band_config() sets band_idx = 1 on the main phy for mt7986
>>> with MT7975_ONE_ADIE or MT7976_ONE_ADIE.
>>>
>>> Commit 0335c034e726 ("wifi: mt76: fix race condition related to
>>> checking tx queue fill status") introduced a dereference of the
>>> phys array indirectly indexed by band_idx via wcid->phy_idx in
>>> mt76_wcid_cleanup(). This caused the following Oops on affected
>>> mt7986 devices:
>>>
>>>  [...]
>>>
>>> Fix by using mt76_dev_phy() which will map band_idx to the correct phy
>>> for all hardware combinations.
>>>
>>> Fixes: 0335c034e726 ("wifi: mt76: fix race condition related to checking tx queue fill status")
>>> Link: https://github.com/openwrt/openwrt/issues/14548
>>> Signed-off-by: Bjørn Mork <bjorn@mork.no>
>>
>> Should this go to wireless tree?
>
> I believe it should. It fixes a regression on the affected hardware,
> introduced by commit 0335c034e726.

Ok, so the regression was introduced in v6.7-rc1. But I noticed that
Felix had already applied it to his tree, so this will go to v6.12:

https://github.com/nbd168/wireless/commit/cc9370fc0d7a83dab7159a3a91d363e6903d8eb2

> It should also go into any still maintained v6.7, v6.8, v6.9, v6.10
> stable trees.  But I assume they will pick it up automatically based on
> the Fixes tag.

The Fixes tag is just a hint; it does not guarantee that the stable
maintainers will pick up the commit.

Patch

diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index e8ba2e4e8484..b5dbcf925f92 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -1524,7 +1524,7 @@  EXPORT_SYMBOL_GPL(mt76_wcid_init);
 
 void mt76_wcid_cleanup(struct mt76_dev *dev, struct mt76_wcid *wcid)
 {
-	struct mt76_phy *phy = dev->phys[wcid->phy_idx];
+	struct mt76_phy *phy = mt76_dev_phy(dev, wcid->phy_idx);
 	struct ieee80211_hw *hw;
 	struct sk_buff_head list;
 	struct sk_buff *skb;
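
To illustrate why the one-line change above avoids the NULL dereference, here is a
minimal userspace sketch. It is not the driver code: struct model_dev,
struct model_phy, MODEL_MAX_PHYS and all model_* functions are hypothetical
names made up for this example, and the fallback to the main phy in
model_dev_phy() is an assumption based on the commit message's statement that
mt76_dev_phy() maps band_idx to the correct phy. The real helper lives in the
mt76 headers and should be consulted for the exact conditions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PHYS 3 /* hypothetical stand-in for the driver's band count */

struct model_phy {
	int band_idx;
};

struct model_dev {
	struct model_phy phy;                   /* main phy, always present */
	struct model_phy *phys[MODEL_MAX_PHYS]; /* per-band slots, may be NULL */
};

/* Pre-fix pattern: raw array lookup, NULL if nothing is registered there. */
static struct model_phy *model_lookup_raw(struct model_dev *dev,
					  unsigned int phy_idx)
{
	return dev->phys[phy_idx];
}

/* Post-fix pattern (assumed): use the slot if populated, else the main phy. */
static struct model_phy *model_dev_phy(struct model_dev *dev,
				       unsigned int phy_idx)
{
	if (phy_idx < MODEL_MAX_PHYS && dev->phys[phy_idx])
		return dev->phys[phy_idx];

	return &dev->phy;
}

/* Non-DBDC setup as described: the only phy reports band_idx 1, slot 1 is empty. */
static void model_init_non_dbdc(struct model_dev *dev)
{
	dev->phy.band_idx = 1;
	dev->phys[0] = &dev->phy;
	dev->phys[1] = NULL;
	dev->phys[2] = NULL;
}

/* Self-check of the scenario from the commit message. */
static void model_selfcheck(void)
{
	struct model_dev dev;

	model_init_non_dbdc(&dev);
	assert(model_lookup_raw(&dev, 1) == NULL);  /* what the old code dereferenced */
	assert(model_dev_phy(&dev, 1) == &dev.phy); /* what the fixed code receives */
}
```

In this model a wcid whose phy_idx is 1 yields NULL through the raw lookup,
which matches the register dump (x22 is 0 and x0 equals the faulting address
0x24), while the helper-style lookup returns the main phy.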