diff mbox series

wifi: mac80211: fix NULL dereference at band check in starting tx ba session

Message ID 20240523082200.15544-1-kevin_yang@realtek.com (mailing list archive)
State Under Review
Delegated to: Johannes Berg
Headers show
Series wifi: mac80211: fix NULL dereference at band check in starting tx ba session | expand

Commit Message

Zong-Zhe Yang May 23, 2024, 8:22 a.m. UTC
From: Zong-Zhe Yang <kevin_yang@realtek.com>

In MLD connection, link_data/link_conf are dynamically allocated. They
don't point to vif->bss_conf. So, there will be no chanreq assigned to
vif->bss_conf and then the chan will be NULL. To get deflink's chanreq,
we change the code to access vif->link_conf with deflink's link id.

This way should also work in non-MLD connection, because non-MLD deflink
link id should be 0 and non-MLD link_conf[0] will point to vif->bss_conf.

Crash log (with rtw89 version under MLO development):
[ 9890.526087] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 9890.526102] #PF: supervisor read access in kernel mode
[ 9890.526105] #PF: error_code(0x0000) - not-present page
[ 9890.526109] PGD 0 P4D 0
[ 9890.526114] Oops: 0000 [#1] PREEMPT SMP PTI
[ 9890.526119] CPU: 2 PID: 6367 Comm: kworker/u16:2 Kdump: loaded Tainted: G           OE      6.9.0 #1
[ 9890.526123] Hardware name: LENOVO 2356AD1/2356AD1, BIOS G7ETB3WW (2.73 ) 11/28/2018
[ 9890.526126] Workqueue: phy2 rtw89_core_ba_work [rtw89_core]
[ 9890.526203] RIP: 0010:ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211
[ 9890.526279] Code: f7 e8 d5 93 3e ea 48 83 c4 28 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8b 84 24 e0 f1 ff ff 48 8b 80 90 1b 00 00 <83> 38 03 0f 84 37 fe ff ff bb ea ff ff ff eb cc 49 8b 84 24 10 f3
All code
========
   0:	f7 e8                	imul   %eax
   2:	d5                   	(bad)
   3:	93                   	xchg   %eax,%ebx
   4:	3e ea                	ds (bad)
   6:	48 83 c4 28          	add    $0x28,%rsp
   a:	89 d8                	mov    %ebx,%eax
   c:	5b                   	pop    %rbx
   d:	41 5c                	pop    %r12
   f:	41 5d                	pop    %r13
  11:	41 5e                	pop    %r14
  13:	41 5f                	pop    %r15
  15:	5d                   	pop    %rbp
  16:	c3                   	retq
  17:	cc                   	int3
  18:	cc                   	int3
  19:	cc                   	int3
  1a:	cc                   	int3
  1b:	49 8b 84 24 e0 f1 ff 	mov    -0xe20(%r12),%rax
  22:	ff
  23:	48 8b 80 90 1b 00 00 	mov    0x1b90(%rax),%rax
  2a:*	83 38 03             	cmpl   $0x3,(%rax)		<-- trapping instruction
  2d:	0f 84 37 fe ff ff    	je     0xfffffffffffffe6a
  33:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
  38:	eb cc                	jmp    0x6
  3a:	49                   	rex.WB
  3b:	8b                   	.byte 0x8b
  3c:	84 24 10             	test   %ah,(%rax,%rdx,1)
  3f:	f3                   	repz

Code starting with the faulting instruction
===========================================
   0:	83 38 03             	cmpl   $0x3,(%rax)
   3:	0f 84 37 fe ff ff    	je     0xfffffffffffffe40
   9:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
   e:	eb cc                	jmp    0xffffffffffffffdc
  10:	49                   	rex.WB
  11:	8b                   	.byte 0x8b
  12:	84 24 10             	test   %ah,(%rax,%rdx,1)
  15:	f3                   	repz
[ 9890.526285] RSP: 0018:ffffb8db09013d68 EFLAGS: 00010246
[ 9890.526291] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9308e0d656c8
[ 9890.526295] RDX: 0000000000000000 RSI: ffffffffab99460b RDI: ffffffffab9a7685
[ 9890.526300] RBP: ffffb8db09013db8 R08: 0000000000000000 R09: 0000000000000873
[ 9890.526304] R10: ffff9308e0d64800 R11: 0000000000000002 R12: ffff9308e5ff6e70
[ 9890.526308] R13: ffff930952500e20 R14: ffff9309192a8c00 R15: 0000000000000000
[ 9890.526313] FS:  0000000000000000(0000) GS:ffff930b4e700000(0000) knlGS:0000000000000000
[ 9890.526316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9890.526318] CR2: 0000000000000000 CR3: 0000000391c58005 CR4: 00000000001706f0
[ 9890.526321] Call Trace:
[ 9890.526324]  <TASK>
[ 9890.526327] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 9890.526335] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 9890.526340] ? page_fault_oops (arch/x86/mm/fault.c:713)
[ 9890.526347] ? search_module_extables (kernel/module/main.c:3256 (discriminator 3))
[ 9890.526353] ? ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
---
 net/mac80211/agg-tx.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Kalle Valo May 25, 2024, 10:01 a.m. UTC | #1
<kevin_yang@realtek.com> writes:

> From: Zong-Zhe Yang <kevin_yang@realtek.com>
>
> In MLD connection, link_data/link_conf are dynamically allocated. They
> don't point to vif->bss_conf. So, there will be no chanreq assigned to
> vif->bss_conf and then the chan will be NULL. To get deflink's chanreq,
> we change the code to access vif->link_conf with deflink's link id.
>
> This way should also work in non-MLD connection, because non-MLD deflink
> link id should be 0 and non-MLD link_conf[0] will point to vif->bss_conf.
>
> Crash log (with rtw89 version under MLO development):
> [ 9890.526087] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 9890.526102] #PF: supervisor read access in kernel mode
> [ 9890.526105] #PF: error_code(0x0000) - not-present page
> [ 9890.526109] PGD 0 P4D 0
> [ 9890.526114] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 9890.526119] CPU: 2 PID: 6367 Comm: kworker/u16:2 Kdump: loaded Tainted: G           OE      6.9.0 #1
> [ 9890.526123] Hardware name: LENOVO 2356AD1/2356AD1, BIOS G7ETB3WW (2.73 ) 11/28/2018
> [ 9890.526126] Workqueue: phy2 rtw89_core_ba_work [rtw89_core]
> [ 9890.526203] RIP: 0010:ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211
> [ 9890.526279] Code: f7 e8 d5 93 3e ea 48 83 c4 28 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8b 84 24 e0 f1 ff ff 48 8b 80 90 1b 00 00 <83> 38 03 0f 84 37 fe ff ff bb ea ff ff ff eb cc 49 8b 84 24 10 f3
> All code
> ========
>    0:	f7 e8                	imul   %eax
>    2:	d5                   	(bad)
>    3:	93                   	xchg   %eax,%ebx
>    4:	3e ea                	ds (bad)
>    6:	48 83 c4 28          	add    $0x28,%rsp
>    a:	89 d8                	mov    %ebx,%eax
>    c:	5b                   	pop    %rbx
>    d:	41 5c                	pop    %r12
>    f:	41 5d                	pop    %r13
>   11:	41 5e                	pop    %r14
>   13:	41 5f                	pop    %r15
>   15:	5d                   	pop    %rbp
>   16:	c3                   	retq
>   17:	cc                   	int3
>   18:	cc                   	int3
>   19:	cc                   	int3
>   1a:	cc                   	int3
>   1b:	49 8b 84 24 e0 f1 ff 	mov    -0xe20(%r12),%rax
>   22:	ff
>   23:	48 8b 80 90 1b 00 00 	mov    0x1b90(%rax),%rax
>   2a:*	83 38 03             	cmpl   $0x3,(%rax)		<-- trapping instruction
>   2d:	0f 84 37 fe ff ff    	je     0xfffffffffffffe6a
>   33:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
>   38:	eb cc                	jmp    0x6
>   3a:	49                   	rex.WB
>   3b:	8b                   	.byte 0x8b
>   3c:	84 24 10             	test   %ah,(%rax,%rdx,1)
>   3f:	f3                   	repz
>
> Code starting with the faulting instruction
> ===========================================
>    0:	83 38 03             	cmpl   $0x3,(%rax)
>    3:	0f 84 37 fe ff ff    	je     0xfffffffffffffe40
>    9:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
>    e:	eb cc                	jmp    0xffffffffffffffdc
>   10:	49                   	rex.WB
>   11:	8b                   	.byte 0x8b
>   12:	84 24 10             	test   %ah,(%rax,%rdx,1)
>   15:	f3                   	repz
> [ 9890.526285] RSP: 0018:ffffb8db09013d68 EFLAGS: 00010246
> [ 9890.526291] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9308e0d656c8
> [ 9890.526295] RDX: 0000000000000000 RSI: ffffffffab99460b RDI: ffffffffab9a7685
> [ 9890.526300] RBP: ffffb8db09013db8 R08: 0000000000000000 R09: 0000000000000873
> [ 9890.526304] R10: ffff9308e0d64800 R11: 0000000000000002 R12: ffff9308e5ff6e70
> [ 9890.526308] R13: ffff930952500e20 R14: ffff9309192a8c00 R15: 0000000000000000
> [ 9890.526313] FS:  0000000000000000(0000) GS:ffff930b4e700000(0000) knlGS:0000000000000000
> [ 9890.526316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9890.526318] CR2: 0000000000000000 CR3: 0000000391c58005 CR4: 00000000001706f0
> [ 9890.526321] Call Trace:
> [ 9890.526324]  <TASK>
> [ 9890.526327] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 9890.526335] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
> [ 9890.526340] ? page_fault_oops (arch/x86/mm/fault.c:713)
> [ 9890.526347] ? search_module_extables (kernel/module/main.c:3256 (discriminator 3))
> [ 9890.526353] ? ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211
>
> Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>

Kevin's s-o-b missing.
Zong-Zhe Yang May 27, 2024, 6:46 a.m. UTC | #2
Kalle Valo <kvalo@kernel.org> wrote:
> 
> <kevin_yang@realtek.com> writes:
> 
> > From: Zong-Zhe Yang <kevin_yang@realtek.com>
> >
> > [...]
> >
> > Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
> 
> Kevin's s-o-b missing.
> 

Sorry, I didn't configure sendemail.from properly.
(I will double-check my configuration before sending patch next time.)
But, my s-o-b has been there.
Kalle Valo May 27, 2024, 8:51 a.m. UTC | #3
Zong-Zhe Yang <kevin_yang@realtek.com> writes:

> Kalle Valo <kvalo@kernel.org> wrote:
>> 
>> <kevin_yang@realtek.com> writes:
>> 
>> > From: Zong-Zhe Yang <kevin_yang@realtek.com>
>> >
>> > [...]
>> >
>> > Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
>> 
>> Kevin's s-o-b missing.
>> 
>
> Sorry, I didn't configure sendemail.from properly.
> (I will double-check my configuration before sending patch next time.)
> But, my s-o-b has been there.

Yeah, I got confused. Sorry for the noise.
Johannes Berg May 29, 2024, 1:22 p.m. UTC | #4
On Thu, 2024-05-23 at 16:22 +0800, kevin_yang@realtek.com wrote:
> 
> -	if (!pubsta->deflink.ht_cap.ht_supported &&
> -	    sta->sdata->vif.bss_conf.chanreq.oper.chan->band != NL80211_BAND_6GHZ)
> -		return -EINVAL;

I can see how this fixes the crash, and I can also see why we didn't
notice (TX agg sessions offloaded to FW), but ...

> +	if (!pubsta->deflink.ht_cap.ht_supported) {
> +		struct ieee80211_vif *vif = &sta->sdata->vif;
> +		struct ieee80211_bss_conf *bss_conf;
> +
> +		rcu_read_lock();
> +
> +		bss_conf = rcu_dereference(vif->link_conf[pubsta->deflink.link_id]);
> +		if (unlikely(!bss_conf) ||
> +		    bss_conf->chanreq.oper.chan->band != NL80211_BAND_6GHZ) {
> +			rcu_read_unlock();
> +			return -EINVAL;
> 

is this really right?

This checks that the *first* link the STA used isn't 6 GHz, but maybe it
should be *any* link?

But then again, we don't really need this check for an MLO STA since it
will have HT supported unless it associated on 6 GHz. Maybe we should
just not do the check this way, but check if it has HT or VHT or HE or
something like that?

johannes
Zong-Zhe Yang May 30, 2024, 1:49 p.m. UTC | #5
Johannes Berg <johannes@sipsolutions.net> wrote:
> 
> On Thu, 2024-05-23 at 16:22 +0800, kevin_yang@realtek.com wrote:
> 
> [...]
> 
> This checks that the *first* link the STA used isn't 6 GHz, but maybe it should be *any* link?
> 
> But then again, we don't really need this check for an MLO STA since it will have HT supported
> unless it associated on 6 GHz. Maybe we should just not do the check this way, but check if it
> has HT or VHT or HE or something like that?
> 

I think there are two points here.

1. the way to avoid this NULL dereference
(Current patch just followed original logic and made it runnable on both MLD and non-MLD.)

According to comments, I will change to check ht_supported/vht_supported/has_he/has_eht.
Then, it doesn't need to reference chanreq.oper.chan here. So, there won't be NULL dereference.

2. the check logic when MLD
(Current patch didn't consider this properly.)

According to spec., BA agreement does once per TID and apply to all corresponding links.
So, I am thinking maybe I check the conditions on all valid_links when MLD.
And, only check deflink when non-MLD.
How about it?
Johannes Berg June 7, 2024, 5:26 p.m. UTC | #6
On Thu, 2024-05-30 at 13:49 +0000, Zong-Zhe Yang wrote:
> Johannes Berg <johannes@sipsolutions.net> wrote:
> > 
> > On Thu, 2024-05-23 at 16:22 +0800, kevin_yang@realtek.com wrote:
> > 
> > [...]
> > 
> > This checks that the *first* link the STA used isn't 6 GHz, but maybe it should be *any* link?
> > 
> > But then again, we don't really need this check for an MLO STA since it will have HT supported
> > unless it associated on 6 GHz. Maybe we should just not do the check this way, but check if it
> > has HT or VHT or HE or something like that?
> > 
> 
> I think there are two points here.
> 
> 1. the way to avoid this NULL dereference
> (Current patch just followed original logic and made it runnable on both MLD and non-MLD.)
> 
> According to comments, I will change to check ht_supported/vht_supported/has_he/has_eht.
> Then, it doesn't need to reference chanreq.oper.chan here. So, there won't be NULL dereference.
> 
> 2. the check logic when MLD
> (Current patch didn't consider this properly.)
> 
> According to spec., BA agreement does once per TID and apply to all corresponding links.
> So, I am thinking maybe I check the conditions on all valid_links when MLD.
> And, only check deflink when non-MLD.

Well, spec also requires that you have EHT (on all links) to be able to
do MLO in the first place, so you shouldn't be connected. IOW, checking
one link should be sufficient? And that can even be deflink, because for
a STA that's always used as the assoc link (unlike for vif)

johannes
Zong-Zhe Yang June 14, 2024, 7:53 a.m. UTC | #7
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Thu, 2024-05-30 at 13:49 +0000, Zong-Zhe Yang wrote:
> >
> > I think there are two points here.
> >
> > 1. the way to avoid this NULL dereference (Current patch just followed
> > original logic and made it runnable on both MLD and non-MLD.)
> >
> > According to comments, I will change to check
> ht_supported/vht_supported/has_he/has_eht.
> > Then, it doesn't need to reference chanreq.oper.chan here. So, there won't be NULL
> dereference.
> >
> > 2. the check logic when MLD
> > (Current patch didn't consider this properly.)
> >
> > According to spec., BA agreement does once per TID and apply to all corresponding links.
> > So, I am thinking maybe I check the conditions on all valid_links when MLD.
> > And, only check deflink when non-MLD.
> 
> Well, spec also requires that you have EHT (on all links) to be able to do MLO in the first place,
> so you shouldn't be connected. IOW, checking one link should be sufficient? And that can even
> be deflink, because for a STA that's always used as the assoc link (unlike for vif)
> 

Then I am thinking to just check ht_supported/vht_supported/has_he/has_eht on sta deflink,
whether non-MLD connection or MLD connection.
Any further suggestions?
diff mbox series

Patch

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 21d55dc539f6..41f02d0baed7 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -615,9 +615,21 @@  int ieee80211_start_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid,
 		 "Requested to start BA session on reserved tid=%d", tid))
 		return -EINVAL;
 
-	if (!pubsta->deflink.ht_cap.ht_supported &&
-	    sta->sdata->vif.bss_conf.chanreq.oper.chan->band != NL80211_BAND_6GHZ)
-		return -EINVAL;
+	if (!pubsta->deflink.ht_cap.ht_supported) {
+		struct ieee80211_vif *vif = &sta->sdata->vif;
+		struct ieee80211_bss_conf *bss_conf;
+
+		rcu_read_lock();
+
+		bss_conf = rcu_dereference(vif->link_conf[pubsta->deflink.link_id]);
+		if (unlikely(!bss_conf) ||
+		    bss_conf->chanreq.oper.chan->band != NL80211_BAND_6GHZ) {
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+
+		rcu_read_unlock();
+	}
 
 	if (WARN_ON_ONCE(!local->ops->ampdu_action))
 		return -EINVAL;