diff mbox series

[v2] wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()

Message ID 9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [v2] wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop() | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 4 maintainers not CCed: davem@davemloft.net pabeni@redhat.com kuba@kernel.org edumazet@google.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tetsuo Handa July 17, 2022, 12:21 p.m. UTC
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.

According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().

Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
---
Changes in v2:
  Use synchronize_rcu() instead of initializing fq.lock early.

This bug is current top crasher for syzbot. Please fix as soon as possible.

 net/mac80211/iface.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Toke Høiland-Jørgensen July 18, 2022, 11:22 a.m. UTC | #1
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
> for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
> that is being stopped") guards clear_bit() using fq.lock even before
> fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
>
> According to discussion [2], Toke was not happy with expanding usage of
> fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
> can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().

Ah, that's a neat solution! :)

Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Kalle Valo July 18, 2022, 12:01 p.m. UTC | #2
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
> for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
> that is being stopped") guards clear_bit() using fq.lock even before
> fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
> 
> According to discussion [2], Toke was not happy with expanding usage of
> fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
> can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
> 
> Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
> Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
> Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
> Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>

Patch applied to wireless-next.git, thanks.

3598cb6e1862 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
Tetsuo Handa July 26, 2022, 6:55 a.m. UTC | #3
Since this patch fixes a regression introduced in 5.19-rc7, can this patch go to 5.19-final ?

syzbot is failing to test linux.git for 12 days due to this regression.
syzbot will fail to bisect new bugs found in the upcoming merge window
if unable to test v5.19 due to this regression.

On 2022/07/18 21:01, Kalle Valo wrote:
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
>> for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
>> that is being stopped") guards clear_bit() using fq.lock even before
>> fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
>>
>> According to discussion [2], Toke was not happy with expanding usage of
>> fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
>> can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
>>
>> Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
>> Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
>> Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
>> Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
> 
> Patch applied to wireless-next.git, thanks.
> 
> 3598cb6e1862 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
>
Kalle Valo July 26, 2022, 2:38 p.m. UTC | #4
(please don't top post, I manually fixed that)

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> On 2022/07/18 21:01, Kalle Valo wrote:
>> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> 
>>> lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
>>> for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
>>> that is being stopped") guards clear_bit() using fq.lock even before
>>> fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
>>>
>>> According to discussion [2], Toke was not happy with expanding usage of
>>> fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
>>> can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
>>>
>>> Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
>>> Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
>>> Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
>>> Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>>> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
>> 
>> Patch applied to wireless-next.git, thanks.
>> 
>> 3598cb6e1862 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
>
> Since this patch fixes a regression introduced in 5.19-rc7, can this patch go to 5.19-final ?
>
> syzbot is failing to test linux.git for 12 days due to this regression.
> syzbot will fail to bisect new bugs found in the upcoming merge window
> if unable to test v5.19 due to this regression.

I took this to wireless-next as I didn't think there's enough time to
get this to v5.19 (and I only heard Linus' -rc8 plans after the fact).
So this will be in v5.20-rc1 and I recommend pushing this to a v5.19
stable release.
Ben Greear July 26, 2022, 3:05 p.m. UTC | #5
On 7/26/22 7:38 AM, Kalle Valo wrote:
> (please don't top post, I manually fixed that)
> 
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> 
>> On 2022/07/18 21:01, Kalle Valo wrote:
>>> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>
>>>> lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
>>>> for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
>>>> that is being stopped") guards clear_bit() using fq.lock even before
>>>> fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
>>>>
>>>> According to discussion [2], Toke was not happy with expanding usage of
>>>> fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
>>>> can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
>>>>
>>>> Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
>>>> Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
>>>> Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>>> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
>>>> Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
>>>> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
>>>
>>> Patch applied to wireless-next.git, thanks.
>>>
>>> 3598cb6e1862 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
>>
>> Since this patch fixes a regression introduced in 5.19-rc7, can this patch go to 5.19-final ?
>>
>> syzbot is failing to test linux.git for 12 days due to this regression.
>> syzbot will fail to bisect new bugs found in the upcoming merge window
>> if unable to test v5.19 due to this regression.
> 
> I took this to wireless-next as I didn't think there's enough time to
> get this to v5.19 (and I only heard Linus' -rc8 plans after the fact).
> So this will be in v5.20-rc1 and I recommend pushing this to a v5.19
> stable release.

Would it be worth reverting the patch that broke things until the first stable 5.19.x
tree then?  Seems lame to ship an official kernel with a known bug like this.

Thanks,
Ben
Jakub Kicinski July 26, 2022, 9:38 p.m. UTC | #6
On Tue, 26 Jul 2022 08:05:12 -0700 Ben Greear wrote:
> >> Since this patch fixes a regression introduced in 5.19-rc7, can this patch go to 5.19-final ?
> >>
> >> syzbot is failing to test linux.git for 12 days due to this regression.
> >> syzbot will fail to bisect new bugs found in the upcoming merge window
> >> if unable to test v5.19 due to this regression.  
> > 
> > I took this to wireless-next as I didn't think there's enough time to
> > get this to v5.19 (and I only heard Linus' -rc8 plans after the fact).
> > So this will be in v5.20-rc1 and I recommend pushing this to a v5.19
> > stable release.  
> 
> Would it be worth reverting the patch that broke things until the first stable 5.19.x
> tree then?  Seems lame to ship an official kernel with a known bug like this.

I cherry-picked the fix across the trees after talking to Kalle and
DaveM. Let's see how that goes...
Tetsuo Handa July 28, 2022, 11 p.m. UTC | #7
On 2022/07/27 6:38, Jakub Kicinski wrote:
> On Tue, 26 Jul 2022 08:05:12 -0700 Ben Greear wrote:
>>>> Since this patch fixes a regression introduced in 5.19-rc7, can this patch go to 5.19-final ?
>>>>
>>>> syzbot is failing to test linux.git for 12 days due to this regression.
>>>> syzbot will fail to bisect new bugs found in the upcoming merge window
>>>> if unable to test v5.19 due to this regression.  
>>>
>>> I took this to wireless-next as I didn't think there's enough time to
>>> get this to v5.19 (and I only heard Linus' -rc8 plans after the fact).
>>> So this will be in v5.20-rc1 and I recommend pushing this to a v5.19
>>> stable release.  
>>
>> Would it be worth reverting the patch that broke things until the first stable 5.19.x
>> tree then?  Seems lame to ship an official kernel with a known bug like this.
> 
> I cherry-picked the fix across the trees after talking to Kalle and
> DaveM. Let's see how that goes...

This patch successfully arrived at linux.git, in time for 5.19-final.

Thank you.
diff mbox series

Patch

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 15a73b7fdd75..1a9ada411879 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -377,9 +377,8 @@  static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, bool going_do
 	bool cancel_scan;
 	struct cfg80211_nan_func *func;
 
-	spin_lock_bh(&local->fq.lock);
 	clear_bit(SDATA_STATE_RUNNING, &sdata->state);
-	spin_unlock_bh(&local->fq.lock);
+	synchronize_rcu(); /* flush _ieee80211_wake_txqs() */
 
 	cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
 	if (cancel_scan)