[v2] mac80211: remove warning message

Message ID	1557824507-17668-1-git-send-email-yiboz@codeaurora.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <ath10k-bounces+patchwork-ath10k=patchwork.kernel.org@lists.infradead.org> sender: yiboz@codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 290BB60AD1; Tue, 14 May 2019 09:02:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 290BB60AD1 From: Yibo Zhao <yiboz@codeaurora.org> To: linux-wireless@vger.kernel.org Subject: [PATCH v2] mac80211: remove warning message Date: Tue, 14 May 2019 17:01:47 +0800 Message-Id: <1557824507-17668-1-git-send-email-yiboz@codeaurora.org> Precedence: list Cc: Zhi Chen <zhichen@codeaurora.org>, Yibo Zhao <yiboz@codeaurora.org>, ath10k@lists.infradead.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "ath10k" <ath10k-bounces@lists.infradead.org> Errors-To: ath10k-bounces+patchwork-ath10k=patchwork.kernel.org@lists.infradead.org
Series	[v2] mac80211: remove warning message \| expand [v2] mac80211: remove warning message

Yibo Zhao May 14, 2019, 9:01 a.m. UTC

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.

If this case can be hit during normal functionality, there should be no
WARN_ON(). Those should be reserved to cases that are not supposed to be
hit at all or some other more specific cases like indicating obsolete
interface.

Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Yibo Zhao <yiboz@codeaurora.org>
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Johannes Berg May 14, 2019, 9:05 a.m. UTC | #1

On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:
> In multiple SSID cases, it takes time to prepare every AP interface
> to be ready in initializing phase. If a sta already knows everything it
> needs to join one of the APs and sends authentication to the AP which
> is not fully prepared at this point of time, AP's channel context
> could be NULL. As a result, warning message occurs.
> 
Err, what was the point in sending v2 without any changes?

johannes

Yibo Zhao May 14, 2019, 9:10 a.m. UTC | #2

On 2019-05-14 17:05, Johannes Berg wrote:
> On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:
>> In multiple SSID cases, it takes time to prepare every AP interface
>> to be ready in initializing phase. If a sta already knows everything 
>> it
>> needs to join one of the APs and sends authentication to the AP which
>> is not fully prepared at this point of time, AP's channel context
>> could be NULL. As a result, warning message occurs.
>> 
> Err, what was the point in sending v2 without any changes?
> 
> johannes
Hi Johannes,

I was planning to use WARN_ON_ONCE() in the first place to replace 
WARN_ON() then after some discussion, we think removing it could be 
better. So the first patch was based on my first version which is sent 
incorrectly. Please check again.

Sorry for the confusing.

Johannes Berg May 14, 2019, 9:12 a.m. UTC | #3

On Tue, 2019-05-14 at 17:10 +0800, Yibo Zhao wrote:
> On 2019-05-14 17:05, Johannes Berg wrote:
> > On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:
> > > In multiple SSID cases, it takes time to prepare every AP interface
> > > to be ready in initializing phase. If a sta already knows everything 
> > > it
> > > needs to join one of the APs and sends authentication to the AP which
> > > is not fully prepared at this point of time, AP's channel context
> > > could be NULL. As a result, warning message occurs.
> > > 
> > 
> > Err, what was the point in sending v2 without any changes?
> > 
> > johannes
> 
> Hi Johannes,
> 
> I was planning to use WARN_ON_ONCE() in the first place to replace 
> WARN_ON() then after some discussion, we think removing it could be 
> better. So the first patch was based on my first version which is sent 
> incorrectly. Please check again.

Oops, I didn't pay attention to the - code.

I guess changing it to WARN_ON_ONCE() makes sense, but as per my earlier
email I'm really not sure about removing it entirely, it doesn't seem
like a valid scenario and we should take steps elsewhere to prevent it.

johannes

Joe Perches May 14, 2019, 3:44 p.m. UTC | #4

On Tue, 2019-05-14 at 11:12 +0200, Johannes Berg wrote:
> On Tue, 2019-05-14 at 17:10 +0800, Yibo Zhao wrote:
> > On 2019-05-14 17:05, Johannes Berg wrote:
> > > On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:
> > > > In multiple SSID cases, it takes time to prepare every AP interface
> > > > to be ready in initializing phase. If a sta already knows everything 
> > > > it
> > > > needs to join one of the APs and sends authentication to the AP which
> > > > is not fully prepared at this point of time, AP's channel context
> > > > could be NULL. As a result, warning message occurs.
[]
> > I was planning to use WARN_ON_ONCE() in the first place to replace 
> > WARN_ON() then after some discussion, we think removing it could be 
> > better. So the first patch was based on my first version which is sent 
> > incorrectly. Please check again.
[]
> I guess changing it to WARN_ON_ONCE() makes sense,

WARN_ON_RATELIMIT exists.

> but as per my earlier
> email I'm really not sure about removing it entirely, it doesn't seem
> like a valid scenario and we should take steps elsewhere to prevent it.

Ben Greear May 14, 2019, 5:55 p.m. UTC | #5

On 5/14/19 8:44 AM, Joe Perches wrote:
> On Tue, 2019-05-14 at 11:12 +0200, Johannes Berg wrote:
>> On Tue, 2019-05-14 at 17:10 +0800, Yibo Zhao wrote:
>>> On 2019-05-14 17:05, Johannes Berg wrote:
>>>> On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:
>>>>> In multiple SSID cases, it takes time to prepare every AP interface
>>>>> to be ready in initializing phase. If a sta already knows everything
>>>>> it
>>>>> needs to join one of the APs and sends authentication to the AP which
>>>>> is not fully prepared at this point of time, AP's channel context
>>>>> could be NULL. As a result, warning message occurs.
> []
>>> I was planning to use WARN_ON_ONCE() in the first place to replace
>>> WARN_ON() then after some discussion, we think removing it could be
>>> better. So the first patch was based on my first version which is sent
>>> incorrectly. Please check again.
> []
>> I guess changing it to WARN_ON_ONCE() makes sense,
> 
> WARN_ON_RATELIMIT exists.

We know the WARN hits, we have the backtrace, and it is easy enough (in my setup
at least) to reproduce this.  So, the WARN logic has done its job.

Having more of these spam the kernel doesn't add much benefit I think.

Anyone have any suggestions on how to fix the underlying issue?

Thanks,
Ben

Johannes Berg May 14, 2019, 6:40 p.m. UTC | #6

> We know the WARN hits, we have the backtrace, and it is easy enough (in my setup
> at least) to reproduce this.  So, the WARN logic has done its job.
> 
> Having more of these spam the kernel doesn't add much benefit I think.

Well, this doesn't necessarily just catch a *single* issue, so it should
remain for the future, I'd think.

> Anyone have any suggestions on how to fix the underlying issue?

I don't even have the backtrace and scenario that causes it.

johannes

Ben Greear May 14, 2019, 6:54 p.m. UTC | #7

On 5/14/19 11:40 AM, Johannes Berg wrote:
> 
>> We know the WARN hits, we have the backtrace, and it is easy enough (in my setup
>> at least) to reproduce this.  So, the WARN logic has done its job.
>>
>> Having more of these spam the kernel doesn't add much benefit I think.
> 
> Well, this doesn't necessarily just catch a *single* issue, so it should
> remain for the future, I'd think.
> 
>> Anyone have any suggestions on how to fix the underlying issue?
> 
> I don't even have the backtrace and scenario that causes it.
> 
> johannes
> 

Here is the info I have in my commit that changed this to WARN_ON_ONCE.
I never posted it because I had to hack ath10k to get to this state, so maybe
this is not a valid case to debug.


Maybe Yibo Zhao has a better example.

     mac80211: don't spam kernel logs when chantx is null.

     I set up ath10k to be chandef based again in order to test
     WDS.  My WDS stations are not very functional yet, and
     when ethtool stats are queried, there is a WARN_ON splat
     generated.  Change this to WARN_ON_ONCE so that there is
     less kernel spam.

     [ 2401.445631] WARNING: CPU: 1 PID: 14070 at /home/greearb/git/linux-4.13.dev.y/net/mac80211/ieee80211_i.h:1452 sta_set_rate_info_tx+0x18c/0x1a0 [mac80211]
     [ 2401.445727] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink wanlink(O) ath10k_pci ath10k_core mac80211_hwsim ath5k ath9k ath9k_common 
ath9k_hw ath mac80211 cfg80211 nf_defrag_ipv4 libcrc32c 8021q garp mrp stp llc fuse macvlan pktgen nfsv3 nfs fscache amd64_edac_mod edac_mce_amd kvm_amd kvm 
irqbypass sp5100_tco fam15h_power k10temp i2c_piix4 ccp shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc sdhci_pci sdhci mmc_core 
igb hwmon ptp pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt [last unloaded: nfnetlink]
     [ 2401.445911] CPU: 1 PID: 14070 Comm: btserver Tainted: G        W  O    4.13.11+ #18
     [ 2401.445914] Hardware name: PC Engines apu2/apu2, BIOS 88a4f96 03/07/2016
     [ 2401.445918] task: ffff880118c73b80 task.stack: ffffc90001400000
     [ 2401.445973] RIP: 0010:sta_set_rate_info_tx+0x18c/0x1a0 [mac80211]
     [ 2401.446003] RSP: 0018:ffffc90001403820 EFLAGS: 00010246
     [ 2401.446007] RAX: 0000000000000000 RBX: ffff8800ca38e4a0 RCX: 000000000000000c
     [ 2401.446010] RDX: 0000000000000000 RSI: ffff8800ca38e4a0 RDI: ffff8800ca38e000
     [ 2401.446013] RBP: ffffc90001403850 R08: ffffffffa04437a0 R09: 0000000000002000
     [ 2401.446016] R10: 000000001183be82 R11: 0000000000000000 R12: ffffc90001403970
     [ 2401.446018] R13: 0000000000000000 R14: ffff8800c01d8900 R15: ffff8800ca180780
     [ 2401.446023] FS:  00007f8123ed7740(0000) GS:ffff88011ec80000(0000) knlGS:0000000000000000
     [ 2401.446026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [ 2401.446029] CR2: 00000000036e8018 CR3: 00000000c0b29000 CR4: 00000000000406e0
     [ 2401.446034] Call Trace:
     [ 2401.446140]  sta_set_sinfo+0x629/0x8b0 [mac80211]
     [ 2401.446192]  ieee80211_get_stats+0x3f2/0x8c0 [mac80211]
     [ 2401.446207]  ? __nla_put+0x20/0x30
     [ 2401.446221]  ? __kmalloc_reserve.isra.35+0x2c/0x80
     [ 2401.446229]  ? netlink_deliver_tap+0x2d/0x1e0
     [ 2401.446235]  ? sock_def_readable+0x6d/0x70
     [ 2401.446239]  ? __netlink_sendskb+0x36/0x40
     [ 2401.446245]  ? netlink_unicast+0x1b0/0x1f0
     [ 2401.446252]  ? rtnl_getlink+0x135/0x1c0
     [ 2401.446261]  ? get_page_from_freelist+0x913/0xac0
     [ 2401.446270]  ? vmap_page_range_noflush+0x27d/0x370
     [ 2401.446277]  ? map_vm_area+0x31/0x40
     [ 2401.446284]  ? __vmalloc_node_range+0x21f/0x270
     [ 2401.446319]  dev_ethtool+0x11d1/0x1ce0
     [ 2401.446325]  ? __rtnl_unlock+0x25/0x50
     [ 2401.446330]  ? netdev_run_todo+0x4d/0x2e0
     [ 2401.446338]  ? dev_get_by_name_rcu+0x6f/0xa0
     [ 2401.446344]  dev_ioctl+0x330/0x550
     [ 2401.446349]  ? reuse_swap_page+0x30/0x100
     [ 2401.446355]  sock_do_ioctl+0x3d/0x50
     [ 2401.446359]  ? sock_do_ioctl+0x3d/0x50
     [ 2401.446363]  sock_ioctl+0x1e5/0x2a0
     [ 2401.446370]  do_vfs_ioctl+0x8b/0x5b0
     [ 2401.446376]  ? getnstimeofday64+0x9/0x20
     [ 2401.446383]  ? __audit_syscall_entry+0xba/0x110
     [ 2401.446391]  ? syscall_trace_enter+0x1b0/0x2b0
     [ 2401.446395]  SyS_ioctl+0x74/0x80
     [ 2401.446400]  ? __audit_syscall_exit+0x215/0x2b0
     [ 2401.446405]  do_syscall_64+0x5c/0x190
     [ 2401.446412]  entry_SYSCALL64_slow_path+0x25/0x25


Thanks,
Ben

Johannes Berg May 14, 2019, 6:57 p.m. UTC | #8

On Tue, 2019-05-14 at 11:54 -0700, Ben Greear wrote:
> 
> Here is the info I have in my commit that changed this to WARN_ON_ONCE.
> I never posted it because I had to hack ath10k to get to this state, so maybe
> this is not a valid case to debug.
> 
> 
> Maybe Yibo Zhao has a better example.
> 
>      mac80211: don't spam kernel logs when chantx is null.
> 
>      I set up ath10k to be chandef based again in order to test
>      WDS.  My WDS stations are not very functional yet, and
>      when ethtool stats are queried, there is a WARN_ON splat
>      generated.  Change this to WARN_ON_ONCE so that there is
>      less kernel spam.

I'm totally fine with WARN_ON_ONCE, FWIW.

Sounds like different bugs though. You're talking about WDS here, and
Yibo was talking about something with AP interfaces prematurely
accepting frames or so.

johannes

Yibo Zhao May 20, 2019, 1:56 p.m. UTC | #9

On 2019-05-15 02:57, Johannes Berg wrote:
> On Tue, 2019-05-14 at 11:54 -0700, Ben Greear wrote:
>> 
>> Here is the info I have in my commit that changed this to 
>> WARN_ON_ONCE.
>> I never posted it because I had to hack ath10k to get to this state, 
>> so maybe
>> this is not a valid case to debug.
>> 
>> 
>> Maybe Yibo Zhao has a better example.
>> 
>>      mac80211: don't spam kernel logs when chantx is null.
>> 
>>      I set up ath10k to be chandef based again in order to test
>>      WDS.  My WDS stations are not very functional yet, and
>>      when ethtool stats are queried, there is a WARN_ON splat
>>      generated.  Change this to WARN_ON_ONCE so that there is
>>      less kernel spam.
> 
> I'm totally fine with WARN_ON_ONCE, FWIW.
> 
> Sounds like different bugs though. You're talking about WDS here, and
> Yibo was talking about something with AP interfaces prematurely
> accepting frames or so.

Yes, they might be different bugs that hit the same point. Looks like 
others found this too many warnings issue as well. Then I believe 
WARN_ON_ONCE() seems to be our solution for now.

> 
> johannes

Yibo Zhao June 14, 2019, 2:52 a.m. UTC | #10

On 2019-05-20 21:56, Yibo Zhao wrote:
> On 2019-05-15 02:57, Johannes Berg wrote:
>> On Tue, 2019-05-14 at 11:54 -0700, Ben Greear wrote:
>>> 
>>> Here is the info I have in my commit that changed this to 
>>> WARN_ON_ONCE.
>>> I never posted it because I had to hack ath10k to get to this state, 
>>> so maybe
>>> this is not a valid case to debug.
>>> 
>>> 
>>> Maybe Yibo Zhao has a better example.
>>> 
>>>      mac80211: don't spam kernel logs when chantx is null.
>>> 
>>>      I set up ath10k to be chandef based again in order to test
>>>      WDS.  My WDS stations are not very functional yet, and
>>>      when ethtool stats are queried, there is a WARN_ON splat
>>>      generated.  Change this to WARN_ON_ONCE so that there is
>>>      less kernel spam.
>> 
>> I'm totally fine with WARN_ON_ONCE, FWIW.
>> 
>> Sounds like different bugs though. You're talking about WDS here, and
>> Yibo was talking about something with AP interfaces prematurely
>> accepting frames or so.
> 
> Yes, they might be different bugs that hit the same point. Looks like
> others found this too many warnings issue as well. Then I believe
> WARN_ON_ONCE() seems to be our solution for now.
> 
Hi Johannes,

May I know if it is fine that WARN_ON_ONCE() to be applied in kernel in 
the future? If a separate patch for it is needed, please let me know so 
that I can raise a new one.

>> 
>> johannes

Johannes Berg June 14, 2019, 7:22 a.m. UTC | #11

On Fri, 2019-06-14 at 10:52 +0800, Yibo Zhao wrote:
> 
> May I know if it is fine that WARN_ON_ONCE() to be applied in kernel in 
> the future? If a separate patch for it is needed, please let me know so 
> that I can raise a new one.

Please do send a new patch.

johannes

[v2] mac80211: remove warning message

Commit Message

Comments

Patch