ath10k: Don't allow tx logic when state is not ON.
diff mbox

Message ID 1529506583-20204-1-git-send-email-greearb@candelatech.com
State New
Headers show

Commit Message

Ben Greear June 20, 2018, 2:56 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This is against the 4.16 kernel, likely applies to later kernels
as well.  Firmware is beta ath10k-ct firmware for 9984 NIC.  The
patch is not firmware or chipset specific.

When firmware crashes, packets can still be sent from the
mac80211 stack, and that can cause crashes in the ath10k
tx path.

After adding this patch, I saw cases where the tx path was
called in state ATH10K_STATE_RESTARTED.

I have not tested the tx_64 path, but assume it has similar
issues, so same patch was added to it.

Here is example of crash without the patch:

The line that crashes decodes as:

(gdb) l *(ath10k_htt_tx_32+0x18ba)
0x74a9a is in ath10k_htt_tx_32 (/home/greearb/git/linux-4.16.dev.y/drivers/net/wireless/ath/ath10k/htt_tx.c:1257).
1252				       sizeof(struct htt_msdu_ext_desc));
1253				frags = (struct htt_data_tx_desc_frag *)
1254					&ext_desc_t[msdu_id].frags;
1255				ext_desc = &ext_desc_t[msdu_id];
1256				frags[0].tword_addr.paddr_lo =
1257					__cpu_to_le32(skb_cb->paddr);
1258				frags[0].tword_addr.paddr_hi = 0;
1259				frags[0].tword_addr.len_16 = __cpu_to_le16(msdu->len);
1260
1261				frags_paddr =  htt->frag_desc.paddr +

ath10k_pci 0000:04:00.0: ATH10K_END
ath10k_pci 0000:04:00.0: firmware crashed! (guid 033040e0-a2e0-499c-b2a2-3e06832c649e)
ath10k_pci 0000:04:00.0: firmware register dump:
ath10k_pci 0000:04:00.0: [00]: 0x0000000A 0x00000000 0x00000000 0x00000000
... [snipped rest of crash dump for brevity] ...

ath10k_pci 0000:04:00.0: wmi unified ready event not received
ath10k_pci 0000:04:00.0: Could not init core: -110

Comments

Manikanta Pubbisetty June 20, 2018, 5:37 p.m. UTC | #1
> From: Ben Greear <greearb@candelatech.com>
>
> This is against the 4.16 kernel, likely applies to later kernels
> as well.  Firmware is beta ath10k-ct firmware for 9984 NIC.  The
> patch is not firmware or chipset specific.
>
> When firmware crashes, packets can still be sent from the
> mac80211 stack, and that can cause crashes in the ath10k
> tx path.

Yes, this is what I had observed and explained in the RFC patch which I 
have sent some time back. In my case the device never comes up and the 
entire system crashes; I hope you have observed the same behavior.

> After adding this patch, I saw cases where the tx path was
> called in state ATH10K_STATE_RESTARTED.
>
> I have not tested the tx_64 path, but assume it has similar
> issues, so same patch was added to it.

This case should not be dealt in ath10k, rather we should make sure 
packets does not reach the driver during hardware restart after the 
firmware crash.
Can you please try the RFC patch and see if it fixes the issue? I am 
working on the patch and probably send it out after doing some testing 
and cleanup.

https://patchwork.kernel.org/patch/10411967/

Thanks,
Manikanta
Ben Greear June 20, 2018, 5:42 p.m. UTC | #2
On 06/20/2018 10:37 AM, Manikanta Pubbisetty wrote:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This is against the 4.16 kernel, likely applies to later kernels
>> as well.  Firmware is beta ath10k-ct firmware for 9984 NIC.  The
>> patch is not firmware or chipset specific.
>>
>> When firmware crashes, packets can still be sent from the
>> mac80211 stack, and that can cause crashes in the ath10k
>> tx path.
>
> Yes, this is what I had observed and explained in the RFC patch which I have sent some time back. In my case the device never comes up and the entire system
> crashes; I hope you have observed the same behavior.
>
>> After adding this patch, I saw cases where the tx path was
>> called in state ATH10K_STATE_RESTARTED.
>>
>> I have not tested the tx_64 path, but assume it has similar
>> issues, so same patch was added to it.
>
> This case should not be dealt in ath10k, rather we should make sure packets does not reach the driver during hardware restart after the firmware crash.
> Can you please try the RFC patch and see if it fixes the issue? I am working on the patch and probably send it out after doing some testing and cleanup.
>
> https://patchwork.kernel.org/patch/10411967/

I did see your patch, but I was not sure it would make it upstream.

I think either way my patch might be useful in case bugs creep back in.

And yes, I did see full system crash in this case.

Thanks,
Ben
Rajkumar Manoharan June 20, 2018, 6:48 p.m. UTC | #3
On 2018-06-20 10:42, Ben Greear wrote:
> On 06/20/2018 10:37 AM, Manikanta Pubbisetty wrote:
>> 
[...]
>> This case should not be dealt in ath10k, rather we should make sure 
>> packets does not reach the driver during hardware restart after the 
>> firmware crash.
>> Can you please try the RFC patch and see if it fixes the issue? I am 
>> working on the patch and probably send it out after doing some testing 
>> and cleanup.
>> 
>> https://patchwork.kernel.org/patch/10411967/
> 
> I did see your patch, but I was not sure it would make it upstream.
> 
> I think either way my patch might be useful in case bugs creep back in.
> 

ar->state check can not be in hot path as it is protected by mutex lock.

-Rajkumar
Ben Greear June 20, 2018, 6:51 p.m. UTC | #4
On 06/20/2018 11:48 AM, Rajkumar Manoharan wrote:
> On 2018-06-20 10:42, Ben Greear wrote:
>> On 06/20/2018 10:37 AM, Manikanta Pubbisetty wrote:
>>>
> [...]
>>> This case should not be dealt in ath10k, rather we should make sure packets does not reach the driver during hardware restart after the firmware crash.
>>> Can you please try the RFC patch and see if it fixes the issue? I am working on the patch and probably send it out after doing some testing and cleanup.
>>>
>>> https://patchwork.kernel.org/patch/10411967/
>>
>> I did see your patch, but I was not sure it would make it upstream.
>>
>> I think either way my patch might be useful in case bugs creep back in.
>>
>
> ar->state check can not be in hot path as it is protected by mutex lock.

If the tx logic is ever called while state is changing, that would seem
to be a bug as well?

Thanks,
Ben
Rajkumar Manoharan June 20, 2018, 9:16 p.m. UTC | #5
On 2018-06-20 11:51, Ben Greear wrote:
> On 06/20/2018 11:48 AM, Rajkumar Manoharan wrote:
>> On 2018-06-20 10:42, Ben Greear wrote:
>>> On 06/20/2018 10:37 AM, Manikanta Pubbisetty wrote:
>>>> 
>> [...]
>>>> This case should not be dealt in ath10k, rather we should make sure 
>>>> packets does not reach the driver during hardware restart after the 
>>>> firmware crash.
>>>> Can you please try the RFC patch and see if it fixes the issue? I am 
>>>> working on the patch and probably send it out after doing some 
>>>> testing and cleanup.
>>>> 
>>>> https://patchwork.kernel.org/patch/10411967/
>>> 
>>> I did see your patch, but I was not sure it would make it upstream.
>>> 
>>> I think either way my patch might be useful in case bugs creep back 
>>> in.
>>> 
>> 
>> ar->state check can not be in hot path as it is protected by mutex 
>> lock.
> 
> If the tx logic is ever called while state is changing, that would seem
> to be a bug as well?
> 
Thats true. The assumption is that packets should not be given to driver
after ieee80211_stop_queues is called. Isn't it? Manikanta is trying to 
fix
some cases where packets are still continued to sent even after queue 
stop.
Worth trying that.

-Rajkumar

Patch
diff mbox

==================================================================
BUG: KASAN: null-ptr-deref in ath10k_htt_tx_32+0x18ba/0x2b00 [ath10k_core]
Write of size 64 at addr 0000000000000000 by task kworker/u8:2/5115
==================================================================
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: memset_erms+0x9/0x10
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP KASAN PTI
Modules linked in: ath10k_pci ath10k_core rpcsec_gss_krb5 nfsv4 nfs fscache nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf 8021q garp mrp stp llc fuse macvlan pktgen lm78 hwmon_vid iTCO_wdt iTCO_vendor_support coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel ath snd_hda_codec_hdmi kvm snd_hda_intel irqbypass mac80211 snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device cfg80211 snd_pcm i2c_i801 snd_timer snd shpchp soundcore mei_wdt intel_pch_thermal acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel serio_raw e1000e i915 igb hwmon dca i2c_algo_bit drm_kms_helper drm i2c_core video ipv6 crc_ccitt [last unloaded: ath10k_core]
CPU: 0 PID: 5115 Comm: kworker/u8:2 Tainted: G    B   W        4.16.15+ #19
Hardware name: _ _/, BIOS 5.11 08/26/2016
Workqueue: phy3 ieee80211_beacon_connection_loss_work [mac80211]
RIP: 0010:memset_erms+0x9/0x10
RSP: 0018:ffff8801489af828 EFLAGS: 00010292
RAX: ffff880146920000 RBX: ffff88014a08d040 RCX: 0000000000000040
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8801121f4fa8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8801489af630 R11: 1ffff10029135e88 R12: ffff8801121f44a0
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffec0000
FS:  0000000000000000(0000) GS:ffff88014de00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000003a14005 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ath10k_htt_tx_32+0x18ba/0x2b00 [ath10k_core]
 ? ath10k_htt_tx_free_msdu_id+0xc0/0xc0 [ath10k_core]
 ? invoke_tx_handlers_late+0x2340/0x2340 [mac80211]
 ath10k_mac_tx+0xd7c/0x1680 [ath10k_core]
 ath10k_mac_tx_push_txq+0x1a2/0x3e0 [ath10k_core]
 ath10k_mac_op_wake_tx_queue+0x2fa/0x4e0 [ath10k_core]
 ieee80211_queue_skb+0x7cf/0xfa0 [mac80211]
 ieee80211_tx+0x259/0x330 [mac80211]
 ? ieee80211_tx_prepare_skb+0x3f0/0x3f0 [mac80211]
 ? ieee80211_xmit+0x26b/0x520 [mac80211]
 __ieee80211_tx_skb_tid_band+0x1e6/0x290 [mac80211]
 ieee80211_send_nullfunc+0x223/0x3f0 [mac80211]
 ieee80211_mgd_probe_ap_send+0x1af/0x4b0 [mac80211]
 ieee80211_mgd_probe_ap.part.22+0x28d/0x380 [mac80211]
 process_one_work+0x5f7/0x14d0
 ? pwq_dec_nr_in_flight+0x2b0/0x2b0
 ? _raw_spin_unlock_irq+0x24/0x40
 worker_thread+0xdc/0x12d0
 ? rescuer_thread+0x12b0/0x12b0
 kthread+0x2cf/0x3c0
 ? kthread_delayed_work_timer_fn+0x1e0/0x1e0
 ret_from_fork+0x24/0x30
ce 48 b8 01 01 01 01 01
RIP: memset_erms+0x9/0x10 RSP: ffff8801489af828
CR2: 0000000000000000
---[ end trace 5e24737a5c492997 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath10k/htt_tx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index fa134f4..709bb1d 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -1173,6 +1173,18 @@  static int ath10k_htt_tx_32(struct ath10k_htt *htt,
 	struct htt_msdu_ext_desc *ext_desc_t = NULL;
 	u32 peer_id = HTT_INVALID_PEERID;
 
+	if (ar->state != ATH10K_STATE_ON) {
+		static bool done_once = 0;
+		if (!done_once) {
+			done_once = true;
+			ath10k_err(ar, "Invalid state: %d in ath10k_htt_tx_32, warning will not be repeated.\n",
+				   ar->state);
+			WARN_ON(1);
+		}
+		res = -ENODEV;
+		goto err;
+	}
+
 	if (unlikely(info->flags & IEEE80211_TX_CTL_TX_OFFCHAN))
 		freq = ar->scan.roc_freq;
 
@@ -1409,6 +1421,18 @@  static int ath10k_htt_tx_64(struct ath10k_htt *htt,
 	struct htt_msdu_ext_desc_64 *ext_desc = NULL;
 	struct htt_msdu_ext_desc_64 *ext_desc_t = NULL;
 
+	if (ar->state != ATH10K_STATE_ON) {
+		static bool done_once = 0;
+		if (!done_once) {
+			done_once = true;
+			ath10k_err(ar, "Invalid state: %d in ath10k_htt_tx_64, warning will not be repeated.\n",
+				   ar->state);
+			WARN_ON(1);
+		}
+		res = -ENODEV;
+		goto err;
+	}
+
 	spin_lock_bh(&htt->tx_lock);
 	res = ath10k_htt_tx_alloc_msdu_id(htt, msdu);
 	spin_unlock_bh(&htt->tx_lock);