Message ID | 20180621122548.23863-1-me@bobcopeland.com (mailing list archive) |
---|---|
State | New, archived |
Bob Copeland <me@bobcopeland.com> wrote:

> In our environment we are occasionally seeing the following stack trace
> in ath10k:
>
> Unable to handle kernel paging request at virtual address 0000a800
> pgd = c0204000
> [0000a800] *pgd=00000000
> Internal error: Oops: 17 [#1] SMP ARM
> Modules linked in: dwc3 dwc3_of_simple phy_qcom_dwc3 nf_nat xt_connmark
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.31 #2
> Hardware name: Generic DT based system
> task: c09f4f40 task.stack: c09ee000
> PC is at kfree_skb_list+0x1c/0x2c
> LR is at skb_release_data+0x6c/0x108
> pc : [<c065dcc4>] lr : [<c065da5c>] psr: 200f0113
> sp : c09efb68 ip : c09efb80 fp : c09efb7c
> r10: 00000000 r9 : 00000000 r8 : 043fddd1
> r7 : bf15d160 r6 : 00000000 r5 : d4ca2f00 r4 : ca7c6480
> r3 : 000000a0 r2 : 01000000 r1 : c0a57470 r0 : 0000a800
> Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5787d Table: 56e6006a DAC: 00000051
> Process swapper/0 (pid: 0, stack limit = 0xc09ee210)
> Stack: (0xc09efb68 to 0xc09f0000)
> fb60: ca7c6480 d4ca2f00 c09efb9c c09efb80 c065da5c c065dcb4
> fb80: d4ca2f00 00000000 dcbf8400 bf15d160 c09efbb4 c09efba0 c065db28 c065d9fc
> fba0: d4ca2f00 00000000 c09efbcc c09efbb8 c065db48 c065db04 d4ca2f00 00000000
> fbc0: c09efbe4 c09efbd0 c065ddd0 c065db38 d4ca2f00 00000000 c09efc64 c09efbe8
> fbe0: bf09bd00 c065dd10 00000003 7fffffff c09efc24 dcbfc9c0 01200000 00000000
> fc00: 00000000 00000000 ddb7e440 c09e9440 c09efc48 1d195000 c09efc7c c09efc28
> fc20: c027bb68 c028aa00 ddb7e4f8 bf13231c ddb7e454 0004091f bf154571 d4ca2f00
> fc40: dcbf8d00 ca7c5df6 bf154538 01200000 00000000 bf154538 c09efd1c c09efc68
> fc60: bf132458 bf09bbbc ca7c5dec 00000041 bf154538 bf154539 000007bf bf154545
> fc80: bf154538 bf154538 bf154538 bf154538 bf154538 00000000 00000000 000016c1
> fca0: 00000001 c09efcb0 01200000 00000000 00000000 00000000 00000000 00000001
> fcc0: bf154539 00000041 00000000 00000007 00000000 000000d0 ffffffff 3160ffff
> fce0: 9ad93e97 3e973160 7bf09ad9 0004091f d4ca2f00 c09efdb0 dcbf94e8 00000000
> fd00: dcbf8d00 01200000 00000000 dcbf8d00 c09efd44 c09efd20 bf132544 bf132130
> fd20: dcbf8d00 00000000 d4ca2f00 c09efdb0 00000001 d4ca2f00 c09efdec c09efd48
> fd40: bf133630 bf1324d0 ca7c5cc0 000007c0 c09efd88 c09efd70 c0764230 c02277d8
> fd60: 200f0113 ffffffff dcbf94c8 bf000000 dcbf93b0 dcbf8d00 00000040 dcbf945c
> fd80: dcbf94e8 00000000 c09efdcc 00000000 c09efd90 c09efd90 00000000 00000024
> fda0: dcbf8d00 00000000 00000005 dcbf8d00 c09efdb0 c09efdb0 00000000 00000040
> fdc0: c09efdec dcbf8d00 dcbfc9c0 c09ed140 00000040 00000000 00000100 00000040
> fde0: c09efe14 c09efdf0 bf1739b4 bf132840 dcbfc9c0 ddb82140 c09ed140 1d195000
> fe00: 00000001 00000100 c09efe64 c09efe18 c067136c bf173958 ddb7fac8 c09f0d00
> fe20: 001df678 0000012c c09efe28 c09efe28 c09efe30 c09efe30 c0a7fb28 ffffe000
> fe40: c09f008c 00000003 00000008 c0a598c0 00000100 c09f0080 c09efeb4 c09efe68
> fe60: c02096e0 c0671278 c0494584 00000080 dd5c3300 c09f0d00 00000004 001df677
> fe80: 0000000a 00200100 dd5c3300 00000000 00000000 c09eaa70 00000060 dd410800
> fea0: c09ee000 00000000 c09efecc c09efeb8 c0227944 c02094c4 00000000 00000000
> fec0: c09efef4 c09efed0 c0268b64 c02278ac de802000 c09f1b1c c09eff20 c0a16cc0
> fee0: de803000 c09ee000 c09eff1c c09efef8 c020947c c0268ae0 c02103dc 600f0013
> ff00: ffffffff c09eff54 ffffe000 c09ee000 c09eff7c c09eff20 c021448c c0209424
> ff20: 00000001 00000000 00000000 c021ddc0 00000000 00000000 c09f1024 00000001
> ff40: ffffe000 c09f1078 00000000 c09eff7c c09eff80 c09eff70 c02103ec c02103dc
> ff60: 600f0013 ffffffff 00000051 00000000 c09eff8c c09eff80 c0763cc4 c02103bc
> ff80: c09effa4 c09eff90 c025f0e4 c0763c98 c0a59040 c09f1000 c09effb4 c09effa8
> ffa0: c075efe0 c025efd4 c09efff4 c09effb8 c097dcac c075ef7c ffffffff ffffffff
> ffc0: 00000000 c097d6c4 00000000 c09c1a28 c0a59294 c09f101c c09c1a24 c09f61c0
> ffe0: 4220406a 512f04d0 00000000 c09efff8 4220807c c097d95c 00000000 00000000
> [<c065dcc4>] (kfree_skb_list) from [<c065da5c>] (skb_release_data+0x6c/0x108)
> [<c065da5c>] (skb_release_data) from [<c065db28>] (skb_release_all+0x30/0x34)
> [<c065db28>] (skb_release_all) from [<c065db48>] (__kfree_skb+0x1c/0x9c)
> [<c065db48>] (__kfree_skb) from [<c065ddd0>] (consume_skb+0xcc/0xd8)
> [<c065ddd0>] (consume_skb) from [<bf09bd00>] (ieee80211_rx_napi+0x150/0x82c [mac80211])
> [<bf09bd00>] (ieee80211_rx_napi [mac80211]) from [<bf132458>] (ath10k_htt_t2h_msg_handler+0x15e8/0x19c4 [ath10k_core])
> [<bf132458>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf132544>] (ath10k_htt_t2h_msg_handler+0x16d4/0x19c4 [ath10k_core])
> [<bf132544>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf133630>] (ath10k_htt_txrx_compl_task+0xdfc/0x12cc [ath10k_core])
> [<bf133630>] (ath10k_htt_txrx_compl_task [ath10k_core]) from [<bf1739b4>] (ath10k_pci_napi_poll+0x68/0xf4 [ath10k_pci])
> [<bf1739b4>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c067136c>] (net_rx_action+0x100/0x33c)
> [<c067136c>] (net_rx_action) from [<c02096e0>] (__do_softirq+0x228/0x31c)
> [<c02096e0>] (__do_softirq) from [<c0227944>] (irq_exit+0xa4/0x114)
>
> The trace points to a corrupt skb inside kfree_skb(), seemingly because
> one of the shared skb queues is getting corrupted. Most of the skb queues
> ath10k uses are local to a single call stack, but three are shared among
> multiple codepaths:
>
> - rx_msdus_q,
> - rx_in_ord_compl_q, and
> - tx_fetch_ind_q
>
> Of the three, the first two are manipulated using the unlocked skb_queue
> functions without any additional lock protecting them. Use the locked
> variants of skb_queue_* functions to protect these manipulations.
>
> Signed-off-by: Bob Copeland <bobcopeland@fb.com>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

62652555c616 ath10k: use locked skb_dequeue for rx completions

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index c72d8af122a2..86accfb8eb88 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1089,7 +1089,7 @@ static void ath10k_htt_rx_h_queue_msdu(struct ath10k *ar,
 	status = IEEE80211_SKB_RXCB(skb);
 	*status = *rx_status;
 
-	__skb_queue_tail(&ar->htt.rx_msdus_q, skb);
+	skb_queue_tail(&ar->htt.rx_msdus_q, skb);
 }
 
 static void ath10k_process_rx(struct ath10k *ar, struct sk_buff *skb)
@@ -2810,7 +2810,7 @@ bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND: {
-		__skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
+		skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
 		return false;
 	}
 	case HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND:
@@ -2874,7 +2874,7 @@ static int ath10k_htt_rx_deliver_msdu(struct ath10k *ar, int quota, int budget)
 		if (skb_queue_empty(&ar->htt.rx_msdus_q))
 			break;
 
-		skb = __skb_dequeue(&ar->htt.rx_msdus_q);
+		skb = skb_dequeue(&ar->htt.rx_msdus_q);
 		if (!skb)
 			break;
 		ath10k_process_rx(ar, skb);
@@ -2905,7 +2905,7 @@ int ath10k_htt_txrx_compl_task(struct ath10k *ar, int budget)
 		goto exit;
 	}
 
-	while ((skb = __skb_dequeue(&htt->rx_in_ord_compl_q))) {
+	while ((skb = skb_dequeue(&htt->rx_in_ord_compl_q))) {
 		spin_lock_bh(&htt->rx_ring.lock);
 		ret = ath10k_htt_rx_in_ord_ind(ar, skb);
 		spin_unlock_bh(&htt->rx_ring.lock);

In our environment we are occasionally seeing the following stack trace
in ath10k:

Unable to handle kernel paging request at virtual address 0000a800
pgd = c0204000
[0000a800] *pgd=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: dwc3 dwc3_of_simple phy_qcom_dwc3 nf_nat xt_connmark
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.31 #2
Hardware name: Generic DT based system
task: c09f4f40 task.stack: c09ee000
PC is at kfree_skb_list+0x1c/0x2c
LR is at skb_release_data+0x6c/0x108
pc : [<c065dcc4>] lr : [<c065da5c>] psr: 200f0113
sp : c09efb68 ip : c09efb80 fp : c09efb7c
r10: 00000000 r9 : 00000000 r8 : 043fddd1
r7 : bf15d160 r6 : 00000000 r5 : d4ca2f00 r4 : ca7c6480
r3 : 000000a0 r2 : 01000000 r1 : c0a57470 r0 : 0000a800
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5787d Table: 56e6006a DAC: 00000051
Process swapper/0 (pid: 0, stack limit = 0xc09ee210)
Stack: (0xc09efb68 to 0xc09f0000)
fb60: ca7c6480 d4ca2f00 c09efb9c c09efb80 c065da5c c065dcb4
fb80: d4ca2f00 00000000 dcbf8400 bf15d160 c09efbb4 c09efba0 c065db28 c065d9fc
fba0: d4ca2f00 00000000 c09efbcc c09efbb8 c065db48 c065db04 d4ca2f00 00000000
fbc0: c09efbe4 c09efbd0 c065ddd0 c065db38 d4ca2f00 00000000 c09efc64 c09efbe8
fbe0: bf09bd00 c065dd10 00000003 7fffffff c09efc24 dcbfc9c0 01200000 00000000
fc00: 00000000 00000000 ddb7e440 c09e9440 c09efc48 1d195000 c09efc7c c09efc28
fc20: c027bb68 c028aa00 ddb7e4f8 bf13231c ddb7e454 0004091f bf154571 d4ca2f00
fc40: dcbf8d00 ca7c5df6 bf154538 01200000 00000000 bf154538 c09efd1c c09efc68
fc60: bf132458 bf09bbbc ca7c5dec 00000041 bf154538 bf154539 000007bf bf154545
fc80: bf154538 bf154538 bf154538 bf154538 bf154538 00000000 00000000 000016c1
fca0: 00000001 c09efcb0 01200000 00000000 00000000 00000000 00000000 00000001
fcc0: bf154539 00000041 00000000 00000007 00000000 000000d0 ffffffff 3160ffff
fce0: 9ad93e97 3e973160 7bf09ad9 0004091f d4ca2f00 c09efdb0 dcbf94e8 00000000
fd00: dcbf8d00 01200000 00000000 dcbf8d00 c09efd44 c09efd20 bf132544 bf132130
fd20: dcbf8d00 00000000 d4ca2f00 c09efdb0 00000001 d4ca2f00 c09efdec c09efd48
fd40: bf133630 bf1324d0 ca7c5cc0 000007c0 c09efd88 c09efd70 c0764230 c02277d8
fd60: 200f0113 ffffffff dcbf94c8 bf000000 dcbf93b0 dcbf8d00 00000040 dcbf945c
fd80: dcbf94e8 00000000 c09efdcc 00000000 c09efd90 c09efd90 00000000 00000024
fda0: dcbf8d00 00000000 00000005 dcbf8d00 c09efdb0 c09efdb0 00000000 00000040
fdc0: c09efdec dcbf8d00 dcbfc9c0 c09ed140 00000040 00000000 00000100 00000040
fde0: c09efe14 c09efdf0 bf1739b4 bf132840 dcbfc9c0 ddb82140 c09ed140 1d195000
fe00: 00000001 00000100 c09efe64 c09efe18 c067136c bf173958 ddb7fac8 c09f0d00
fe20: 001df678 0000012c c09efe28 c09efe28 c09efe30 c09efe30 c0a7fb28 ffffe000
fe40: c09f008c 00000003 00000008 c0a598c0 00000100 c09f0080 c09efeb4 c09efe68
fe60: c02096e0 c0671278 c0494584 00000080 dd5c3300 c09f0d00 00000004 001df677
fe80: 0000000a 00200100 dd5c3300 00000000 00000000 c09eaa70 00000060 dd410800
fea0: c09ee000 00000000 c09efecc c09efeb8 c0227944 c02094c4 00000000 00000000
fec0: c09efef4 c09efed0 c0268b64 c02278ac de802000 c09f1b1c c09eff20 c0a16cc0
fee0: de803000 c09ee000 c09eff1c c09efef8 c020947c c0268ae0 c02103dc 600f0013
ff00: ffffffff c09eff54 ffffe000 c09ee000 c09eff7c c09eff20 c021448c c0209424
ff20: 00000001 00000000 00000000 c021ddc0 00000000 00000000 c09f1024 00000001
ff40: ffffe000 c09f1078 00000000 c09eff7c c09eff80 c09eff70 c02103ec c02103dc
ff60: 600f0013 ffffffff 00000051 00000000 c09eff8c c09eff80 c0763cc4 c02103bc
ff80: c09effa4 c09eff90 c025f0e4 c0763c98 c0a59040 c09f1000 c09effb4 c09effa8
ffa0: c075efe0 c025efd4 c09efff4 c09effb8 c097dcac c075ef7c ffffffff ffffffff
ffc0: 00000000 c097d6c4 00000000 c09c1a28 c0a59294 c09f101c c09c1a24 c09f61c0
ffe0: 4220406a 512f04d0 00000000 c09efff8 4220807c c097d95c 00000000 00000000
[<c065dcc4>] (kfree_skb_list) from [<c065da5c>] (skb_release_data+0x6c/0x108)
[<c065da5c>] (skb_release_data) from [<c065db28>] (skb_release_all+0x30/0x34)
[<c065db28>] (skb_release_all) from [<c065db48>] (__kfree_skb+0x1c/0x9c)
[<c065db48>] (__kfree_skb) from [<c065ddd0>] (consume_skb+0xcc/0xd8)
[<c065ddd0>] (consume_skb) from [<bf09bd00>] (ieee80211_rx_napi+0x150/0x82c [mac80211])
[<bf09bd00>] (ieee80211_rx_napi [mac80211]) from [<bf132458>] (ath10k_htt_t2h_msg_handler+0x15e8/0x19c4 [ath10k_core])
[<bf132458>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf132544>] (ath10k_htt_t2h_msg_handler+0x16d4/0x19c4 [ath10k_core])
[<bf132544>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf133630>] (ath10k_htt_txrx_compl_task+0xdfc/0x12cc [ath10k_core])
[<bf133630>] (ath10k_htt_txrx_compl_task [ath10k_core]) from [<bf1739b4>] (ath10k_pci_napi_poll+0x68/0xf4 [ath10k_pci])
[<bf1739b4>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c067136c>] (net_rx_action+0x100/0x33c)
[<c067136c>] (net_rx_action) from [<c02096e0>] (__do_softirq+0x228/0x31c)
[<c02096e0>] (__do_softirq) from [<c0227944>] (irq_exit+0xa4/0x114)

The trace points to a corrupt skb inside kfree_skb(), seemingly because
one of the shared skb queues is getting corrupted. Most of the skb queues
ath10k uses are local to a single call stack, but three are shared among
multiple codepaths:

 - rx_msdus_q,
 - rx_in_ord_compl_q, and
 - tx_fetch_ind_q

Of the three, the first two are manipulated using the unlocked skb_queue
functions without any additional lock protecting them. Use the locked
variants of skb_queue_* functions to protect these manipulations.

Signed-off-by: Bob Copeland <bobcopeland@fb.com>
---
I may have misunderstood the locking regime here, so please let me know
if so. This seems to have fixed the crash, but I don't know a reproducer
other than "wait a while and see."

 drivers/net/wireless/ath/ath10k/htt_rx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
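
For readers less familiar with the two families of helpers, the following
is a rough userspace sketch of the difference between the unlocked and
locked queue operations that the patch switches between. It is an analogy
only, not the kernel implementation: struct node, struct queue and the
queue_* functions are hypothetical stand-ins, and a pthread mutex takes the
place of the spinlock embedded in struct sk_buff_head. In the kernel,
skb_queue_tail() and skb_dequeue() take that spinlock (saving and disabling
interrupts) around the same list manipulation that __skb_queue_tail() and
__skb_dequeue() perform with no locking of their own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff (illustration only). */
struct node {
	struct node *next;
	int id;
};

/* Hypothetical stand-in for struct sk_buff_head: list plus its own lock. */
struct queue {
	struct node *head;
	struct node *tail;
	unsigned int qlen;
	pthread_mutex_t lock;	/* assumption: mutex instead of a spinlock */
};

static void queue_init(struct queue *q)
{
	q->head = NULL;
	q->tail = NULL;
	q->qlen = 0;
	pthread_mutex_init(&q->lock, NULL);
}

/*
 * Unlocked append, in the spirit of __skb_queue_tail(): safe only if the
 * caller guarantees that nobody else touches the queue concurrently.
 */
static void queue_tail_unlocked(struct queue *q, struct node *n)
{
	assert(n != NULL);
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	q->qlen++;
}

/* Unlocked removal from the head, in the spirit of __skb_dequeue(). */
static struct node *queue_dequeue_unlocked(struct queue *q)
{
	struct node *n = q->head;

	if (!n)
		return NULL;
	q->head = n->next;
	if (!q->head)
		q->tail = NULL;
	n->next = NULL;
	q->qlen--;
	return n;
}

/* Locked append, in the spirit of skb_queue_tail(). */
static void queue_tail(struct queue *q, struct node *n)
{
	pthread_mutex_lock(&q->lock);
	queue_tail_unlocked(q, n);
	pthread_mutex_unlock(&q->lock);
}

/* Locked removal, in the spirit of skb_dequeue(). */
static struct node *queue_dequeue(struct queue *q)
{
	struct node *n;

	pthread_mutex_lock(&q->lock);
	n = queue_dequeue_unlocked(q);
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

If two contexts call the unlocked helpers on the same queue at once, both
can read the same tail or head pointer before either updates it, leaving
dangling next pointers of the kind the oops suggests. The locked wrappers
serialize each individual operation, which matches the one-for-one
substitution made in the patch.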