Message ID | 1598617348-2325-1-git-send-email-wgong@codeaurora.org (mailing list archive) |
---|---|
State | Accepted |
Commit | e2f8b74e58cb1560c1399ba94a470b770e858259 |
Delegated to: | Kalle Valo |
Headers | show |
Series | [v3] ath10k: add flag to protect napi operation to avoid dead loop hang | expand |
On Fri, Aug 28, 2020 at 5:53 PM Wen Gong <wgong@codeaurora.org> wrote: > > It happened "Kernel panic - not syncing: hung_task: blocked tasks" when > test simulate crash and ifconfig down/rmmod meanwhile. > > Test steps: > > 1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC. > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio > echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci > > 2. dmesg: > [ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash > [ 5622.655995] ieee80211 phy0: Hardware restart was requested > [ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds. > [ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds. > [ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks > [ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137 > [ 5776.359846] Hardware name: MediaTek krane sku176 board (DT) > [ 5776.359855] Call trace: > [ 5776.359868] dump_backtrace+0x0/0x170 > [ 5776.359881] show_stack+0x20/0x2c > [ 5776.359896] dump_stack+0xd4/0x10c > [ 5776.359916] panic+0x12c/0x29c > [ 5776.359937] hung_task_panic+0x0/0x50 > [ 5776.359953] kthread+0x120/0x130 > [ 5776.359965] ret_from_fork+0x10/0x18 > [ 5776.359986] SMP: stopping secondary CPUs > [ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000 > [ 5776.360026] CPU features: 0x0,2188200c > [ 5776.360035] Memory Limit: none > > command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked > callstack of ifconfig: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] __dev_change_flags+0xe0/0x1d0 > [<0>] dev_change_flags+0x30/0x6c > [<0>] devinet_ioctl+0x370/0x564 > [<0>] inet_ioctl+0xdc/0x304 > [<0>] sock_do_ioctl+0x50/0x288 > [<0>] compat_sock_ioctl+0x1b4/0x1aac > [<0>] __se_compat_sys_ioctl+0x100/0x26fc > [<0>] __arm64_compat_sys_ioctl+0x20/0x2c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > callstack of rmmod: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] dev_close_many+0x70/0x100 > [<0>] dev_close+0x4c/0x80 > [<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211] > [<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211] > [<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211] > [<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core] > [<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio] > [<0>] sdio_bus_remove+0x48/0x108 > [<0>] device_release_driver_internal+0x138/0x1ec > [<0>] driver_detach+0x6c/0xa8 > [<0>] bus_remove_driver+0x78/0xa8 > [<0>] driver_unregister+0x30/0x50 > [<0>] sdio_unregister_driver+0x28/0x34 > [<0>] cleanup_module+0x14/0x6bc [ath10k_sdio] > [<0>] __arm64_sys_delete_module+0x1e0/0x22c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > SNOC: > [ 647.156863] Call trace: > [ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c > [ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798 > [ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94 > [ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c > [ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40 > [ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38 > [ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc] > [ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core] > [ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core] > [ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core] > [ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211] > [ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211] > [ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211] > [ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211] > [ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc > [ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100 > [ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80 > [ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211] > [ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211] > [ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211] > [ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core] > [ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc] > [ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50 > [ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8 > [ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8 > [ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8 > [ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50 > [ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28 > [ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc] > [ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c > > PCIe: > [ 615.392770] rmmod D 0 3523 3458 0x00000080 > [ 615.392777] Call Trace: > [ 615.392784] __schedule+0x617/0x7d3 > [ 615.392791] ? __mod_timer+0x263/0x35c > [ 615.392797] schedule+0x62/0x72 > [ 615.392803] schedule_timeout+0x8d/0xf3 > [ 615.392809] ? run_local_timers+0x6b/0x6b > [ 615.392814] msleep+0x1b/0x22 > [ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci] > [ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core] > [ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core] > [ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core] > [ 615.392893] drv_stop+0x9a/0x13c [mac80211] > [ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211] > [ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211] > [ 615.392945] __dev_close_many+0x9e/0xf0 > [ 615.392952] dev_close_many+0x62/0xe8 > [ 615.392958] dev_close+0x54/0x7d > [ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211] > [ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211] > [ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211] > [ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core] > [ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core] > [ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci] > [ 615.393117] pci_device_remove+0x41/0xa6 > [ 615.393129] device_release_driver_internal+0x123/0x1ec > [ 615.393140] driver_detach+0x60/0x90 > [ 615.393152] bus_remove_driver+0x72/0x9f > [ 615.393164] pci_unregister_driver+0x1e/0x87 > [ 615.393177] SyS_delete_module+0x1d7/0x277 > [ 615.393188] do_syscall_64+0x6b/0xf7 > [ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6 > > The test command run simulate_fw_crash firstly and it call into > ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable > is called and bit NAPI_STATE_SCHED is set. After that, function > ath10k_sdio_hif_stop is called again from ath10k_stop by command > "ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked. > > It is blocked by napi_synchronize, napi_disable will set bit with > NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop > becuase bit NAPI_STATE_SCHED is set by napi_disable. > > function of napi_synchronize > static inline void napi_synchronize(const struct napi_struct *n) > { > if (IS_ENABLED(CONFIG_SMP)) > while (test_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > else > barrier(); > } > > function of napi_disable > void napi_disable(struct napi_struct *n) > { > might_sleep(); > set_bit(NAPI_STATE_DISABLE, &n->state); > > while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) > msleep(1); > > hrtimer_cancel(&n->timer); > > clear_bit(NAPI_STATE_DISABLE, &n->state); > } > > Add flag for it avoid the hang and crash. > > Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049 > Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1 > Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2 > > Signed-off-by: Wen Gong <wgong@codeaurora.org> > --- > v3: change all bus type > > v2: change to use flag > drivers/net/wireless/ath/ath10k/ahb.c | 5 ++--- > drivers/net/wireless/ath/ath10k/core.c | 20 ++++++++++++++++++++ > drivers/net/wireless/ath/ath10k/core.h | 3 +++ > drivers/net/wireless/ath/ath10k/pci.c | 7 ++++--- > drivers/net/wireless/ath/ath10k/sdio.c | 5 ++--- > drivers/net/wireless/ath/ath10k/snoc.c | 5 ++--- > 6 files changed, 33 insertions(+), 12 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c > index 05a61975c83f..869524852fba 100644 > --- a/drivers/net/wireless/ath/ath10k/ahb.c > +++ b/drivers/net/wireless/ath/ath10k/ahb.c > @@ -626,7 +626,7 @@ static int ath10k_ahb_hif_start(struct ath10k *ar) > { > ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot ahb hif start\n"); > > - napi_enable(&ar->napi); > + ath10k_core_napi_enable(ar); > ath10k_ce_enable_interrupts(ar); > ath10k_pci_enable_legacy_irq(ar); > > @@ -644,8 +644,7 @@ static void ath10k_ahb_hif_stop(struct ath10k *ar) > ath10k_ahb_irq_disable(ar); > synchronize_irq(ar_ahb->irq); > > - napi_synchronize(&ar->napi); > - napi_disable(&ar->napi); > + ath10k_core_napi_sync_disable(ar); > > ath10k_pci_flush(ar); > } > diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c > index cfffd20df0cc..7e52fd14659d 100644 > --- a/drivers/net/wireless/ath/ath10k/core.c > +++ b/drivers/net/wireless/ath/ath10k/core.c > @@ -2277,6 +2277,26 @@ static int ath10k_init_hw_params(struct ath10k *ar) > return 0; > } > > +void ath10k_core_napi_enable(struct ath10k *ar) > +{ > + if (ar->napi_enabled) > + return; > + > + napi_enable(&ar->napi); > + ar->napi_enabled = true; > +} > +EXPORT_SYMBOL(ath10k_core_napi_enable); > + > +void ath10k_core_napi_sync_disable(struct ath10k *ar) > +{ > + if (ar->napi_enabled) { > + napi_synchronize(&ar->napi); > + napi_disable(&ar->napi); > + ar->napi_enabled = false; > + } > +} > +EXPORT_SYMBOL(ath10k_core_napi_sync_disable); > + > static void ath10k_core_restart(struct work_struct *work) > { > struct ath10k *ar = container_of(work, struct ath10k, restart_work); > diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h > index 5c18f6c20462..efb26420cc20 100644 > --- a/drivers/net/wireless/ath/ath10k/core.h > +++ b/drivers/net/wireless/ath/ath10k/core.h > @@ -1230,6 +1230,7 @@ struct ath10k { > /* NAPI */ > struct net_device napi_dev; > struct napi_struct napi; > + bool napi_enabled; > > struct work_struct set_coverage_class_work; > /* protected by conf_mutex */ > @@ -1276,6 +1277,8 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar) > > extern unsigned long ath10k_coredump_mask; > > +void ath10k_core_napi_sync_disable(struct ath10k *ar); > +void ath10k_core_napi_enable(struct ath10k *ar); > struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev, > enum ath10k_bus bus, > enum ath10k_hw_rev hw_rev, > diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c > index 36426efdb2ea..7786accc5f96 100644 > --- a/drivers/net/wireless/ath/ath10k/pci.c > +++ b/drivers/net/wireless/ath/ath10k/pci.c > @@ -1958,7 +1958,7 @@ static int ath10k_pci_hif_start(struct ath10k *ar) > > ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n"); > > - napi_enable(&ar->napi); > + ath10k_core_napi_enable(ar); > > ath10k_pci_irq_enable(ar); > ath10k_pci_rx_post(ar); > @@ -2075,8 +2075,9 @@ static void ath10k_pci_hif_stop(struct ath10k *ar) > > ath10k_pci_irq_disable(ar); > ath10k_pci_irq_sync(ar); > - napi_synchronize(&ar->napi); > - napi_disable(&ar->napi); > + > + ath10k_core_napi_sync_disable(ar); > + > cancel_work_sync(&ar_pci->dump_work); > > /* Most likely the device has HTT Rx ring configured. The only way to > diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c > index 81ddaafb6721..8834f118ef77 100644 > --- a/drivers/net/wireless/ath/ath10k/sdio.c > +++ b/drivers/net/wireless/ath/ath10k/sdio.c > @@ -1859,7 +1859,7 @@ static int ath10k_sdio_hif_start(struct ath10k *ar) > struct ath10k_sdio *ar_sdio = ath10k_sdio_priv(ar); > int ret; > > - napi_enable(&ar->napi); > + ath10k_core_napi_enable(ar); > > /* Sleep 20 ms before HIF interrupts are disabled. > * This will give target plenty of time to process the BMI done > @@ -1986,8 +1986,7 @@ static void ath10k_sdio_hif_stop(struct ath10k *ar) > > spin_unlock_bh(&ar_sdio->wr_async_lock); > > - napi_synchronize(&ar->napi); > - napi_disable(&ar->napi); > + ath10k_core_napi_sync_disable(ar); > } > > #ifdef CONFIG_PM > diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c > index 645ed5f63ef8..d54438c01df8 100644 > --- a/drivers/net/wireless/ath/ath10k/snoc.c > +++ b/drivers/net/wireless/ath/ath10k/snoc.c > @@ -913,8 +913,7 @@ static void ath10k_snoc_hif_stop(struct ath10k *ar) > if (!test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) > ath10k_snoc_irq_disable(ar); > > - napi_synchronize(&ar->napi); > - napi_disable(&ar->napi); > + ath10k_core_napi_sync_disable(ar); > ath10k_snoc_buffer_cleanup(ar); > ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif stop\n"); > } > @@ -923,7 +922,7 @@ static int ath10k_snoc_hif_start(struct ath10k *ar) > { > struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar); > > - napi_enable(&ar->napi); > + ath10k_core_napi_enable(ar); > ath10k_snoc_irq_enable(ar); > ath10k_snoc_rx_post(ar); > LGTM, except maybe a shorted title "ath10k: Prevent deinitializing NAPI twice".
Wen Gong <wgong@codeaurora.org> writes: > It happened "Kernel panic - not syncing: hung_task: blocked tasks" when > test simulate crash and ifconfig down/rmmod meanwhile. > > Test steps: > > 1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC. > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio > echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci > > 2. dmesg: > [ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash > [ 5622.655995] ieee80211 phy0: Hardware restart was requested > [ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds. > [ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds. > [ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks > [ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137 > [ 5776.359846] Hardware name: MediaTek krane sku176 board (DT) > [ 5776.359855] Call trace: > [ 5776.359868] dump_backtrace+0x0/0x170 > [ 5776.359881] show_stack+0x20/0x2c > [ 5776.359896] dump_stack+0xd4/0x10c > [ 5776.359916] panic+0x12c/0x29c > [ 5776.359937] hung_task_panic+0x0/0x50 > [ 5776.359953] kthread+0x120/0x130 > [ 5776.359965] ret_from_fork+0x10/0x18 > [ 5776.359986] SMP: stopping secondary CPUs > [ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000 > [ 5776.360026] CPU features: 0x0,2188200c > [ 5776.360035] Memory Limit: none > > command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked > callstack of ifconfig: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] __dev_change_flags+0xe0/0x1d0 > [<0>] dev_change_flags+0x30/0x6c > [<0>] devinet_ioctl+0x370/0x564 > [<0>] inet_ioctl+0xdc/0x304 > [<0>] sock_do_ioctl+0x50/0x288 > [<0>] compat_sock_ioctl+0x1b4/0x1aac > [<0>] __se_compat_sys_ioctl+0x100/0x26fc > [<0>] __arm64_compat_sys_ioctl+0x20/0x2c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > callstack of rmmod: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] dev_close_many+0x70/0x100 > [<0>] dev_close+0x4c/0x80 > [<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211] > [<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211] > [<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211] > [<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core] > [<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio] > [<0>] sdio_bus_remove+0x48/0x108 > [<0>] device_release_driver_internal+0x138/0x1ec > [<0>] driver_detach+0x6c/0xa8 > [<0>] bus_remove_driver+0x78/0xa8 > [<0>] driver_unregister+0x30/0x50 > [<0>] sdio_unregister_driver+0x28/0x34 > [<0>] cleanup_module+0x14/0x6bc [ath10k_sdio] > [<0>] __arm64_sys_delete_module+0x1e0/0x22c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > SNOC: > [ 647.156863] Call trace: > [ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c > [ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798 > [ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94 > [ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c > [ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40 > [ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38 > [ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc] > [ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core] > [ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core] > [ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core] > [ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211] > [ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211] > [ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211] > [ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211] > [ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc > [ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100 > [ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80 > [ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211] > [ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211] > [ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211] > [ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core] > [ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc] > [ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50 > [ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8 > [ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8 > [ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8 > [ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50 > [ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28 > [ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc] > [ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c > > PCIe: > [ 615.392770] rmmod D 0 3523 3458 0x00000080 > [ 615.392777] Call Trace: > [ 615.392784] __schedule+0x617/0x7d3 > [ 615.392791] ? __mod_timer+0x263/0x35c > [ 615.392797] schedule+0x62/0x72 > [ 615.392803] schedule_timeout+0x8d/0xf3 > [ 615.392809] ? run_local_timers+0x6b/0x6b > [ 615.392814] msleep+0x1b/0x22 > [ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci] > [ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core] > [ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core] > [ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core] > [ 615.392893] drv_stop+0x9a/0x13c [mac80211] > [ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211] > [ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211] > [ 615.392945] __dev_close_many+0x9e/0xf0 > [ 615.392952] dev_close_many+0x62/0xe8 > [ 615.392958] dev_close+0x54/0x7d > [ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211] > [ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211] > [ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211] > [ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core] > [ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core] > [ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci] > [ 615.393117] pci_device_remove+0x41/0xa6 > [ 615.393129] device_release_driver_internal+0x123/0x1ec > [ 615.393140] driver_detach+0x60/0x90 > [ 615.393152] bus_remove_driver+0x72/0x9f > [ 615.393164] pci_unregister_driver+0x1e/0x87 > [ 615.393177] SyS_delete_module+0x1d7/0x277 > [ 615.393188] do_syscall_64+0x6b/0xf7 > [ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6 > > The test command run simulate_fw_crash firstly and it call into > ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable > is called and bit NAPI_STATE_SCHED is set. After that, function > ath10k_sdio_hif_stop is called again from ath10k_stop by command > "ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked. > > It is blocked by napi_synchronize, napi_disable will set bit with > NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop > becuase bit NAPI_STATE_SCHED is set by napi_disable. > > function of napi_synchronize > static inline void napi_synchronize(const struct napi_struct *n) > { > if (IS_ENABLED(CONFIG_SMP)) > while (test_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > else > barrier(); > } > > function of napi_disable > void napi_disable(struct napi_struct *n) > { > might_sleep(); > set_bit(NAPI_STATE_DISABLE, &n->state); > > while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) > msleep(1); > > hrtimer_cancel(&n->timer); > > clear_bit(NAPI_STATE_DISABLE, &n->state); > } > > Add flag for it avoid the hang and crash. > > Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049 > Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1 > Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2 > > Signed-off-by: Wen Gong <wgong@codeaurora.org> [...] > +void ath10k_core_napi_enable(struct ath10k *ar) > +{ > + if (ar->napi_enabled) > + return; > + > + napi_enable(&ar->napi); > + ar->napi_enabled = true; > +} > +EXPORT_SYMBOL(ath10k_core_napi_enable); > + > +void ath10k_core_napi_sync_disable(struct ath10k *ar) > +{ > + if (ar->napi_enabled) { > + napi_synchronize(&ar->napi); > + napi_disable(&ar->napi); > + ar->napi_enabled = false; > + } > +} > +EXPORT_SYMBOL(ath10k_core_napi_sync_disable); Just like with the recent firmware restart patch, isn't ar->napi_enabled racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? Or are we holding a lock? But then that should be documented with lockdep_assert_held().
On 2020-09-08 00:22, Kalle Valo wrote: > Wen Gong <wgong@codeaurora.org> writes: [...] > Just like with the recent firmware restart patch, isn't > ar->napi_enabled > racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? > > Or are we holding a lock? But then that should be documented with > lockdep_assert_held(). yes, ath10k_hif_start is only called from ath10k_core_start, it has "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only called from ath10k_core_stop, it also has "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both enter ath10k_hif_start/ath10k_hif_stop meanwhile.
Krishna Chaitanya <chaitanya.mgit@gmail.com> writes: > On Fri, Aug 28, 2020 at 5:53 PM Wen Gong <wgong@codeaurora.org> wrote: >> >> It happened "Kernel panic - not syncing: hung_task: blocked tasks" when >> test simulate crash and ifconfig down/rmmod meanwhile. (editing out hundreds of lines of unnecessary text, please edit your quotes) > LGTM, except maybe a shorted title "ath10k: Prevent deinitializing NAPI twice". Thanks, I fixed the title in the pending branch.
Wen Gong <wgong@codeaurora.org> writes: > On 2020-09-08 00:22, Kalle Valo wrote: > >> Just like with the recent firmware restart patch, isn't >> ar->napi_enabled >> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >> >> Or are we holding a lock? But then that should be documented with >> lockdep_assert_held(). > > yes, ath10k_hif_start is only called from ath10k_core_start, it has > "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only > called from ath10k_core_stop, it also has > "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both > enter ath10k_hif_start/ath10k_hif_stop meanwhile. Ok, but every function depending on a lock being held should still call lockdep_assert_held(), that way we can catch the bug if locking changes later. So it's not enough that ath10k_core_stop() has lockdep_assert_held(), also these napi functions should have it. I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with set_bit() & co, simpler locking that way and no lockdep_assert_held() needed anymore. Please check my changes in the pending branch, I have only compile tested them: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9
On 12/9/20 1:24 AM, Kalle Valo wrote: > Wen Gong <wgong@codeaurora.org> writes: > >> On 2020-09-08 00:22, Kalle Valo wrote: >> >>> Just like with the recent firmware restart patch, isn't >>> ar->napi_enabled >>> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >>> >>> Or are we holding a lock? But then that should be documented with >>> lockdep_assert_held(). >> >> yes, ath10k_hif_start is only called from ath10k_core_start, it has >> "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only >> called from ath10k_core_stop, it also has >> "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both >> enter ath10k_hif_start/ath10k_hif_stop meanwhile. > > Ok, but every function depending on a lock being held should still call > lockdep_assert_held(), that way we can catch the bug if locking changes > later. So it's not enough that ath10k_core_stop() has > lockdep_assert_held(), also these napi functions should have it. > > I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with > set_bit() & co, simpler locking that way and no lockdep_assert_held() > needed anymore. Please check my changes in the pending branch, I have > only compile tested them: > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9 > Why do you not need locking? You can't just check a bit is set and then do work and set it later without locking, two concurrent CPU threads can pass the first check and both get into the logic below it? Thanks, Ben
On 2020-12-09 17:24, Kalle Valo wrote: > Wen Gong <wgong@codeaurora.org> writes: > >> On 2020-09-08 00:22, Kalle Valo wrote: >> >>> Just like with the recent firmware restart patch, isn't >>> ar->napi_enabled >>> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >>> >>> Or are we holding a lock? But then that should be documented with >>> lockdep_assert_held(). >> >> yes, ath10k_hif_start is only called from ath10k_core_start, it has >> "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only >> called from ath10k_core_stop, it also has >> "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both >> enter ath10k_hif_start/ath10k_hif_stop meanwhile. > > Ok, but every function depending on a lock being held should still call > lockdep_assert_held(), that way we can catch the bug if locking changes > later. So it's not enough that ath10k_core_stop() has > lockdep_assert_held(), also these napi functions should have it. > > I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with > set_bit() & co, simpler locking that way and no lockdep_assert_held() > needed anymore. Please check my changes in the pending branch, I have > only compile tested them: I checked, it only changed ar->napi_enabled to flag ATH10K_FLAG_NAPI_ENABLED, not found probelm. > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9
On 2020-12-09 23:00, Ben Greear wrote: > On 12/9/20 1:24 AM, Kalle Valo wrote: >> Wen Gong <wgong@codeaurora.org> writes: >> >>> On 2020-09-08 00:22, Kalle Valo wrote: >>> >>>> Just like with the recent firmware restart patch, isn't >>>> ar->napi_enabled >>>> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >>>> >>>> Or are we holding a lock? But then that should be documented with >>>> lockdep_assert_held(). >>> >>> yes, ath10k_hif_start is only called from ath10k_core_start, it has >>> "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only >>> called from ath10k_core_stop, it also has >>> "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread >>> both >>> enter ath10k_hif_start/ath10k_hif_stop meanwhile. >> >> Ok, but every function depending on a lock being held should still >> call >> lockdep_assert_held(), that way we can catch the bug if locking >> changes >> later. So it's not enough that ath10k_core_stop() has >> lockdep_assert_held(), also these napi functions should have it. >> >> I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with >> set_bit() & co, simpler locking that way and no lockdep_assert_held() >> needed anymore. Please check my changes in the pending branch, I have >> only compile tested them: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9 >> > > Why do you not need locking? You can't just check a bit is set and > then do work and set > it later without locking, two concurrent CPU threads can pass the > first check and both get into > the logic below it? > maybe because which I said before: ath10k_hif_start is only called from ath10k_core_start, it has "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only called from ath10k_core_stop, it also has "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both enter ath10k_hif_start/ath10k_hif_stop meanwhile. > Thanks, > Ben
Ben Greear <greearb@candelatech.com> writes: > On 12/9/20 1:24 AM, Kalle Valo wrote: >> Wen Gong <wgong@codeaurora.org> writes: >> >>> On 2020-09-08 00:22, Kalle Valo wrote: >>> >>>> Just like with the recent firmware restart patch, isn't >>>> ar->napi_enabled >>>> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >>>> >>>> Or are we holding a lock? But then that should be documented with >>>> lockdep_assert_held(). >>> >>> yes, ath10k_hif_start is only called from ath10k_core_start, it has >>> "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only >>> called from ath10k_core_stop, it also has >>> "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both >>> enter ath10k_hif_start/ath10k_hif_stop meanwhile. >> >> Ok, but every function depending on a lock being held should still call >> lockdep_assert_held(), that way we can catch the bug if locking changes >> later. So it's not enough that ath10k_core_stop() has >> lockdep_assert_held(), also these napi functions should have it. >> >> I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with >> set_bit() & co, simpler locking that way and no lockdep_assert_held() >> needed anymore. Please check my changes in the pending branch, I have >> only compile tested them: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9 >> > > Why do you not need locking? You can't just check a bit is set and > then do work and set it later without locking, two concurrent CPU > threads can pass the first check and both get into the logic below it? Good point, there is a race. I now fixed the patch in the pending and documented that core_mutex needs to be held when changing the NAPI state: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=2fe769592ef6d4ae14260989dcbdbde4bff01cb6
Wen Gong <wgong@codeaurora.org> writes: > On 2020-12-09 23:00, Ben Greear wrote: >> On 12/9/20 1:24 AM, Kalle Valo wrote: >>> Wen Gong <wgong@codeaurora.org> writes: >>> >>>> On 2020-09-08 00:22, Kalle Valo wrote: >>>> >>>>> Just like with the recent firmware restart patch, isn't >>>>> ar->napi_enabled >>>>> racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? >>>>> >>>>> Or are we holding a lock? But then that should be documented with >>>>> lockdep_assert_held(). >>>> >>>> yes, ath10k_hif_start is only called from ath10k_core_start, it has >>>> "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only >>>> called from ath10k_core_stop, it also has >>>> "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread >>>> both >>>> enter ath10k_hif_start/ath10k_hif_stop meanwhile. >>> >>> Ok, but every function depending on a lock being held should still >>> call >>> lockdep_assert_held(), that way we can catch the bug if locking >>> changes >>> later. So it's not enough that ath10k_core_stop() has >>> lockdep_assert_held(), also these napi functions should have it. >>> >>> I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with >>> set_bit() & co, simpler locking that way and no lockdep_assert_held() >>> needed anymore. Please check my changes in the pending branch, I have >>> only compile tested them: >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9 >>> >> >> Why do you not need locking? You can't just check a bit is set and >> then do work and set >> it later without locking, two concurrent CPU threads can pass the >> first check and both get into >> the logic below it? >> > maybe because which I said before: > > ath10k_hif_start is only called from ath10k_core_start, it has > "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only > called from ath10k_core_stop, it also has > "lockdep_assert_held(&ar->conf_mutex)". then it will not 2 thread both > enter ath10k_hif_start/ath10k_hif_stop meanwhile. Yeah, but that was not visible from the code. I now changed the patch in pending branch that this is clearly documented with lockdep_assert_held() and lockdep will warn if someone breaks the locking later on. If a function relies on a lock being held, lockdep_assert_held() needs to be _always_ used to make the locking dependencies clearly visible.
Wen Gong <wgong@codeaurora.org> wrote: > It happened "Kernel panic - not syncing: hung_task: blocked tasks" when > test simulate crash and ifconfig down/rmmod meanwhile. > > Test steps: > > 1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC. > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down > echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio > echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci > > 2. dmesg: > [ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash > [ 5622.655995] ieee80211 phy0: Hardware restart was requested > [ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds. > [ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds. > [ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks > [ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137 > [ 5776.359846] Hardware name: MediaTek krane sku176 board (DT) > [ 5776.359855] Call trace: > [ 5776.359868] dump_backtrace+0x0/0x170 > [ 5776.359881] show_stack+0x20/0x2c > [ 5776.359896] dump_stack+0xd4/0x10c > [ 5776.359916] panic+0x12c/0x29c > [ 5776.359937] hung_task_panic+0x0/0x50 > [ 5776.359953] kthread+0x120/0x130 > [ 5776.359965] ret_from_fork+0x10/0x18 > [ 5776.359986] SMP: stopping secondary CPUs > [ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000 > [ 5776.360026] CPU features: 0x0,2188200c > [ 5776.360035] Memory Limit: none > > command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked > callstack of ifconfig: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] __dev_change_flags+0xe0/0x1d0 > [<0>] dev_change_flags+0x30/0x6c > [<0>] devinet_ioctl+0x370/0x564 > [<0>] inet_ioctl+0xdc/0x304 > [<0>] sock_do_ioctl+0x50/0x288 > [<0>] compat_sock_ioctl+0x1b4/0x1aac > [<0>] __se_compat_sys_ioctl+0x100/0x26fc > [<0>] __arm64_compat_sys_ioctl+0x20/0x2c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > callstack of rmmod: > [<0>] __switch_to+0x120/0x13c > [<0>] msleep+0x28/0x38 > [<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio] > [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] > [<0>] ath10k_halt+0x120/0x178 [ath10k_core] > [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] > [<0>] drv_stop+0xe0/0x1e4 [mac80211] > [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] > [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] > [<0>] ieee80211_stop+0x20/0x30 [mac80211] > [<0>] __dev_close_many+0xb8/0x11c > [<0>] dev_close_many+0x70/0x100 > [<0>] dev_close+0x4c/0x80 > [<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211] > [<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211] > [<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211] > [<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core] > [<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio] > [<0>] sdio_bus_remove+0x48/0x108 > [<0>] device_release_driver_internal+0x138/0x1ec > [<0>] driver_detach+0x6c/0xa8 > [<0>] bus_remove_driver+0x78/0xa8 > [<0>] driver_unregister+0x30/0x50 > [<0>] sdio_unregister_driver+0x28/0x34 > [<0>] cleanup_module+0x14/0x6bc [ath10k_sdio] > [<0>] __arm64_sys_delete_module+0x1e0/0x22c > [<0>] el0_svc_common+0xa4/0x154 > [<0>] el0_svc_compat_handler+0x2c/0x38 > [<0>] el0_svc_compat+0x8/0x18 > [<0>] 0xffffffffffffffff > > SNOC: > [ 647.156863] Call trace: > [ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c > [ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798 > [ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94 > [ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c > [ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40 > [ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38 > [ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc] > [ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core] > [ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core] > [ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core] > [ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211] > [ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211] > [ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211] > [ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211] > [ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc > [ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100 > [ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80 > [ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211] > [ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211] > [ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211] > [ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] > [ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core] > [ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc] > [ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50 > [ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8 > [ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8 > [ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8 > [ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50 > [ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28 > [ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc] > [ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c > > PCIe: > [ 615.392770] rmmod D 0 3523 3458 0x00000080 > [ 615.392777] Call Trace: > [ 615.392784] __schedule+0x617/0x7d3 > [ 615.392791] ? __mod_timer+0x263/0x35c > [ 615.392797] schedule+0x62/0x72 > [ 615.392803] schedule_timeout+0x8d/0xf3 > [ 615.392809] ? run_local_timers+0x6b/0x6b > [ 615.392814] msleep+0x1b/0x22 > [ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci] > [ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core] > [ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core] > [ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core] > [ 615.392893] drv_stop+0x9a/0x13c [mac80211] > [ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211] > [ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211] > [ 615.392945] __dev_close_many+0x9e/0xf0 > [ 615.392952] dev_close_many+0x62/0xe8 > [ 615.392958] dev_close+0x54/0x7d > [ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211] > [ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211] > [ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211] > [ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core] > [ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core] > [ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci] > [ 615.393117] pci_device_remove+0x41/0xa6 > [ 615.393129] device_release_driver_internal+0x123/0x1ec > [ 615.393140] driver_detach+0x60/0x90 > [ 615.393152] bus_remove_driver+0x72/0x9f > [ 615.393164] pci_unregister_driver+0x1e/0x87 > [ 615.393177] SyS_delete_module+0x1d7/0x277 > [ 615.393188] do_syscall_64+0x6b/0xf7 > [ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6 > > The test command run simulate_fw_crash firstly and it call into > ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable > is called and bit NAPI_STATE_SCHED is set. After that, function > ath10k_sdio_hif_stop is called again from ath10k_stop by command > "ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked. > > It is blocked by napi_synchronize, napi_disable will set bit with > NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop > becuase bit NAPI_STATE_SCHED is set by napi_disable. > > function of napi_synchronize > static inline void napi_synchronize(const struct napi_struct *n) > { > if (IS_ENABLED(CONFIG_SMP)) > while (test_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > else > barrier(); > } > > function of napi_disable > void napi_disable(struct napi_struct *n) > { > might_sleep(); > set_bit(NAPI_STATE_DISABLE, &n->state); > > while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) > msleep(1); > while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) > msleep(1); > > hrtimer_cancel(&n->timer); > > clear_bit(NAPI_STATE_DISABLE, &n->state); > } > > Add flag for it avoid the hang and crash. > > Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049 > Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1 > Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2 > > Signed-off-by: Wen Gong <wgong@codeaurora.org> > Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Patch applied to ath-next branch of ath.git, thanks. e2f8b74e58cb ath10k: prevent deinitializing NAPI twice
diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c index 05a61975c83f..869524852fba 100644 --- a/drivers/net/wireless/ath/ath10k/ahb.c +++ b/drivers/net/wireless/ath/ath10k/ahb.c @@ -626,7 +626,7 @@ static int ath10k_ahb_hif_start(struct ath10k *ar) { ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot ahb hif start\n"); - napi_enable(&ar->napi); + ath10k_core_napi_enable(ar); ath10k_ce_enable_interrupts(ar); ath10k_pci_enable_legacy_irq(ar); @@ -644,8 +644,7 @@ static void ath10k_ahb_hif_stop(struct ath10k *ar) ath10k_ahb_irq_disable(ar); synchronize_irq(ar_ahb->irq); - napi_synchronize(&ar->napi); - napi_disable(&ar->napi); + ath10k_core_napi_sync_disable(ar); ath10k_pci_flush(ar); } diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c index cfffd20df0cc..7e52fd14659d 100644 --- a/drivers/net/wireless/ath/ath10k/core.c +++ b/drivers/net/wireless/ath/ath10k/core.c @@ -2277,6 +2277,26 @@ static int ath10k_init_hw_params(struct ath10k *ar) return 0; } +void ath10k_core_napi_enable(struct ath10k *ar) +{ + if (ar->napi_enabled) + return; + + napi_enable(&ar->napi); + ar->napi_enabled = true; +} +EXPORT_SYMBOL(ath10k_core_napi_enable); + +void ath10k_core_napi_sync_disable(struct ath10k *ar) +{ + if (ar->napi_enabled) { + napi_synchronize(&ar->napi); + napi_disable(&ar->napi); + ar->napi_enabled = false; + } +} +EXPORT_SYMBOL(ath10k_core_napi_sync_disable); + static void ath10k_core_restart(struct work_struct *work) { struct ath10k *ar = container_of(work, struct ath10k, restart_work); diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h index 5c18f6c20462..efb26420cc20 100644 --- a/drivers/net/wireless/ath/ath10k/core.h +++ b/drivers/net/wireless/ath/ath10k/core.h @@ -1230,6 +1230,7 @@ struct ath10k { /* NAPI */ struct net_device napi_dev; struct napi_struct napi; + bool napi_enabled; struct work_struct set_coverage_class_work; /* protected by conf_mutex */ @@ -1276,6 +1277,8 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar) extern unsigned long ath10k_coredump_mask; +void ath10k_core_napi_sync_disable(struct ath10k *ar); +void ath10k_core_napi_enable(struct ath10k *ar); struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev, enum ath10k_bus bus, enum ath10k_hw_rev hw_rev, diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c index 36426efdb2ea..7786accc5f96 100644 --- a/drivers/net/wireless/ath/ath10k/pci.c +++ b/drivers/net/wireless/ath/ath10k/pci.c @@ -1958,7 +1958,7 @@ static int ath10k_pci_hif_start(struct ath10k *ar) ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n"); - napi_enable(&ar->napi); + ath10k_core_napi_enable(ar); ath10k_pci_irq_enable(ar); ath10k_pci_rx_post(ar); @@ -2075,8 +2075,9 @@ static void ath10k_pci_hif_stop(struct ath10k *ar) ath10k_pci_irq_disable(ar); ath10k_pci_irq_sync(ar); - napi_synchronize(&ar->napi); - napi_disable(&ar->napi); + + ath10k_core_napi_sync_disable(ar); + cancel_work_sync(&ar_pci->dump_work); /* Most likely the device has HTT Rx ring configured. The only way to diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c index 81ddaafb6721..8834f118ef77 100644 --- a/drivers/net/wireless/ath/ath10k/sdio.c +++ b/drivers/net/wireless/ath/ath10k/sdio.c @@ -1859,7 +1859,7 @@ static int ath10k_sdio_hif_start(struct ath10k *ar) struct ath10k_sdio *ar_sdio = ath10k_sdio_priv(ar); int ret; - napi_enable(&ar->napi); + ath10k_core_napi_enable(ar); /* Sleep 20 ms before HIF interrupts are disabled. * This will give target plenty of time to process the BMI done @@ -1986,8 +1986,7 @@ static void ath10k_sdio_hif_stop(struct ath10k *ar) spin_unlock_bh(&ar_sdio->wr_async_lock); - napi_synchronize(&ar->napi); - napi_disable(&ar->napi); + ath10k_core_napi_sync_disable(ar); } #ifdef CONFIG_PM diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c index 645ed5f63ef8..d54438c01df8 100644 --- a/drivers/net/wireless/ath/ath10k/snoc.c +++ b/drivers/net/wireless/ath/ath10k/snoc.c @@ -913,8 +913,7 @@ static void ath10k_snoc_hif_stop(struct ath10k *ar) if (!test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) ath10k_snoc_irq_disable(ar); - napi_synchronize(&ar->napi); - napi_disable(&ar->napi); + ath10k_core_napi_sync_disable(ar); ath10k_snoc_buffer_cleanup(ar); ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif stop\n"); } @@ -923,7 +922,7 @@ static int ath10k_snoc_hif_start(struct ath10k *ar) { struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar); - napi_enable(&ar->napi); + ath10k_core_napi_enable(ar); ath10k_snoc_irq_enable(ar); ath10k_snoc_rx_post(ar);
It happened "Kernel panic - not syncing: hung_task: blocked tasks" when test simulate crash and ifconfig down/rmmod meanwhile. Test steps: 1.Test commands, either can reproduce the hang for PCIe, SDIO and SNOC. echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 0.05;ifconfig wlan0 down echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_sdio echo hw-restart > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod ath10k_pci 2. dmesg: [ 5622.548630] ath10k_sdio mmc1:0001:1: simulating soft firmware crash [ 5622.655995] ieee80211 phy0: Hardware restart was requested [ 5776.355164] INFO: task shill:1572 blocked for more than 122 seconds. [ 5776.355687] INFO: task kworker/1:2:24437 blocked for more than 122 seconds. [ 5776.359812] Kernel panic - not syncing: hung_task: blocked tasks [ 5776.359836] CPU: 1 PID: 55 Comm: khungtaskd Tainted: G W 4.19.86 #137 [ 5776.359846] Hardware name: MediaTek krane sku176 board (DT) [ 5776.359855] Call trace: [ 5776.359868] dump_backtrace+0x0/0x170 [ 5776.359881] show_stack+0x20/0x2c [ 5776.359896] dump_stack+0xd4/0x10c [ 5776.359916] panic+0x12c/0x29c [ 5776.359937] hung_task_panic+0x0/0x50 [ 5776.359953] kthread+0x120/0x130 [ 5776.359965] ret_from_fork+0x10/0x18 [ 5776.359986] SMP: stopping secondary CPUs [ 5776.360012] Kernel Offset: 0x141ea00000 from 0xffffff8008000000 [ 5776.360026] CPU features: 0x0,2188200c [ 5776.360035] Memory Limit: none command "ifconfig wlan0 down" or "rmmod ath10k_sdio" will be blocked callstack of ifconfig: [<0>] __switch_to+0x120/0x13c [<0>] msleep+0x28/0x38 [<0>] ath10k_sdio_hif_stop+0x24c/0x294 [ath10k_sdio] [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] [<0>] ath10k_halt+0x120/0x178 [ath10k_core] [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] [<0>] drv_stop+0xe0/0x1e4 [mac80211] [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] [<0>] ieee80211_stop+0x20/0x30 [mac80211] [<0>] __dev_close_many+0xb8/0x11c [<0>] __dev_change_flags+0xe0/0x1d0 [<0>] dev_change_flags+0x30/0x6c [<0>] devinet_ioctl+0x370/0x564 [<0>] inet_ioctl+0xdc/0x304 [<0>] sock_do_ioctl+0x50/0x288 [<0>] compat_sock_ioctl+0x1b4/0x1aac [<0>] __se_compat_sys_ioctl+0x100/0x26fc [<0>] __arm64_compat_sys_ioctl+0x20/0x2c [<0>] el0_svc_common+0xa4/0x154 [<0>] el0_svc_compat_handler+0x2c/0x38 [<0>] el0_svc_compat+0x8/0x18 [<0>] 0xffffffffffffffff callstack of rmmod: [<0>] __switch_to+0x120/0x13c [<0>] msleep+0x28/0x38 [<0>] ath10k_sdio_hif_stop+0x294/0x31c [ath10k_sdio] [<0>] ath10k_core_stop+0x50/0x78 [ath10k_core] [<0>] ath10k_halt+0x120/0x178 [ath10k_core] [<0>] ath10k_stop+0x4c/0x8c [ath10k_core] [<0>] drv_stop+0xe0/0x1e4 [mac80211] [<0>] ieee80211_stop_device+0x48/0x54 [mac80211] [<0>] ieee80211_do_stop+0x678/0x6f8 [mac80211] [<0>] ieee80211_stop+0x20/0x30 [mac80211] [<0>] __dev_close_many+0xb8/0x11c [<0>] dev_close_many+0x70/0x100 [<0>] dev_close+0x4c/0x80 [<0>] cfg80211_shutdown_all_interfaces+0x50/0xcc [cfg80211] [<0>] ieee80211_remove_interfaces+0x58/0x1a0 [mac80211] [<0>] ieee80211_unregister_hw+0x40/0x100 [mac80211] [<0>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] [<0>] ath10k_core_unregister+0x38/0x7c [ath10k_core] [<0>] ath10k_sdio_remove+0x8c/0xd0 [ath10k_sdio] [<0>] sdio_bus_remove+0x48/0x108 [<0>] device_release_driver_internal+0x138/0x1ec [<0>] driver_detach+0x6c/0xa8 [<0>] bus_remove_driver+0x78/0xa8 [<0>] driver_unregister+0x30/0x50 [<0>] sdio_unregister_driver+0x28/0x34 [<0>] cleanup_module+0x14/0x6bc [ath10k_sdio] [<0>] __arm64_sys_delete_module+0x1e0/0x22c [<0>] el0_svc_common+0xa4/0x154 [<0>] el0_svc_compat_handler+0x2c/0x38 [<0>] el0_svc_compat+0x8/0x18 [<0>] 0xffffffffffffffff SNOC: [ 647.156863] Call trace: [ 647.162166] [<ffffff80080855a4>] __switch_to+0x120/0x13c [ 647.164512] [<ffffff800899d8b8>] __schedule+0x5ec/0x798 [ 647.170062] [<ffffff800899dad8>] schedule+0x74/0x94 [ 647.175050] [<ffffff80089a0848>] schedule_timeout+0x314/0x42c [ 647.179874] [<ffffff80089a0a14>] schedule_timeout_uninterruptible+0x34/0x40 [ 647.185780] [<ffffff80082a494>] msleep+0x28/0x38 [ 647.192546] [<ffffff800117ec4c>] ath10k_snoc_hif_stop+0x4c/0x1e0 [ath10k_snoc] [ 647.197439] [<ffffff80010dfbd8>] ath10k_core_stop+0x50/0x7c [ath10k_core] [ 647.204652] [<ffffff80010c8f48>] ath10k_halt+0x114/0x16c [ath10k_core] [ 647.211420] [<ffffff80010cad68>] ath10k_stop+0x4c/0x88 [ath10k_core] [ 647.217865] [<ffffff8000fdbf54>] drv_stop+0x110/0x244 [mac80211] [ 647.224367] [<ffffff80010147ac>] ieee80211_stop_device+0x48/0x54 [mac80211] [ 647.230359] [<ffffff8000ff3eec>] ieee80211_do_stop+0x6a4/0x73c [mac80211] [ 647.237033] [<ffffff8000ff4500>] ieee80211_stop+0x20/0x30 [mac80211] [ 647.243942] [<ffffff80087e39b8>] __dev_close_many+0xa0/0xfc [ 647.250435] [<ffffff80087e3888>] dev_close_many+0x70/0x100 [ 647.255651] [<ffffff80087e3a60>] dev_close+0x4c/0x80 [ 647.261244] [<ffffff8000f1ba54>] cfg80211_shutdown_all_interfaces+0x44/0xcc [cfg80211] [ 647.266383] [<ffffff8000ff3fdc>] ieee80211_remove_interfaces+0x58/0x1b4 [mac80211] [ 647.274128] [<ffffff8000fda540>] ieee80211_unregister_hw+0x50/0x120 [mac80211] [ 647.281659] [<ffffff80010ca314>] ath10k_mac_unregister+0x1c/0x44 [ath10k_core] [ 647.288839] [<ffffff80010dfc94>] ath10k_core_unregister+0x48/0x90 [ath10k_core] [ 647.296027] [<ffffff800117e598>] ath10k_snoc_remove+0x5c/0x150 [ath10k_snoc] [ 647.303229] [<ffffff80085625fc>] platform_drv_remove+0x28/0x50 [ 647.310517] [<ffffff80085601a4>] device_release_driver_internal+0x114/0x1b8 [ 647.316257] [<ffffff80085602e4>] driver_detach+0x6c/0xa8 [ 647.323021] [<ffffff800855e5b8>] bus_remove_driver+0x78/0xa8 [ 647.328571] [<ffffff800856107c>] driver_unregister+0x30/0x50 [ 647.334213] [<ffffff8008562674>] platform_driver_unregister+0x1c/0x28 [ 647.339876] [<ffffff800117fefc>] cleanup_module+0x1c/0x120 [ath10k_snoc] [ 647.346196] [<ffffff8008143ab8>] SyS_delete_module+0x1dc/0x22c PCIe: [ 615.392770] rmmod D 0 3523 3458 0x00000080 [ 615.392777] Call Trace: [ 615.392784] __schedule+0x617/0x7d3 [ 615.392791] ? __mod_timer+0x263/0x35c [ 615.392797] schedule+0x62/0x72 [ 615.392803] schedule_timeout+0x8d/0xf3 [ 615.392809] ? run_local_timers+0x6b/0x6b [ 615.392814] msleep+0x1b/0x22 [ 615.392824] ath10k_pci_hif_stop+0x68/0xd6 [ath10k_pci] [ 615.392844] ath10k_core_stop+0x44/0x67 [ath10k_core] [ 615.392859] ath10k_halt+0x102/0x153 [ath10k_core] [ 615.392873] ath10k_stop+0x38/0x75 [ath10k_core] [ 615.392893] drv_stop+0x9a/0x13c [mac80211] [ 615.392915] ieee80211_do_stop+0x772/0x7cd [mac80211] [ 615.392937] ieee80211_stop+0x1a/0x1e [mac80211] [ 615.392945] __dev_close_many+0x9e/0xf0 [ 615.392952] dev_close_many+0x62/0xe8 [ 615.392958] dev_close+0x54/0x7d [ 615.392975] cfg80211_shutdown_all_interfaces+0x6e/0xa5 [cfg80211] [ 615.393021] ieee80211_remove_interfaces+0x52/0x1aa [mac80211] [ 615.393049] ieee80211_unregister_hw+0x54/0x136 [mac80211] [ 615.393068] ath10k_mac_unregister+0x19/0x4a [ath10k_core] [ 615.393091] ath10k_core_unregister+0x39/0x7e [ath10k_core] [ 615.393104] ath10k_pci_remove+0x3d/0x7f [ath10k_pci] [ 615.393117] pci_device_remove+0x41/0xa6 [ 615.393129] device_release_driver_internal+0x123/0x1ec [ 615.393140] driver_detach+0x60/0x90 [ 615.393152] bus_remove_driver+0x72/0x9f [ 615.393164] pci_unregister_driver+0x1e/0x87 [ 615.393177] SyS_delete_module+0x1d7/0x277 [ 615.393188] do_syscall_64+0x6b/0xf7 [ 615.393199] entry_SYSCALL_64_after_hwframe+0x41/0xa6 The test command run simulate_fw_crash firstly and it call into ath10k_sdio_hif_stop from ath10k_core_restart, then napi_disable is called and bit NAPI_STATE_SCHED is set. After that, function ath10k_sdio_hif_stop is called again from ath10k_stop by command "ifconfig wlan0 down" or "rmmod ath10k_sdio", then command blocked. It is blocked by napi_synchronize, napi_disable will set bit with NAPI_STATE_SCHED, and then napi_synchronize will enter dead loop becuase bit NAPI_STATE_SCHED is set by napi_disable. function of napi_synchronize static inline void napi_synchronize(const struct napi_struct *n) { if (IS_ENABLED(CONFIG_SMP)) while (test_bit(NAPI_STATE_SCHED, &n->state)) msleep(1); else barrier(); } function of napi_disable void napi_disable(struct napi_struct *n) { might_sleep(); set_bit(NAPI_STATE_DISABLE, &n->state); while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) msleep(1); while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) msleep(1); hrtimer_cancel(&n->timer); clear_bit(NAPI_STATE_DISABLE, &n->state); } Add flag for it avoid the hang and crash. Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049 Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1 Tested-on: WCN3990 hw1.0 SNOC hw1.0 WLAN.HL.3.1-01307.1-QCAHLSWMTPL-2 Signed-off-by: Wen Gong <wgong@codeaurora.org> --- v3: change all bus type v2: change to use flag drivers/net/wireless/ath/ath10k/ahb.c | 5 ++--- drivers/net/wireless/ath/ath10k/core.c | 20 ++++++++++++++++++++ drivers/net/wireless/ath/ath10k/core.h | 3 +++ drivers/net/wireless/ath/ath10k/pci.c | 7 ++++--- drivers/net/wireless/ath/ath10k/sdio.c | 5 ++--- drivers/net/wireless/ath/ath10k/snoc.c | 5 ++--- 6 files changed, 33 insertions(+), 12 deletions(-)