Message ID | 20250408-fix_reboot_issues_with_hw_grouping-v4-1-95e7bf048595@oss.qualcomm.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jeff Johnson |
Headers | show |
Series | wifi: ath12k: fixes for rmmod and recovery issues with hardware grouping | expand |
On 4/8/2025 11:36 AM, Aditya Kumar Singh wrote: > During rmmod of ath12k module with SLUB debug enabled, following print is > seen - > > ============================================================================= > BUG kmalloc-1k (Not tainted): Object already free > ----------------------------------------------------------------------------- > > Allocated in ath12k_reg_build_regd+0x94/0xa20 [ath12k] age=10470 cpu=0 pid=0 > __kmalloc_noprof+0xf4/0x368 > ath12k_reg_build_regd+0x94/0xa20 [ath12k] > ath12k_wmi_op_rx+0x199c/0x2c14 [ath12k] > ath12k_htc_rx_completion_handler+0x398/0x554 [ath12k] > ath12k_ce_per_engine_service+0x248/0x368 [ath12k] > ath12k_pci_ce_workqueue+0x28/0x50 [ath12k] > process_one_work+0x14c/0x28c > bh_worker+0x22c/0x27c > workqueue_softirq_action+0x80/0x90 > tasklet_action+0x14/0x3c > handle_softirqs+0x108/0x240 > __do_softirq+0x14/0x20 > Freed in ath12k_reg_free+0x40/0x74 [ath12k] age=136 cpu=2 pid=166 > kfree+0x148/0x248 > ath12k_reg_free+0x40/0x74 [ath12k] > ath12k_core_hw_group_destroy+0x68/0xac [ath12k] > ath12k_core_deinit+0xd8/0x124 [ath12k] > ath12k_pci_remove+0x6c/0x130 [ath12k] > pci_device_remove+0x44/0xe8 > device_remove+0x4c/0x80 > device_release_driver_internal+0x1d0/0x22c > driver_detach+0x50/0x98 > bus_remove_driver+0x70/0xf4 > driver_unregister+0x30/0x60 > pci_unregister_driver+0x24/0x9c > ath12k_pci_exit+0x18/0x24 [ath12k] > __arm64_sys_delete_module+0x1a0/0x2a8 > invoke_syscall+0x48/0x110 > el0_svc_common.constprop.0+0x40/0xe0 > Slab 0xfffffdffc0033600 objects=10 used=6 fp=0xffff000000cdcc00 flags=0x3fffe0000000240(workingset|head|node=0|zone=0|lastcpupid=0x1ffff) > Object 0xffff000000cdcc00 @offset=19456 fp=0xffff000000cde400 > [...] > > This issue arises because in ath12k_core_hw_group_destroy(), each device > calls ath12k_core_soc_destroy() for itself and all its partners within the > same group. Since ath12k_core_hw_group_destroy() is invoked for each > device, this results in a double free condition, eventually causing the > SLUB bug. > > To resolve this, set the freed pointers to NULL. And since there could be > a race condition to read these pointers, guard these with the available > mutex lock. > > Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 > Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 > Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 > > Fixes: 6f245ea0ec6c ("wifi: ath12k: introduce device group abstraction") > Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
diff --git a/drivers/net/wireless/ath/ath12k/reg.c b/drivers/net/wireless/ath/ath12k/reg.c index 893650f76fb2d9f24177d524c5a979693b543657..3260df2ad60059117d5340c27a3b82fdcfabd02b 100644 --- a/drivers/net/wireless/ath/ath12k/reg.c +++ b/drivers/net/wireless/ath/ath12k/reg.c @@ -817,8 +817,12 @@ void ath12k_reg_free(struct ath12k_base *ab) { int i; + mutex_lock(&ab->core_lock); for (i = 0; i < ab->hw_params->max_radios; i++) { kfree(ab->default_regd[i]); kfree(ab->new_regd[i]); + ab->default_regd[i] = NULL; + ab->new_regd[i] = NULL; } + mutex_unlock(&ab->core_lock); }
During rmmod of ath12k module with SLUB debug enabled, following print is seen - ============================================================================= BUG kmalloc-1k (Not tainted): Object already free ----------------------------------------------------------------------------- Allocated in ath12k_reg_build_regd+0x94/0xa20 [ath12k] age=10470 cpu=0 pid=0 __kmalloc_noprof+0xf4/0x368 ath12k_reg_build_regd+0x94/0xa20 [ath12k] ath12k_wmi_op_rx+0x199c/0x2c14 [ath12k] ath12k_htc_rx_completion_handler+0x398/0x554 [ath12k] ath12k_ce_per_engine_service+0x248/0x368 [ath12k] ath12k_pci_ce_workqueue+0x28/0x50 [ath12k] process_one_work+0x14c/0x28c bh_worker+0x22c/0x27c workqueue_softirq_action+0x80/0x90 tasklet_action+0x14/0x3c handle_softirqs+0x108/0x240 __do_softirq+0x14/0x20 Freed in ath12k_reg_free+0x40/0x74 [ath12k] age=136 cpu=2 pid=166 kfree+0x148/0x248 ath12k_reg_free+0x40/0x74 [ath12k] ath12k_core_hw_group_destroy+0x68/0xac [ath12k] ath12k_core_deinit+0xd8/0x124 [ath12k] ath12k_pci_remove+0x6c/0x130 [ath12k] pci_device_remove+0x44/0xe8 device_remove+0x4c/0x80 device_release_driver_internal+0x1d0/0x22c driver_detach+0x50/0x98 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 pci_unregister_driver+0x24/0x9c ath12k_pci_exit+0x18/0x24 [ath12k] __arm64_sys_delete_module+0x1a0/0x2a8 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 Slab 0xfffffdffc0033600 objects=10 used=6 fp=0xffff000000cdcc00 flags=0x3fffe0000000240(workingset|head|node=0|zone=0|lastcpupid=0x1ffff) Object 0xffff000000cdcc00 @offset=19456 fp=0xffff000000cde400 [...] This issue arises because in ath12k_core_hw_group_destroy(), each device calls ath12k_core_soc_destroy() for itself and all its partners within the same group. Since ath12k_core_hw_group_destroy() is invoked for each device, this results in a double free condition, eventually causing the SLUB bug. To resolve this, set the freed pointers to NULL. And since there could be a race condition to read these pointers, guard these with the available mutex lock. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Fixes: 6f245ea0ec6c ("wifi: ath12k: introduce device group abstraction") Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> --- drivers/net/wireless/ath/ath12k/reg.c | 4 ++++ 1 file changed, 4 insertions(+)