[ath-next,v4,1/9] wifi: ath12k: fix SLUB BUG - Object already free in ath12k_reg_free()

Message ID: 20250408-fix_reboot_issues_with_hw_grouping-v4-1-95e7bf048595@oss.qualcomm.com
State: Accepted
Delegated to: Jeff Johnson
Series: wifi: ath12k: fixes for rmmod and recovery issues with hardware grouping

Checks

Context                          Check    Description
wifibot/fixes_present            success  Fixes tag not required for -next series
wifibot/series_format            success  Posting correctly formatted
wifibot/tree_selection           success  Clearly marked for ath-next
wifibot/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated
wifibot/build_clang              success  Errors and warnings before: 0 this patch: 0
wifibot/build_32bit              success  Errors and warnings before: 0 this patch: 0
wifibot/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
wifibot/build_clang_rust         success  No Rust files in patch. Skipping build
wifibot/build_tools              success  No tools touched, skip
wifibot/check_selftest           success  No net selftest shell script
wifibot/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 12 lines checked
wifibot/deprecated_api           success  None detected
wifibot/header_inline            success  No static functions without inline keyword in header files
wifibot/kdoc                     success  Errors and warnings before: 0 this patch: 0
wifibot/source_inline            success  Was 0 now: 0
wifibot/verify_fixes             success  Fixes tag looks correct
wifibot/verify_signedoff         success  Signed-off-by tag matches author and committer

Commit Message

Aditya Kumar Singh April 8, 2025, 6:06 a.m. UTC
During rmmod of the ath12k module with SLUB debug enabled, the following
print is seen:

=============================================================================
BUG kmalloc-1k (Not tainted): Object already free
-----------------------------------------------------------------------------

Allocated in ath12k_reg_build_regd+0x94/0xa20 [ath12k] age=10470 cpu=0 pid=0
 __kmalloc_noprof+0xf4/0x368
 ath12k_reg_build_regd+0x94/0xa20 [ath12k]
 ath12k_wmi_op_rx+0x199c/0x2c14 [ath12k]
 ath12k_htc_rx_completion_handler+0x398/0x554 [ath12k]
 ath12k_ce_per_engine_service+0x248/0x368 [ath12k]
 ath12k_pci_ce_workqueue+0x28/0x50 [ath12k]
 process_one_work+0x14c/0x28c
 bh_worker+0x22c/0x27c
 workqueue_softirq_action+0x80/0x90
 tasklet_action+0x14/0x3c
 handle_softirqs+0x108/0x240
 __do_softirq+0x14/0x20
Freed in ath12k_reg_free+0x40/0x74 [ath12k] age=136 cpu=2 pid=166
 kfree+0x148/0x248
 ath12k_reg_free+0x40/0x74 [ath12k]
 ath12k_core_hw_group_destroy+0x68/0xac [ath12k]
 ath12k_core_deinit+0xd8/0x124 [ath12k]
 ath12k_pci_remove+0x6c/0x130 [ath12k]
 pci_device_remove+0x44/0xe8
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1d0/0x22c
 driver_detach+0x50/0x98
 bus_remove_driver+0x70/0xf4
 driver_unregister+0x30/0x60
 pci_unregister_driver+0x24/0x9c
 ath12k_pci_exit+0x18/0x24 [ath12k]
 __arm64_sys_delete_module+0x1a0/0x2a8
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0x40/0xe0
Slab 0xfffffdffc0033600 objects=10 used=6 fp=0xffff000000cdcc00 flags=0x3fffe0000000240(workingset|head|node=0|zone=0|lastcpupid=0x1ffff)
Object 0xffff000000cdcc00 @offset=19456 fp=0xffff000000cde400
[...]

This issue arises because in ath12k_core_hw_group_destroy(), each device
calls ath12k_core_soc_destroy() for itself and all its partners within the
same group. Since ath12k_core_hw_group_destroy() is invoked for each
device, this results in a double free condition, eventually causing the
SLUB bug.

To resolve this, set the freed pointers to NULL so that a repeated call to
ath12k_reg_free() becomes a no-op (kfree(NULL) is safe). Since these
pointers could also be read concurrently, guard them with the existing
ab->core_lock mutex.
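
The free-then-clear pattern described above can be sketched in userspace C.
This is only an illustration under stated assumptions, not the driver code:
struct fake_base, fake_reg_free(), fake_group_destroy() and MAX_RADIOS are
hypothetical names, and pthread_mutex_t/free() stand in for the kernel's
mutex and kfree().

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_RADIOS 2 /* hypothetical; the driver reads this from hw_params */

/* Hypothetical userspace stand-in for struct ath12k_base. */
struct fake_base {
	pthread_mutex_t core_lock;
	void *default_regd[MAX_RADIOS];
	void *new_regd[MAX_RADIOS];
};

/* Free every regd buffer and clear its slot so a repeated call is harmless. */
void fake_reg_free(struct fake_base *ab)
{
	pthread_mutex_lock(&ab->core_lock);
	for (int i = 0; i < MAX_RADIOS; i++) {
		free(ab->default_regd[i]); /* free(NULL) is a no-op, like kfree(NULL) */
		free(ab->new_regd[i]);
		ab->default_regd[i] = NULL;
		ab->new_regd[i] = NULL;
	}
	pthread_mutex_unlock(&ab->core_lock);
}

/* Assumed model of the teardown: each call walks every device in the group. */
void fake_group_destroy(struct fake_base **devs, int n)
{
	for (int i = 0; i < n; i++)
		fake_reg_free(devs[i]);
}
```

Calling fake_group_destroy() once per device means fake_reg_free() runs
several times on the same object. Without the NULL assignments, the second
pass would hand an already-freed pointer to free(), which is the double free
that SLUB debug reports.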

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3

Fixes: 6f245ea0ec6c ("wifi: ath12k: introduce device group abstraction")
Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com>
---
 drivers/net/wireless/ath/ath12k/reg.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Vasanthakumar Thiagarajan April 8, 2025, 9:50 a.m. UTC | #1
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>

Patch

diff --git a/drivers/net/wireless/ath/ath12k/reg.c b/drivers/net/wireless/ath/ath12k/reg.c
index 893650f76fb2d9f24177d524c5a979693b543657..3260df2ad60059117d5340c27a3b82fdcfabd02b 100644
--- a/drivers/net/wireless/ath/ath12k/reg.c
+++ b/drivers/net/wireless/ath/ath12k/reg.c
@@ -817,8 +817,12 @@  void ath12k_reg_free(struct ath12k_base *ab)
 {
 	int i;
 
+	mutex_lock(&ab->core_lock);
 	for (i = 0; i < ab->hw_params->max_radios; i++) {
 		kfree(ab->default_regd[i]);
 		kfree(ab->new_regd[i]);
+		ab->default_regd[i] = NULL;
+		ab->new_regd[i] = NULL;
 	}
+	mutex_unlock(&ab->core_lock);
 }