Message ID | 20241016123722.206899-1-jtornosm@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jeff Johnson |
Headers | show |
Series | [v2,1/2] wifi: ath12k: fix crash when unbinding | expand |
On 10/16/2024 5:37 AM, Jose Ignacio Tornos Martinez wrote: > If there is an error during some initialization realated to firmware, previous review comment missed: s/realated/related/ > the buffers dp->tx_ring[i].tx_status are released. > However this is released again when the device is unbinded (ath12k_pci), > and we get: > [ 41.271233] WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80 > [ 41.271246] Modules linked in: uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc qrtr_mhi intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm rapl qrtr snd_hda_codec_generic ath12k qmi_helpers snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi iTCO_wdt intel_pmc_bxt mac80211 snd_hda_codec iTCO_vendor_support libarc4 snd_hda_core snd_hwdep snd_seq snd_seq_device cfg80211 snd_pcm pcspkr i2c_i801 snd_timer i2c_smbus snd rfkill soundcore lpc_ich mhi virtio_balloon joydev xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_blk virtio_console virtio_gpu net_failover failover virtio_dma_buf serio_raw fuse qemu_fw_cfg > [ 41.271284] CPU: 0 UID: 0 PID: 2098 Comm: bash Kdump: loaded Not tainted 6.12.0-rc1+ #29 > [ 41.271286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 > [ 41.271287] RIP: 0010:free_large_kmalloc+0x4d/0x80 > [ 41.271289] Code: 00 10 00 00 48 d3 e0 f7 d8 81 e2 c0 00 00 00 75 2f 89 c6 48 89 df e8 82 ff ff ff f0 ff 4b 34 0f 85 59 0e ce 00 e9 5b 0e ce 00 <0f> 0b 80 3d c8 29 3c 02 00 0f 84 2d 0e ce 00 b8 00 f0 ff ff eb d1 > [ 41.271290] RSP: 0018:ffffa40881a33c50 EFLAGS: 00010246 > [ 41.271292] RAX: 000fffffc0000000 RBX: ffffe697c0278000 RCX: 0000000000000000 > [ 41.271293] RDX: ffffe697c0b60008 RSI: ffff8d00c9e00000 RDI: ffffe697c0278000 > [ 41.271294] RBP: ffff8d00c3af0000 R08: ffff8d00f215d0c0 R09: 0000000080400038 > [ 41.271294] R10: 0000000080400038 R11: 0000000000000000 R12: 0000000000000001 > [ 41.271295] R13: ffffffffc0ef8948 R14: ffffffffc0ef8948 R15: ffff8d00c1277560 > [ 41.271296] FS: 00007fd31e556740(0000) GS:ffff8d011e400000(0000) knlGS:0000000000000000 > [ 41.271297] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.271298] CR2: 00007f778d3ffb38 CR3: 00000000065dc000 CR4: 0000000000752ef0 > [ 41.271301] PKRU: 55555554 > [ 41.271302] Call Trace: > [ 41.271304] <TASK> > [ 41.271304] ? free_large_kmalloc+0x4d/0x80 > [ 41.271306] ? __warn.cold+0x93/0xfa > [ 41.271308] ? free_large_kmalloc+0x4d/0x80 > [ 41.271311] ? report_bug+0xff/0x140 > [ 41.271314] ? handle_bug+0x58/0x90 > [ 41.271316] ? exc_invalid_op+0x17/0x70 > [ 41.271317] ? asm_exc_invalid_op+0x1a/0x20 > [ 41.271321] ? free_large_kmalloc+0x4d/0x80 > [ 41.271323] ath12k_dp_free+0xdc/0x110 [ath12k] > [ 41.271337] ath12k_core_deinit+0x8d/0xb0 [ath12k] > [ 41.271345] ath12k_pci_remove+0x50/0xf0 [ath12k] > [ 41.271354] pci_device_remove+0x3f/0xb0 > [ 41.271356] device_release_driver_internal+0x19c/0x200 > [ 41.271359] unbind_store+0xa1/0xb0 FYI I didn't comment on this previously but <https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages> has some guidance on trimming backtraces in commit messages. > ... > > The issue is always reproducible from a VM because the MSI addressing > initialization is failing. > > In order to fix the issue, just check if the buffers were already released > and if they need to be released, in addition set to NULL for the checking. > > cc: stable@vger.kernel.org > Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") > Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> > --- > v2: > - Fix the commit size in Fixes > v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@redhat.com/ > > drivers/net/wireless/ath/ath12k/dp.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c > index 789d430e4455..9d878d815f3c 100644 > --- a/drivers/net/wireless/ath/ath12k/dp.c > +++ b/drivers/net/wireless/ath/ath12k/dp.c > @@ -1277,8 +1277,12 @@ void ath12k_dp_free(struct ath12k_base *ab) > > ath12k_dp_rx_reo_cmd_list_cleanup(ab); > > - for (i = 0; i < ab->hw_params->max_tx_ring; i++) > - kfree(dp->tx_ring[i].tx_status); > + for (i = 0; i < ab->hw_params->max_tx_ring; i++) { > + if (dp->tx_ring[i].tx_status) { I still don't understand why this check is necessary. Why isn't just adding the "dp->tx_ring[i].tx_status = NULL" inside the loop sufficient? You previously wrote the "warning comes from free_large_kmalloc". Is that warning due to a NULL pointer, or due to a double free? I don't see how it could be from a NULL pointer since in kfree() the 1st thing it does is trace the entry, but then the 2nd thing it does is return "if (unlikely(ZERO_OR_NULL_PTR(object)))" So I'm suspecting a double free, and setting dp->tx_ring[i].tx_status = NULL should fix that issue. > + kfree(dp->tx_ring[i].tx_status); > + dp->tx_ring[i].tx_status = NULL; > + } > + } > > ath12k_dp_rx_free(ab); > /* Deinit any SOC level resource */
Hello Jeff, > FYI I didn't comment on this previously but > <https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages> > has some guidance on trimming backtraces in commit messages. Ok, I will trim the backtrace in a next version of the patch. And I will fix the typos too. > I still don't understand why this check is necessary. > > Why isn't just adding the "dp->tx_ring[i].tx_status = NULL" inside the loop > sufficient? > > You previously wrote the "warning comes from free_large_kmalloc". Is that > warning due to a NULL pointer, or due to a double free? > > I don't see how it could be from a NULL pointer since in kfree() the 1st thing > it does is trace the entry, but then the 2nd thing it does is return "if > (unlikely(ZERO_OR_NULL_PTR(object)))" > > So I'm suspecting a double free, and setting dp->tx_ring[i].tx_status = NULL > should fix that issue. Sorry, I didn't understand you at the beginning and I thought that you were referring to the complete fix (including setting to NULL). You are right, it is a double free and the test is not necessary, just set to NULL after freeing. Thanks Best regards José Ignacio
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c index 789d430e4455..9d878d815f3c 100644 --- a/drivers/net/wireless/ath/ath12k/dp.c +++ b/drivers/net/wireless/ath/ath12k/dp.c @@ -1277,8 +1277,12 @@ void ath12k_dp_free(struct ath12k_base *ab) ath12k_dp_rx_reo_cmd_list_cleanup(ab); - for (i = 0; i < ab->hw_params->max_tx_ring; i++) - kfree(dp->tx_ring[i].tx_status); + for (i = 0; i < ab->hw_params->max_tx_ring; i++) { + if (dp->tx_ring[i].tx_status) { + kfree(dp->tx_ring[i].tx_status); + dp->tx_ring[i].tx_status = NULL; + } + } ath12k_dp_rx_free(ab); /* Deinit any SOC level resource */
If there is an error during some initialization realated to firmware, the buffers dp->tx_ring[i].tx_status are released. However this is released again when the device is unbinded (ath12k_pci), and we get: [ 41.271233] WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80 [ 41.271246] Modules linked in: uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc qrtr_mhi intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm rapl qrtr snd_hda_codec_generic ath12k qmi_helpers snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi iTCO_wdt intel_pmc_bxt mac80211 snd_hda_codec iTCO_vendor_support libarc4 snd_hda_core snd_hwdep snd_seq snd_seq_device cfg80211 snd_pcm pcspkr i2c_i801 snd_timer i2c_smbus snd rfkill soundcore lpc_ich mhi virtio_balloon joydev xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_blk virtio_console virtio_gpu net_failover failover virtio_dma_buf serio_raw fuse qemu_fw_cfg [ 41.271284] CPU: 0 UID: 0 PID: 2098 Comm: bash Kdump: loaded Not tainted 6.12.0-rc1+ #29 [ 41.271286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 [ 41.271287] RIP: 0010:free_large_kmalloc+0x4d/0x80 [ 41.271289] Code: 00 10 00 00 48 d3 e0 f7 d8 81 e2 c0 00 00 00 75 2f 89 c6 48 89 df e8 82 ff ff ff f0 ff 4b 34 0f 85 59 0e ce 00 e9 5b 0e ce 00 <0f> 0b 80 3d c8 29 3c 02 00 0f 84 2d 0e ce 00 b8 00 f0 ff ff eb d1 [ 41.271290] RSP: 0018:ffffa40881a33c50 EFLAGS: 00010246 [ 41.271292] RAX: 000fffffc0000000 RBX: ffffe697c0278000 RCX: 0000000000000000 [ 41.271293] RDX: ffffe697c0b60008 RSI: ffff8d00c9e00000 RDI: ffffe697c0278000 [ 41.271294] RBP: ffff8d00c3af0000 R08: ffff8d00f215d0c0 R09: 0000000080400038 [ 41.271294] R10: 0000000080400038 R11: 0000000000000000 R12: 0000000000000001 [ 41.271295] R13: ffffffffc0ef8948 R14: ffffffffc0ef8948 R15: ffff8d00c1277560 [ 41.271296] FS: 00007fd31e556740(0000) GS:ffff8d011e400000(0000) knlGS:0000000000000000 [ 41.271297] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.271298] CR2: 00007f778d3ffb38 CR3: 00000000065dc000 CR4: 0000000000752ef0 [ 41.271301] PKRU: 55555554 [ 41.271302] Call Trace: [ 41.271304] <TASK> [ 41.271304] ? free_large_kmalloc+0x4d/0x80 [ 41.271306] ? __warn.cold+0x93/0xfa [ 41.271308] ? free_large_kmalloc+0x4d/0x80 [ 41.271311] ? report_bug+0xff/0x140 [ 41.271314] ? handle_bug+0x58/0x90 [ 41.271316] ? exc_invalid_op+0x17/0x70 [ 41.271317] ? asm_exc_invalid_op+0x1a/0x20 [ 41.271321] ? free_large_kmalloc+0x4d/0x80 [ 41.271323] ath12k_dp_free+0xdc/0x110 [ath12k] [ 41.271337] ath12k_core_deinit+0x8d/0xb0 [ath12k] [ 41.271345] ath12k_pci_remove+0x50/0xf0 [ath12k] [ 41.271354] pci_device_remove+0x3f/0xb0 [ 41.271356] device_release_driver_internal+0x19c/0x200 [ 41.271359] unbind_store+0xa1/0xb0 ... The issue is always reproducible from a VM because the MSI addressing initialization is failing. In order to fix the issue, just check if the buffers were already released and if they need to be released, in addition set to NULL for the checking. cc: stable@vger.kernel.org Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> --- v2: - Fix the commit size in Fixes v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@redhat.com/ drivers/net/wireless/ath/ath12k/dp.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)