Message ID | 20240605034639.3942219-1-quic_miaoqing@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Series | [v2] wifi: ath11k: Add firmware coredump collection support |
On 6/4/2024 8:46 PM, Miaoqing Pan wrote:
> In case of firmware assert snapshot of firmware memory is essential for
> debugging. Add firmware coredump collection support for PCI bus.
> Collect RDDM and firmware paging dumps from MHI and pack them in TLV
> format and also pack various memory shared during QMI phase in separate
> TLVs. Add necessary header and share the dumps to user space using dev
> coredump framework. Coredump collection is disabled by default and can
> be enabled using menuconfig. Dump collected for a radio is 55 MB
> approximately.
>
> The changeset is mostly copied from:
> https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.
>
> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Signed-off-by: Miaoqing Pan <quic_miaoqing@quicinc.com>

Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>

Miaoqing Pan <quic_miaoqing@quicinc.com> writes:

> In case of firmware assert snapshot of firmware memory is essential for
> debugging. Add firmware coredump collection support for PCI bus.
> Collect RDDM and firmware paging dumps from MHI and pack them in TLV
> format and also pack various memory shared during QMI phase in separate
> TLVs. Add necessary header and share the dumps to user space using dev
> coredump framework. Coredump collection is disabled by default and can
> be enabled using menuconfig. Dump collected for a radio is 55 MB
> approximately.
>
> The changeset is mostly copied from:
> https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.
>
> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Signed-off-by: Miaoqing Pan <quic_miaoqing@quicinc.com>

This has a similar KASAN warning as did the ath12k patch:

[ 48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left
[ 48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200
[ 48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
[ 63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash
[ 64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM
[ 64.132388] ==================================================================
[ 64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88
[ 64.132578]
[ 64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678
[ 64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[ 64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k]
[ 64.132791] Call Trace:
[ 64.132827] <TASK>
[ 64.132867] dump_stack_lvl+0x7d/0xe0
[ 64.132910] print_address_description.constprop.0+0x33/0x3a0
[ 64.132958] print_report+0xb5/0x260
[ 64.133038] ? kasan_addr_to_slab+0xd/0x80
[ 64.133096] kasan_report+0xd8/0x110
[ 64.133132] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133179] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133225] __asan_report_store_n_noabort+0x12/0x20
[ 64.133266] ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133311] ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci]
[ 64.133358] ? lock_sync+0x1a0/0x1a0
[ 64.133398] ath11k_coredump_collect+0x60/0x73 [ath11k]
[ 64.133466] ath11k_core_reset+0x225/0x640 [ath11k]
[ 64.133524] ? debug_smp_processor_id+0x17/0x20
[ 64.133564] process_one_work+0x8cc/0x19c0
[ 64.133893] ? pwq_dec_nr_in_flight+0x580/0x580
[ 64.133934] ? move_linked_works+0x128/0x2c0
[ 64.134007] ? assign_work+0x15e/0x270
[ 64.134074] worker_thread+0x715/0x1230
[ 64.134114] ? __this_cpu_preempt_check+0x13/0x20
[ 64.134153] ? lockdep_hardirqs_on+0x7d/0x100
[ 64.134192] ? rescuer_thread+0xdb0/0xdb0
[ 64.134229] kthread+0x2fa/0x3f0
[ 64.134266] ? kthread_insert_work_sanity_check+0xd0/0xd0
[ 64.134308] ret_from_fork+0x31/0x70
[ 64.134345] ? kthread_insert_work_sanity_check+0xd0/0xd0
[ 64.134386] ret_from_fork_asm+0x11/0x20
[ 64.134426] </TASK>
[ 64.134459]
[ 64.134498] The buggy address belongs to the virtual mapping at
[ 64.134498] [ffffc90003965000, ffffc90004977000) created by:
[ 64.134498] ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci]
[ 64.134576]
[ 64.134606] The buggy address belongs to the physical page:
[ 64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee
[ 64.134699] flags: 0x200000000000000(node=0|zone=2)
[ 64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
[ 64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 64.135523] page dumped because: kasan: bad access detected
[ 64.136273]
[ 64.136928] Memory state around the buggy address:
[ 64.137641] ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 64.138361] ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.139742] ^
[ 64.140495] ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.141222] ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.141892] ==================================================================
[ 64.142674] Disabling lock debugging due to kernel taint
[ 64.143630] ath11k_pci 0000:06:00.0: Uploading coredump

> ---
> v2: fix implicit declaration of function 'vzalloc'.
>
> drivers/net/wireless/ath/ath11k/Kconfig | 11 ++
> drivers/net/wireless/ath/ath11k/Makefile | 1 +
> drivers/net/wireless/ath/ath11k/core.c | 2 +
> drivers/net/wireless/ath/ath11k/core.h | 5 +
> drivers/net/wireless/ath/ath11k/coredump.c | 52 ++++++
> drivers/net/wireless/ath/ath11k/coredump.h | 79 +++++++++
> drivers/net/wireless/ath/ath11k/hif.h | 7 +
> drivers/net/wireless/ath/ath11k/mhi.c | 5 +
> drivers/net/wireless/ath/ath11k/mhi.h | 1 +
> drivers/net/wireless/ath/ath11k/pci.c | 191 +++++++++++++++++++++
> drivers/net/wireless/ath/ath11k/qmi.c | 45 ++---
> drivers/net/wireless/ath/ath11k/qmi.h | 9 +-
> 12 files changed, 384 insertions(+), 24 deletions(-)
> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h

I feel that the QMI changes should be in a separate patch that explains
in detail what they are about. I didn't review those now as there's no
explanation.

> diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig
> index 27f0523bf967..bb91da0098b4 100644
> --- a/drivers/net/wireless/ath/ath11k/Kconfig
> +++ b/drivers/net/wireless/ath/ath11k/Kconfig
> @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL
>  	  Enable ath11k spectral scan support
>
>  	  Say Y to enable access to the FFT/spectral data via debugfs.
> +
> +config ATH11K_COREDUMP
> +	bool "ath11k coredump"
> +	depends on ATH11K
> +	select WANT_DEV_COREDUMP
> +	help
> +	  Enable ath11k coredump collection
> +
> +	  If unsure, say Y to make it easier to debug problems. But if
> +	  dump collection not required choose N.

I'm not sure a new Kconfig option is justified. Maybe just use
CONFIG_DEV_COREDUMP directly instead.

On 7/11/2024 12:20 AM, Kalle Valo wrote:
> Miaoqing Pan <quic_miaoqing@quicinc.com> writes:
>
>> In case of firmware assert snapshot of firmware memory is essential for
>> debugging. Add firmware coredump collection support for PCI bus.
>> Collect RDDM and firmware paging dumps from MHI and pack them in TLV
>> format and also pack various memory shared during QMI phase in separate
>> TLVs. Add necessary header and share the dumps to user space using dev
>> coredump framework. Coredump collection is disabled by default and can
>> be enabled using menuconfig. Dump collected for a radio is 55 MB
>> approximately.
>>
>> The changeset is mostly copied from:
>> https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.
>>
>> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>>
>> Signed-off-by: Miaoqing Pan <quic_miaoqing@quicinc.com>
>
> This has a similar KASAN warning as did the ath12k patch:
>
> [ 48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left
> [ 48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200
> [ 48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
> [ 63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash
> [ 64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM
> [ 64.132388] ==================================================================
> [ 64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [ 64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88
> [ 64.132578]
> [ 64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678
> [ 64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
> [ 64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k]
> [ 64.132791] Call Trace:
> [ 64.132827] <TASK>
> [ 64.132867] dump_stack_lvl+0x7d/0xe0
> [ 64.132910] print_address_description.constprop.0+0x33/0x3a0
> [ 64.132958] print_report+0xb5/0x260
> [ 64.133038] ? kasan_addr_to_slab+0xd/0x80
> [ 64.133096] kasan_report+0xd8/0x110
> [ 64.133132] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [ 64.133179] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [ 64.133225] __asan_report_store_n_noabort+0x12/0x20
> [ 64.133266] ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [ 64.133311] ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci]
> [ 64.133358] ? lock_sync+0x1a0/0x1a0
> [ 64.133398] ath11k_coredump_collect+0x60/0x73 [ath11k]
> [ 64.133466] ath11k_core_reset+0x225/0x640 [ath11k]
> [ 64.133524] ? debug_smp_processor_id+0x17/0x20
> [ 64.133564] process_one_work+0x8cc/0x19c0
> [ 64.133893] ? pwq_dec_nr_in_flight+0x580/0x580
> [ 64.133934] ? move_linked_works+0x128/0x2c0
> [ 64.134007] ? assign_work+0x15e/0x270
> [ 64.134074] worker_thread+0x715/0x1230
> [ 64.134114] ? __this_cpu_preempt_check+0x13/0x20
> [ 64.134153] ? lockdep_hardirqs_on+0x7d/0x100
> [ 64.134192] ? rescuer_thread+0xdb0/0xdb0
> [ 64.134229] kthread+0x2fa/0x3f0
> [ 64.134266] ? kthread_insert_work_sanity_check+0xd0/0xd0
> [ 64.134308] ret_from_fork+0x31/0x70
> [ 64.134345] ? kthread_insert_work_sanity_check+0xd0/0xd0
> [ 64.134386] ret_from_fork_asm+0x11/0x20
> [ 64.134426] </TASK>
> [ 64.134459]
> [ 64.134498] The buggy address belongs to the virtual mapping at
> [ 64.134498] [ffffc90003965000, ffffc90004977000) created by:
> [ 64.134498] ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci]
> [ 64.134576]
> [ 64.134606] The buggy address belongs to the physical page:
> [ 64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee
> [ 64.134699] flags: 0x200000000000000(node=0|zone=2)
> [ 64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
> [ 64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 64.135523] page dumped because: kasan: bad access detected
> [ 64.136273]
> [ 64.136928] Memory state around the buggy address:
> [ 64.137641] ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 64.138361] ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 64.139742] ^
> [ 64.140495] ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 64.141222] ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [ 64.141892] ==================================================================
> [ 64.142674] Disabling lock debugging due to kernel taint
> [ 64.143630] ath11k_pci 0000:06:00.0: Uploading coredump

Thanks for catching this issue. It is triggered by the statement
'dump_tlv->type = cpu_to_le32(mem_idx);': the 'buf' pointer assigned to
'dump_tlv' is out of bounds. This will be fixed in the next version.

>
>> ---
>> v2: fix implicit declaration of function 'vzalloc'.
>>
>> drivers/net/wireless/ath/ath11k/Kconfig | 11 ++
>> drivers/net/wireless/ath/ath11k/Makefile | 1 +
>> drivers/net/wireless/ath/ath11k/core.c | 2 +
>> drivers/net/wireless/ath/ath11k/core.h | 5 +
>> drivers/net/wireless/ath/ath11k/coredump.c | 52 ++++++
>> drivers/net/wireless/ath/ath11k/coredump.h | 79 +++++++++
>> drivers/net/wireless/ath/ath11k/hif.h | 7 +
>> drivers/net/wireless/ath/ath11k/mhi.c | 5 +
>> drivers/net/wireless/ath/ath11k/mhi.h | 1 +
>> drivers/net/wireless/ath/ath11k/pci.c | 191 +++++++++++++++++++++
>> drivers/net/wireless/ath/ath11k/qmi.c | 45 ++---
>> drivers/net/wireless/ath/ath11k/qmi.h | 9 +-
>> 12 files changed, 384 insertions(+), 24 deletions(-)
>> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
>> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h
>
> I feel that the QMI changes should be in a separate patch that explains
> in detail what they are about. I didn't review those now as there's no
> explanation.

These are minor changes that update the 'iaddr' definition. IMO, they
don't need a separate patch.

 struct target_mem_chunk {
 	u32 prev_size;
 	u32 prev_type;
 	dma_addr_t paddr;
-	u32 *vaddr;
-	void __iomem *iaddr;
+	union {
+		u32 *vaddr;
+		void __iomem *iaddr;
+	} v;
 };

>
>> diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig
>> index 27f0523bf967..bb91da0098b4 100644
>> --- a/drivers/net/wireless/ath/ath11k/Kconfig
>> +++ b/drivers/net/wireless/ath/ath11k/Kconfig
>> @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL
>>  	  Enable ath11k spectral scan support
>>
>>  	  Say Y to enable access to the FFT/spectral data via debugfs.
>> +
>> +config ATH11K_COREDUMP
>> +	bool "ath11k coredump"
>> +	depends on ATH11K
>> +	select WANT_DEV_COREDUMP
>> +	help
>> +	  Enable ath11k coredump collection
>> +
>> +	  If unsure, say Y to make it easier to debug problems. But if
>> +	  dump collection not required choose N.
>
> I'm not sure a new Kconfig option is justified. Maybe just use
> CONFIG_DEV_COREDUMP directly instead.

OK, it will be removed.

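One reading of the v2 code that appears consistent with the KASAN report: ath11k_pci_coredump_calculate_size() counts a TLV header only for non-empty segments, while ath11k_pci_coredump_download() writes a header for every dump type it visits before it checks tlv_len. The userspace sketch below models only that arithmetic. The names `sized_len` and `written_len` and the sample sizes are hypothetical, and the eventual fix may well differ.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_HDR_SIZE 8u /* same value as COREDUMP_TLV_HDR_SIZE in the patch */

/* Length as the v2 size calculation appears to compute it: a TLV header
 * is counted only when the segment has data.
 */
static size_t sized_len(const uint32_t *seg_sz, size_t n, size_t file_hdr)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++) {
		if (!seg_sz[i])
			continue;
		len += TLV_HDR_SIZE + seg_sz[i];
	}

	return len ? len + file_hdr : 0;
}

/* Bytes the v2 writer appears to touch: a TLV header is emitted for every
 * type except skip_idx (FW_CRASH_DUMP_NONE in the patch), empty or not.
 */
static size_t written_len(const uint32_t *seg_sz, size_t n, size_t skip_idx,
			  size_t file_hdr)
{
	size_t off = file_hdr;

	for (size_t i = 0; i < n; i++) {
		if (i == skip_idx)
			continue;
		off += TLV_HDR_SIZE + seg_sz[i];
	}

	return off;
}
```

With hypothetical segment sizes {100, 200, 300, 0, 0, 0} and a 196-byte file header, `sized_len` yields 820 while `written_len` yields 836, so the headers of the two empty segments would land past the end of the allocation. When every visited segment is non-empty, the two lengths agree.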
On 7/12/2024 2:38 AM, Miaoqing Pan wrote:
> On 7/11/2024 12:20 AM, Kalle Valo wrote:
>> Miaoqing Pan <quic_miaoqing@quicinc.com> writes:
>> I feel that the QMI changes should be in a separate patch that explains
>> in detail what they are about. I didn't review those now as there's no
>> explanation.
>
> These are minor changes that update the 'iaddr' definition. IMO, they
> don't need a separate patch.
>  struct target_mem_chunk {
>  	u32 prev_size;
>  	u32 prev_type;
>  	dma_addr_t paddr;
> -	u32 *vaddr;
> -	void __iomem *iaddr;
> +	union {
> +		u32 *vaddr;
> +		void __iomem *iaddr;
> +	} v;
>  };

Putting something into a union isn't minor. You should justify the
reason for doing it and defend why it is safe to do it.

And note that if you make it an anonymous union then most, if not all,
of the code changes are unnecessary.

/jeff

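The anonymous-union suggestion can be illustrated with a small userspace sketch. It assumes a C11 (or GNU C) compiler; the struct is a hypothetical stand-in for target_mem_chunk, with uint32_t in place of u32 and the __iomem annotation dropped, and the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct target_mem_chunk. */
struct target_mem_chunk_sketch {
	uint32_t size;
	uint32_t type;
	union { /* anonymous: the union itself has no member name */
		uint32_t *vaddr;
		void *iaddr;
	};
};

/* Call sites keep the old spelling, chunk->iaddr, with no ".v." added. */
static int chunk_is_mapped(const struct target_mem_chunk_sketch *chunk)
{
	return chunk->iaddr != NULL;
}

/* Both members share storage, so clearing one also clears the other on
 * platforms where all object pointers share one null representation.
 */
static void chunk_clear(struct target_mem_chunk_sketch *chunk)
{
	chunk->vaddr = NULL;
}
```

Because the members are reached as chunk->vaddr and chunk->iaddr, existing code would compile unchanged. Whether sharing the storage is safe for every code path in qmi.c is the question the review asks the author to answer.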
diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig index 27f0523bf967..bb91da0098b4 100644 --- a/drivers/net/wireless/ath/ath11k/Kconfig +++ b/drivers/net/wireless/ath/ath11k/Kconfig @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL Enable ath11k spectral scan support Say Y to enable access to the FFT/spectral data via debugfs. + +config ATH11K_COREDUMP + bool "ath11k coredump" + depends on ATH11K + select WANT_DEV_COREDUMP + help + Enable ath11k coredump collection + + If unsure, say Y to make it easier to debug problems. But if + dump collection not required choose N. + diff --git a/drivers/net/wireless/ath/ath11k/Makefile b/drivers/net/wireless/ath/ath11k/Makefile index 43d2d8ddcdc0..685341dd28fa 100644 --- a/drivers/net/wireless/ath/ath11k/Makefile +++ b/drivers/net/wireless/ath/ath11k/Makefile @@ -27,6 +27,7 @@ ath11k-$(CONFIG_ATH11K_TRACING) += trace.o ath11k-$(CONFIG_THERMAL) += thermal.o ath11k-$(CONFIG_ATH11K_SPECTRAL) += spectral.o ath11k-$(CONFIG_PM) += wow.o +ath11k-$(CONFIG_ATH11K_COREDUMP) += coredump.o obj-$(CONFIG_ATH11K_AHB) += ath11k_ahb.o ath11k_ahb-y += ahb.o diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c index 03187df26000..56a7195bddef 100644 --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -2177,6 +2177,7 @@ static void ath11k_core_reset(struct work_struct *work) reinit_completion(&ab->recovery_start); atomic_set(&ab->recovery_start_count, 0); + ath11k_coredump_collect(ab); ath11k_core_pre_reconfigure_recovery(ab); reinit_completion(&ab->reconfigure_complete); @@ -2313,6 +2314,7 @@ struct ath11k_base *ath11k_core_alloc(struct device *dev, size_t priv_size, INIT_WORK(&ab->restart_work, ath11k_core_restart); INIT_WORK(&ab->update_11d_work, ath11k_update_11d); INIT_WORK(&ab->reset_work, ath11k_core_reset); + INIT_WORK(&ab->dump_work, ath11k_coredump_upload); timer_setup(&ab->rx_replenish_retry, 
ath11k_ce_rx_replenish_retry, 0); init_completion(&ab->htc_suspend); init_completion(&ab->wow.wakeup_completed); diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h index 205f40ee6b66..978ec9bdf868 100644 --- a/drivers/net/wireless/ath/ath11k/core.h +++ b/drivers/net/wireless/ath/ath11k/core.h @@ -32,6 +32,7 @@ #include "spectral.h" #include "wow.h" #include "fw.h" +#include "coredump.h" #define SM(_v, _f) (((_v) << _f##_LSB) & _f##_MASK) @@ -892,6 +893,10 @@ struct ath11k_base { /* HW channel counters frequency value in hertz common to all MACs */ u32 cc_freq_hz; + struct ath11k_dump_file_data *dump_data; + size_t ath11k_coredump_len; + struct work_struct dump_work; + struct ath11k_htc htc; struct ath11k_dp dp; diff --git a/drivers/net/wireless/ath/ath11k/coredump.c b/drivers/net/wireless/ath/ath11k/coredump.c new file mode 100644 index 000000000000..b8bad358cebe --- /dev/null +++ b/drivers/net/wireless/ath/ath11k/coredump.c @@ -0,0 +1,52 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * Copyright (c) 2020 The Linux Foundation. All rights reserved. + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+ */ +#include <linux/devcoredump.h> +#include "hif.h" +#include "coredump.h" +#include "debug.h" + +enum +ath11k_fw_crash_dump_type ath11k_coredump_get_dump_type(int type) +{ + enum ath11k_fw_crash_dump_type dump_type; + + switch (type) { + case HOST_DDR_REGION_TYPE: + dump_type = FW_CRASH_DUMP_REMOTE_MEM_DATA; + break; + case M3_DUMP_REGION_TYPE: + dump_type = FW_CRASH_DUMP_M3_DUMP; + break; + case PAGEABLE_MEM_REGION_TYPE: + dump_type = FW_CRASH_DUMP_PAGEABLE_DATA; + break; + case BDF_MEM_REGION_TYPE: + case CALDB_MEM_REGION_TYPE: + dump_type = FW_CRASH_DUMP_NONE; + break; + default: + dump_type = FW_CRASH_DUMP_TYPE_MAX; + break; + } + + return dump_type; +} +EXPORT_SYMBOL(ath11k_coredump_get_dump_type); + +void ath11k_coredump_upload(struct work_struct *work) +{ + struct ath11k_base *ab = container_of(work, struct ath11k_base, dump_work); + + ath11k_info(ab, "Uploading coredump\n"); + /* dev_coredumpv() takes ownership of the buffer */ + dev_coredumpv(ab->dev, ab->dump_data, ab->ath11k_coredump_len, GFP_KERNEL); + ab->dump_data = NULL; +} + +void ath11k_coredump_collect(struct ath11k_base *ab) +{ + ath11k_hif_coredump_download(ab); +} diff --git a/drivers/net/wireless/ath/ath11k/coredump.h b/drivers/net/wireless/ath/ath11k/coredump.h new file mode 100644 index 000000000000..b9ea5e4de939 --- /dev/null +++ b/drivers/net/wireless/ath/ath11k/coredump.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: BSD-3-Clause-Clear */ +/* + * Copyright (c) 2020 The Linux Foundation. All rights reserved. + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+ */ +#ifndef _ATH11K_COREDUMP_H_ +#define _ATH11K_COREDUMP_H_ + +#define ATH11K_FW_CRASH_DUMP_V2 2 + +enum ath11k_fw_crash_dump_type { + FW_CRASH_DUMP_PAGING_DATA, + FW_CRASH_DUMP_RDDM_DATA, + FW_CRASH_DUMP_REMOTE_MEM_DATA, + FW_CRASH_DUMP_PAGEABLE_DATA, + FW_CRASH_DUMP_M3_DUMP, + FW_CRASH_DUMP_NONE, + + /* keep last */ + FW_CRASH_DUMP_TYPE_MAX, +}; + +#define COREDUMP_TLV_HDR_SIZE 8 + +struct ath11k_tlv_dump_data { + /* see ath11k_fw_crash_dump_type above */ + __le32 type; + + /* in bytes */ + __le32 tlv_len; + + /* pad to 32-bit boundaries as needed */ + u8 tlv_data[]; +} __packed; + +struct ath11k_dump_file_data { + /* "ATH11K-FW-DUMP" */ + char df_magic[16]; + /* total dump len in bytes */ + __le32 len; + /* file dump version */ + __le32 version; + /* pci device id */ + __le32 chip_id; + /* qrtr instance id */ + __le32 qrtr_id; + /* pci domain id */ + __le32 bus_id; + guid_t guid; + /* time-of-day stamp */ + __le64 tv_sec; + /* time-of-day stamp, nano-seconds */ + __le64 tv_nsec; + /* room for growth w/out changing binary format */ + u8 unused[128]; + u8 data[]; +} __packed; + +#ifdef CONFIG_ATH11K_COREDUMP +enum ath11k_fw_crash_dump_type ath11k_coredump_get_dump_type(int type); +void ath11k_coredump_upload(struct work_struct *work); +void ath11k_coredump_collect(struct ath11k_base *ab); +#else +static inline enum +ath11k_fw_crash_dump_type ath11k_coredump_get_dump_type(int type) +{ + return FW_CRASH_DUMP_TYPE_MAX; +} + +static inline void ath11k_coredump_upload(struct work_struct *work) +{ +} + +static inline void ath11k_coredump_collect(struct ath11k_base *ab) +{ +} +#endif + +#endif diff --git a/drivers/net/wireless/ath/ath11k/hif.h b/drivers/net/wireless/ath/ath11k/hif.h index c4c6cc09c7c1..762db7c95ad8 100644 --- a/drivers/net/wireless/ath/ath11k/hif.h +++ b/drivers/net/wireless/ath/ath11k/hif.h @@ -31,6 +31,7 @@ struct ath11k_hif_ops { void (*ce_irq_enable)(struct ath11k_base *ab); void (*ce_irq_disable)(struct ath11k_base *ab); void 
(*get_ce_msi_idx)(struct ath11k_base *ab, u32 ce_id, u32 *msi_idx); + void (*coredump_download)(struct ath11k_base *ab); }; static inline void ath11k_hif_ce_irq_enable(struct ath11k_base *ab) @@ -152,4 +153,10 @@ static inline void ath11k_get_ce_msi_idx(struct ath11k_base *ab, u32 ce_id, *msi_data_idx = ce_id; } +static inline void ath11k_hif_coredump_download(struct ath11k_base *ab) +{ + if (ab->hif.ops->coredump_download) + ab->hif.ops->coredump_download(ab); +} + #endif /* _HIF_H_ */ diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c index ab182690aed3..db23a82a0c0f 100644 --- a/drivers/net/wireless/ath/ath11k/mhi.c +++ b/drivers/net/wireless/ath/ath11k/mhi.c @@ -498,3 +498,8 @@ int ath11k_mhi_resume(struct ath11k_pci *ab_pci) return 0; } + +void ath11k_mhi_coredump(struct mhi_controller *mhi_ctrl, bool in_panic) +{ + mhi_download_rddm_image(mhi_ctrl, in_panic); +} diff --git a/drivers/net/wireless/ath/ath11k/mhi.h b/drivers/net/wireless/ath/ath11k/mhi.h index 2d567705e732..4a7e20da3c7d 100644 --- a/drivers/net/wireless/ath/ath11k/mhi.h +++ b/drivers/net/wireless/ath/ath11k/mhi.h @@ -26,4 +26,5 @@ void ath11k_mhi_clear_vector(struct ath11k_base *ab); int ath11k_mhi_suspend(struct ath11k_pci *ar_pci); int ath11k_mhi_resume(struct ath11k_pci *ar_pci); +void ath11k_mhi_coredump(struct mhi_controller *mhi_ctrl, bool in_panic); #endif diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c index 8d63b84d1261..088669c59697 100644 --- a/drivers/net/wireless/ath/ath11k/pci.c +++ b/drivers/net/wireless/ath/ath11k/pci.c @@ -8,6 +8,8 @@ #include <linux/msi.h> #include <linux/pci.h> #include <linux/of.h> +#include <linux/time.h> +#include <linux/vmalloc.h> #include "pci.h" #include "core.h" @@ -610,6 +612,190 @@ static void ath11k_pci_aspm_restore(struct ath11k_pci *ab_pci) PCI_EXP_LNKCTL_ASPMC); } +#ifdef CONFIG_ATH11K_COREDUMP +static int ath11k_pci_coredump_calculate_size(struct ath11k_base *ab, 
u32 *dump_seg_sz) +{ + struct ath11k_pci *ab_pci = ath11k_pci_priv(ab); + struct mhi_controller *mhi_ctrl = ab_pci->mhi_ctrl; + struct image_info *rddm_img, *fw_img; + struct ath11k_tlv_dump_data *dump_tlv; + enum ath11k_fw_crash_dump_type mem_type; + u32 len = 0, rddm_tlv_sz = 0, paging_tlv_sz = 0; + struct ath11k_dump_file_data *file_data; + int i; + + rddm_img = mhi_ctrl->rddm_image; + if (!rddm_img) { + ath11k_err(ab, "No RDDM dump found\n"); + return 0; + } + + fw_img = mhi_ctrl->fbc_image; + + for (i = 0; i < fw_img->entries ; i++) { + if (!fw_img->mhi_buf[i].buf) + continue; + + paging_tlv_sz += fw_img->mhi_buf[i].len; + } + dump_seg_sz[FW_CRASH_DUMP_PAGING_DATA] = paging_tlv_sz; + + for (i = 0; i < rddm_img->entries; i++) { + if (!rddm_img->mhi_buf[i].buf) + continue; + + rddm_tlv_sz += rddm_img->mhi_buf[i].len; + } + dump_seg_sz[FW_CRASH_DUMP_RDDM_DATA] = rddm_tlv_sz; + + for (i = 0; i < ab->qmi.mem_seg_count; i++) { + mem_type = ath11k_coredump_get_dump_type(ab->qmi.target_mem[i].type); + + if (mem_type == FW_CRASH_DUMP_NONE) + continue; + + if (mem_type == FW_CRASH_DUMP_TYPE_MAX) { + ath11k_dbg(ab, ATH11K_DBG_PCI, + "target mem region type %d not supported", + ab->qmi.target_mem[i].type); + continue; + } + + if (!ab->qmi.target_mem[i].v.iaddr) + continue; + + dump_seg_sz[mem_type] += ab->qmi.target_mem[i].size; + } + + for (i = 0; i < FW_CRASH_DUMP_TYPE_MAX; i++) { + if (!dump_seg_sz[i]) + continue; + + len += sizeof(*dump_tlv) + dump_seg_sz[i]; + } + + if (len) + len += sizeof(*file_data); + + return len; +} + +static void ath11k_pci_coredump_download(struct ath11k_base *ab) +{ + struct ath11k_pci *ab_pci = ath11k_pci_priv(ab); + struct mhi_controller *mhi_ctrl = ab_pci->mhi_ctrl; + struct image_info *rddm_img, *fw_img; + struct timespec64 timestamp; + int i, len, mem_idx; + enum ath11k_fw_crash_dump_type mem_type; + struct ath11k_dump_file_data *file_data; + struct ath11k_tlv_dump_data *dump_tlv; + size_t hdr_len = sizeof(*file_data); + void *buf; + 
u32 dump_seg_sz[FW_CRASH_DUMP_TYPE_MAX] = { 0 }; + + ath11k_mhi_coredump(mhi_ctrl, false); + + len = ath11k_pci_coredump_calculate_size(ab, dump_seg_sz); + if (!len) { + ath11k_warn(ab, "No crash dump data found for devcoredump"); + return; + } + + rddm_img = mhi_ctrl->rddm_image; + fw_img = mhi_ctrl->fbc_image; + + /* dev_coredumpv() requires vmalloc data */ + buf = vzalloc(len); + if (!buf) + return; + + ab->dump_data = buf; + ab->ath11k_coredump_len = len; + file_data = ab->dump_data; + strscpy(file_data->df_magic, "ATH11K-FW-DUMP", sizeof(file_data->df_magic)); + file_data->len = cpu_to_le32(len); + file_data->version = cpu_to_le32(ATH11K_FW_CRASH_DUMP_V2); + file_data->chip_id = cpu_to_le32(ab_pci->dev_id); + file_data->qrtr_id = cpu_to_le32(ab_pci->ab->qmi.service_ins_id); + file_data->bus_id = cpu_to_le32(pci_domain_nr(ab_pci->pdev->bus)); + guid_gen(&file_data->guid); + ktime_get_real_ts64(&timestamp); + file_data->tv_sec = cpu_to_le64(timestamp.tv_sec); + file_data->tv_nsec = cpu_to_le64(timestamp.tv_nsec); + buf += hdr_len; + dump_tlv = buf; + dump_tlv->type = cpu_to_le32(FW_CRASH_DUMP_PAGING_DATA); + dump_tlv->tlv_len = cpu_to_le32(dump_seg_sz[FW_CRASH_DUMP_PAGING_DATA]); + buf += COREDUMP_TLV_HDR_SIZE; + + /* append all segments together as they are all part of a single contiguous + * block of memory + */ + for (i = 0; i < fw_img->entries ; i++) { + if (!fw_img->mhi_buf[i].buf) + continue; + + memcpy_fromio(buf, (void const __iomem *)fw_img->mhi_buf[i].buf, + fw_img->mhi_buf[i].len); + buf += fw_img->mhi_buf[i].len; + } + + dump_tlv = buf; + dump_tlv->type = cpu_to_le32(FW_CRASH_DUMP_RDDM_DATA); + dump_tlv->tlv_len = cpu_to_le32(dump_seg_sz[FW_CRASH_DUMP_RDDM_DATA]); + buf += COREDUMP_TLV_HDR_SIZE; + + /* append all segments together as they are all part of a single contiguous + * block of memory + */ + for (i = 0; i < rddm_img->entries; i++) { + if (!rddm_img->mhi_buf[i].buf) + continue; + + memcpy_fromio(buf, (void const __iomem *)rddm_img->mhi_buf[i].buf, 
+ rddm_img->mhi_buf[i].len); + buf += rddm_img->mhi_buf[i].len; + } + + mem_idx = FW_CRASH_DUMP_REMOTE_MEM_DATA; + for (; mem_idx < FW_CRASH_DUMP_TYPE_MAX; mem_idx++) { + if (mem_idx == FW_CRASH_DUMP_NONE) + continue; + + dump_tlv = buf; + dump_tlv->type = cpu_to_le32(mem_idx); + dump_tlv->tlv_len = cpu_to_le32(dump_seg_sz[mem_idx]); + buf += COREDUMP_TLV_HDR_SIZE; + + if (!dump_tlv->tlv_len) + continue; + + for (i = 0; i < ab->qmi.mem_seg_count; i++) { + mem_type = ath11k_coredump_get_dump_type + (ab->qmi.target_mem[i].type); + + if (mem_type != mem_idx) + continue; + + if (!ab->qmi.target_mem[i].v.iaddr) { + ath11k_dbg(ab, ATH11K_DBG_PCI, + "Skipping mem region type %d", + ab->qmi.target_mem[i].type); + continue; + } + + memcpy_fromio(buf, ab->qmi.target_mem[i].v.iaddr, + ab->qmi.target_mem[i].size); + + buf += ab->qmi.target_mem[i].size; + } + } + + queue_work(ab->workqueue, &ab->dump_work); +} +#endif + static int ath11k_pci_power_up(struct ath11k_base *ab) { struct ath11k_pci *ab_pci = ath11k_pci_priv(ab); @@ -713,6 +899,9 @@ static const struct ath11k_hif_ops ath11k_pci_hif_ops = { .ce_irq_enable = ath11k_pci_hif_ce_irq_enable, .ce_irq_disable = ath11k_pci_hif_ce_irq_disable, .get_ce_msi_idx = ath11k_pcic_get_ce_msi_idx, +#ifdef CONFIG_ATH11K_COREDUMP + .coredump_download = ath11k_pci_coredump_download, +#endif }; static void ath11k_pci_read_hw_version(struct ath11k_base *ab, u32 *major, u32 *minor) @@ -978,6 +1167,8 @@ static void ath11k_pci_remove(struct pci_dev *pdev) set_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags); + cancel_work_sync(&ab->reset_work); + cancel_work_sync(&ab->dump_work); ath11k_core_deinit(ab); qmi_fail: diff --git a/drivers/net/wireless/ath/ath11k/qmi.c b/drivers/net/wireless/ath/ath11k/qmi.c index aa160e6fe24f..37619ba8502e 100644 --- a/drivers/net/wireless/ath/ath11k/qmi.c +++ b/drivers/net/wireless/ath/ath11k/qmi.c @@ -1955,19 +1955,21 @@ static void ath11k_qmi_free_target_mem_chunk(struct ath11k_base *ab) int i; for (i = 0; i < 
ab->qmi.mem_seg_count; i++) {
-		if ((ab->hw_params.fixed_mem_region ||
-		     test_bit(ATH11K_FLAG_FIXED_MEM_RGN, &ab->dev_flags)) &&
-		     ab->qmi.target_mem[i].iaddr)
-			iounmap(ab->qmi.target_mem[i].iaddr);
+		if (!ab->qmi.target_mem[i].v.iaddr)
+			continue;

-		if (!ab->qmi.target_mem[i].vaddr)
+		if (ab->hw_params.fixed_mem_region ||
+		    test_bit(ATH11K_FLAG_FIXED_MEM_RGN, &ab->dev_flags)) {
+			iounmap(ab->qmi.target_mem[i].v.iaddr);
+			ab->qmi.target_mem[i].v.iaddr = NULL;
 			continue;
+		}

 		dma_free_coherent(ab->dev,
 				  ab->qmi.target_mem[i].prev_size,
-				  ab->qmi.target_mem[i].vaddr,
+				  ab->qmi.target_mem[i].v.vaddr,
 				  ab->qmi.target_mem[i].paddr);
-		ab->qmi.target_mem[i].vaddr = NULL;
+		ab->qmi.target_mem[i].v.vaddr = NULL;
 	}
 }

@@ -1984,22 +1986,22 @@ static int ath11k_qmi_alloc_target_mem_chunk(struct ath11k_base *ab)
 		/* Firmware reloads in coldboot/firmware recovery.
 		 * in such case, no need to allocate memory for FW again.
 		 */
-		if (chunk->vaddr) {
+		if (chunk->v.vaddr) {
 			if (chunk->prev_type == chunk->type &&
 			    chunk->prev_size == chunk->size)
 				continue;

 			/* cannot reuse the existing chunk */
 			dma_free_coherent(ab->dev, chunk->prev_size,
-					  chunk->vaddr, chunk->paddr);
-			chunk->vaddr = NULL;
+					  chunk->v.vaddr, chunk->paddr);
+			chunk->v.vaddr = NULL;
 		}

-		chunk->vaddr = dma_alloc_coherent(ab->dev,
-						  chunk->size,
-						  &chunk->paddr,
-						  GFP_KERNEL | __GFP_NOWARN);
-		if (!chunk->vaddr) {
+		chunk->v.vaddr = dma_alloc_coherent(ab->dev,
+						    chunk->size,
+						    &chunk->paddr,
+						    GFP_KERNEL | __GFP_NOWARN);
+		if (!chunk->v.vaddr) {
 			if (ab->qmi.mem_seg_count <= ATH11K_QMI_FW_MEM_REQ_SEGMENT_CNT) {
 				ath11k_dbg(ab, ATH11K_DBG_QMI,
 					   "dma allocation failed (%d B type %u), will try later with small size\n",
@@ -2055,10 +2057,10 @@ static int ath11k_qmi_assign_target_mem_chunk(struct ath11k_base *ab)
 			}

 			ab->qmi.target_mem[idx].paddr = res.start;
-			ab->qmi.target_mem[idx].iaddr =
+			ab->qmi.target_mem[idx].v.iaddr =
 				ioremap(ab->qmi.target_mem[idx].paddr,
 					ab->qmi.target_mem[i].size);
-			if (!ab->qmi.target_mem[idx].iaddr)
+			if (!ab->qmi.target_mem[idx].v.iaddr)
 				return -EIO;

 			ab->qmi.target_mem[idx].size = ab->qmi.target_mem[i].size;
@@ -2068,7 +2070,7 @@ static int ath11k_qmi_assign_target_mem_chunk(struct ath11k_base *ab)
 			break;
 		case BDF_MEM_REGION_TYPE:
 			ab->qmi.target_mem[idx].paddr = ab->hw_params.bdf_addr;
-			ab->qmi.target_mem[idx].vaddr = NULL;
+			ab->qmi.target_mem[idx].v.vaddr = NULL;
 			ab->qmi.target_mem[idx].size = ab->qmi.target_mem[i].size;
 			ab->qmi.target_mem[idx].type = ab->qmi.target_mem[i].type;
 			idx++;
@@ -2083,18 +2085,19 @@ static int ath11k_qmi_assign_target_mem_chunk(struct ath11k_base *ab)
 				if (hremote_node) {
 					ab->qmi.target_mem[idx].paddr =
 							res.start + host_ddr_sz;
-					ab->qmi.target_mem[idx].iaddr =
+					ab->qmi.target_mem[idx].v.iaddr =
 						ioremap(ab->qmi.target_mem[idx].paddr,
 							ab->qmi.target_mem[i].size);
-					if (!ab->qmi.target_mem[idx].iaddr)
+					if (!ab->qmi.target_mem[idx].v.iaddr)
 						return -EIO;
 				} else {
 					ab->qmi.target_mem[idx].paddr =
 						ATH11K_QMI_CALDB_ADDRESS;
+					ab->qmi.target_mem[idx].v.vaddr = NULL;
 				}
 			} else {
 				ab->qmi.target_mem[idx].paddr = 0;
-				ab->qmi.target_mem[idx].vaddr = NULL;
+				ab->qmi.target_mem[idx].v.vaddr = NULL;
 			}
 			ab->qmi.target_mem[idx].size = ab->qmi.target_mem[i].size;
 			ab->qmi.target_mem[idx].type = ab->qmi.target_mem[i].type;
diff --git a/drivers/net/wireless/ath/ath11k/qmi.h b/drivers/net/wireless/ath/ath11k/qmi.h
index 7e06d100af57..016e04f9e898 100644
--- a/drivers/net/wireless/ath/ath11k/qmi.h
+++ b/drivers/net/wireless/ath/ath11k/qmi.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: BSD-3-Clause-Clear */
 /*
  * Copyright (c) 2018-2019 The Linux Foundation. All rights reserved.
- * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ * Copyright (c) 2021-2024 Qualcomm Innovation Center, Inc. All rights reserved.
  */

 #ifndef ATH11K_QMI_H
@@ -102,8 +102,10 @@ struct target_mem_chunk {
 	u32 prev_size;
 	u32 prev_type;
 	dma_addr_t paddr;
-	u32 *vaddr;
-	void __iomem *iaddr;
+	union {
+		u32 *vaddr;
+		void __iomem *iaddr;
+	} v;
 };

 struct target_info {
@@ -154,6 +156,7 @@ struct ath11k_qmi {
 #define BDF_MEM_REGION_TYPE 0x2
 #define M3_DUMP_REGION_TYPE 0x3
 #define CALDB_MEM_REGION_TYPE 0x4
+#define PAGEABLE_MEM_REGION_TYPE 0x9

 struct qmi_wlanfw_host_cap_req_msg_v01 {
 	u8 num_clients_valid;
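The qmi.h hunk folds vaddr and iaddr into one union, so the free path can no longer tell the two mappings apart by which pointer is set; it has to choose the release call from the region kind. Below is a minimal userspace sketch of that branch order, not driver code. struct chunk_model, enum release_path and chunk_release() are hypothetical names, malloc()/free() stand in for dma_alloc_coherent()/dma_free_coherent(), and the iounmap() step is reduced to clearing the pointer. It also assumes both pointer types share one representation, as they do on Linux targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for struct target_mem_chunk (only the fields used here). */
struct chunk_model {
	uint32_t size;
	union {
		uint32_t *vaddr; /* models a dma_alloc_coherent() buffer */
		void *iaddr;     /* models an ioremap() mapping */
	} v;
};

enum release_path { RELEASE_NONE, RELEASE_UNMAP, RELEASE_DMA_FREE };

/*
 * Hypothetical helper following the branch order of the reworked free loop:
 * 1. nothing mapped        -> skip
 * 2. fixed memory region   -> unmap and clear
 * 3. otherwise             -> free the buffer and clear
 */
static enum release_path chunk_release(struct chunk_model *c, int fixed_mem_region)
{
	assert(c != NULL);

	/* Both members alias the same storage, so one NULL test covers both. */
	if (!c->v.iaddr)
		return RELEASE_NONE;

	if (fixed_mem_region) {
		/* The kernel code calls iounmap() here; the model only clears. */
		c->v.iaddr = NULL;
		return RELEASE_UNMAP;
	}

	free(c->v.vaddr); /* the kernel code calls dma_free_coherent() here */
	c->v.vaddr = NULL;
	return RELEASE_DMA_FREE;
}
```

Because the two members overlap, clearing either one after release leaves the chunk looking unmapped to both kinds of check, which is why the diff sets the pointer to NULL on each path.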
In case of firmware assert snapshot of firmware memory is essential for
debugging. Add firmware coredump collection support for PCI bus.
Collect RDDM and firmware paging dumps from MHI and pack them in TLV
format and also pack various memory shared during QMI phase in separate
TLVs. Add necessary header and share the dumps to user space using dev
coredump framework. Coredump collection is disabled by default and can
be enabled using menuconfig. Dump collected for a radio is 55 MB
approximately.

The changeset is mostly copied from:
https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.

Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Miaoqing Pan <quic_miaoqing@quicinc.com>
---
v2: fix implicit declaration of function 'vzalloc'.

 drivers/net/wireless/ath/ath11k/Kconfig    |  11 ++
 drivers/net/wireless/ath/ath11k/Makefile   |   1 +
 drivers/net/wireless/ath/ath11k/core.c     |   2 +
 drivers/net/wireless/ath/ath11k/core.h     |   5 +
 drivers/net/wireless/ath/ath11k/coredump.c |  52 ++++++
 drivers/net/wireless/ath/ath11k/coredump.h |  79 +++++++++
 drivers/net/wireless/ath/ath11k/hif.h      |   7 +
 drivers/net/wireless/ath/ath11k/mhi.c      |   5 +
 drivers/net/wireless/ath/ath11k/mhi.h      |   1 +
 drivers/net/wireless/ath/ath11k/pci.c      | 191 +++++++++++++++++++++
 drivers/net/wireless/ath/ath11k/qmi.c      |  45 ++---
 drivers/net/wireless/ath/ath11k/qmi.h      |   9 +-
 12 files changed, 384 insertions(+), 24 deletions(-)
 create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
 create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h
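
The commit message describes packing each dump segment as a TLV inside one large buffer. As a rough illustration of that idea only, here is a userspace sketch of a bounds-checked TLV append. struct tlv_hdr and tlv_append() are hypothetical and do not reproduce the layout in coredump.h; in particular the header is written in host byte order here, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header: type, then payload length in bytes. */
struct tlv_hdr {
	uint32_t type;
	uint32_t len;
};

/*
 * Append one TLV at offset 'off' of a buffer with capacity 'cap'.
 * Returns the offset just past the new TLV, or (size_t)-1 if the
 * header plus payload would not fit; nothing is written in that case.
 */
static size_t tlv_append(uint8_t *buf, size_t cap, size_t off,
			 uint32_t type, const void *data, uint32_t len)
{
	struct tlv_hdr hdr = { .type = type, .len = len };

	assert(buf != NULL && off <= cap);

	/* Check the remaining space before touching the buffer. */
	if (cap - off < sizeof(hdr) || cap - off - sizeof(hdr) < len)
		return (size_t)-1;

	memcpy(buf + off, &hdr, sizeof(hdr));
	if (len)
		memcpy(buf + off + sizeof(hdr), data, len);

	return off + sizeof(hdr) + len;
}
```

A caller would thread the returned offset through successive calls, one per segment, and stop on (size_t)-1. Sizing the buffer as the sum of all payload lengths plus one header per segment keeps every append within bounds.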