Message ID | 20170501214327.5621-1-adrian@freebsd.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Headers | show |
Su Kang Yin <paradyse@gmail.com> writes: > Without this patch QCA9888 is not working. Also I have to update > board2.bin from Kalle's git repo. More details would be good to know, as the behaviour seems to vary quite a lot. What platform are you using, x86, some ARM board or what? And what kernel exactly?
On 11 May 2017 at 11:55, Kalle Valo wrote: > Su Kang Yin writes: > >> Without this patch QCA9888 is not working. Also I have to update >> board2.bin from Kalle's git repo. > > More details would be good to know, as the behaviour seems to vary quite > a lot. What platform are you using, x86, some ARM board or what? And > what kernel exactly? Kernel 4.9.25 on X86 Intel Atom Panic message is something like that: Workqueue: ath10k_aux_wq ath10k_wmi_event_service_ready_work [ath10k_core] task: ffff880036433a00 ti: ffff880036440000 task.ti: ffff880036440000 RIP: 0010:[<ffffffff8124592a>] [<ffffffff8124592a>] new_slab+0x39a/0x410 RSP: 0018:ffff880036443b58 EFLAGS: 00010092 RAX: 0000000000000006 RBX: 00000000024082c4 RCX: 0000000000000000 RDX: 0000000000000006 RSI: ffff88021e30dd08 RDI: ffff88021e30dd08 RBP: ffff880036443b90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000372 R12: ffff88021dc01200 R13: ffff88021dc00cc0 R14: ffff88021dc01200 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88021e300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3e65c1c730 CR3: 0000000001e06000 CR4: 00000000001406e0 Stack: ffffffff8127a4fc ffff0a01ffffff10 00000000024082c4 ffff88021dc01200 ffff88021dc00cc0 ffff88021dc01200 0000000000000001 ffff880036443c58 ffffffff81247ac6 ffff88021e31b360 ffff880036433a00 ffff880036433a00 Call Trace: [<ffffffff8127a4fc>] ? __d_lookup+0x9c/0x160 [<ffffffff81247ac6>] ___slab_alloc+0x396/0x4a0 [<ffffffffa0f8e14d>] ? ath10k_wmi_event_service_ready_work+0x5ad/0x800 [ath10k_core] [<ffffffff811f5279>] ? alloc_kmem_pages+0x9/0x10 [<ffffffff8120f203>] ? kmalloc_order+0x13/0x40 [<ffffffffa0f8e14d>] ? ath10k_wmi_event_service_ready_work+0x5ad/0x800 [ath10k_core] -- Yin
Su Kang Yin <paradyse@gmail.com> writes: > On 11 May 2017 at 11:55, Kalle Valo wrote: >> Su Kang Yin writes: >> >>> Without this patch QCA9888 is not working. Also I have to update >>> board2.bin from Kalle's git repo. >> >> More details would be good to know, as the behaviour seems to vary quite >> a lot. What platform are you using, x86, some ARM board or what? And >> what kernel exactly? > > Kernel 4.9.25 on X86 Intel Atom > > Panic message is something like that: Thanks, this was very helpful.
On Montag, 1. Mai 2017 14:43:27 CEST Adrian Chadd wrote: > This reverts b057886524be060021e3cfad0ba8458c850330cd in 2015 > which converted this allocation from dma_map_coherent() to > kzalloc() / dma_map_single(). > > The current problem manifests when using later model NICs with larger > (>700KiB) scratch spaces in memory. Although the kzalloc call > succeeds, the software IOMMU TLB code (via dma_map_single()) panics > because it can't find 700KiB of linear physmem bounce buffers for DMA. > Now, this is a bit of a silly failure mode for the dma map API, > but it's what we currently have to play with. [....] > > Signed-off-by: Adrian Chadd <adrian@FreeBSD.org> > --- > drivers/net/wireless/ath/ath10k/wmi.c | 35 ++++++++++------------------------- > 1 file changed, 10 insertions(+), 25 deletions(-) Thanks for investigating this. This partial revert fixes following crash for me with QCA99X0 on amd64 with SWIOMMU: [ 9.167281] DMA: Out of SW-IOMMU space for 689816 bytes at device 0000:02:00.0 [ 9.174719] Kernel panic - not syncing: DMA: Random memory could be DMA read [ 9.174719] [ 9.183560] CPU: 0 PID: 133 Comm: kworker/u32:6 Tainted: P OE 4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1 [ 9.194666] Hardware name: Intel Corporation SandyBridge Platform/ETXe-SC T2 i3-2310E, BIOS CHR2R110 05/12/2011 [ 9.205033] Workqueue: ath10k_aux_wq ath10k_wmi_event_service_ready_work [ath10k_core] [ 9.213151] 0000000000000000 ffffffffbd329cd5 0000000000000200 ffffa7b5c1247cb8 [ 9.220827] ffffffffbd17bc24 0000000000000008 ffffa7b5c1247cc8 ffffa7b5c1247c60 [ 9.228523] 000000006036bf02 0000000200000018 ffff91fde700dfc8 0000000000000000 [ 9.236177] Call Trace: [ 9.238692] [<ffffffffbd329cd5>] ? dump_stack+0x5c/0x77 [ 9.244165] [<ffffffffbd17bc24>] ? panic+0xe8/0x236 [ 9.251161] [<ffffffffbd354ffc>] ? swiotlb_map_page+0x16c/0x190 [ 9.259219] [<ffffffffc0c1d3e5>] ? ath10k_wmi_event_service_ready_work+0x4d5/0x680 [ath10k_core] [ 9.272150] [<ffffffffbd09171b>] ? process_one_work+0x14b/0x410 [ 9.280190] [<ffffffffbd0921d5>] ? worker_thread+0x65/0x4a0 [ 9.287869] [<ffffffffbd092170>] ? rescuer_thread+0x340/0x340 [ 9.295689] [<ffffffffbd0974c0>] ? kthread+0xe0/0x100 [ 9.302742] [<ffffffffbd02476b>] ? __switch_to+0x2bb/0x700 [ 9.310213] [<ffffffffbd0973e0>] ? kthread_park+0x60/0x60 [ 9.317556] [<ffffffffbd5fb675>] ? ret_from_fork+0x25/0x30 [ 9.324992] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 9.339429] ---[ end Kernel panic - not syncing: DMA: Random memory could be DMA read [ 9.339429] [ 9.353900] ------------[ cut here ]------------ [ 9.360274] WARNING: CPU: 0 PID: 133 at /home/zumbi/linux-4.9.13/arch/x86/kernel/smp.c:127 update_process_times+0x40/0x50 [ 9.374724] Modules linked in: tpm_infineon iTCO_wdt iTCO_vendor_support ppdev wl(POE) evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate intel_uncore ath10k_pci ath10k_core intel_rapl_perf ath mac80211 i2c_kempld btusb btrtl btbcm btintel pcspkr bluetooth cfg80211 serio_raw rfkill snd_hda_codec_realtek snd_hda_codec_generic battery snd_hda_intel parport_pc parport snd_hda_codec i915 snd_hda_core snd_hwdep shpchp video drm_kms_helper snd_pcm snd_timer snd drm soundcore mei_me button mei i2c_i801 i2c_smbus lpc_ich tpm_tis tpm_tis_core ac tpm autofs4 ext4 crc16 jbd2 fscrypto mbcache sg sd_mod ata_generic ahci pata_jmicron libahci kempld_core mfd_core libata ehci_pci ehci_hcd scsi_mod crc32c_intel usbcore psmouse e1000e usb_common igb i2c_algo_bit dca ptp pps_core fan thermal fjes [ 9.476736] CPU: 0 PID: 133 Comm: kworker/u32:6 Tainted: P OE 4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1 [ 9.491320] Hardware name: Intel Corporation SandyBridge Platform/ETXe-SC T2 i3-2310E, BIOS CHR2R110 05/12/2011 [ 9.505260] Workqueue: ath10k_aux_wq ath10k_wmi_event_service_ready_work [ath10k_core] [ 9.517103] 0000000000000000 ffffffffbd329cd5 0000000000000000 0000000000000000 [ 9.528682] ffffffffbd0778a4 ffff91fde101a140 0000000000000000 ffffa7b5c1247b98 [ 9.540304] ffffffffbd0f7270 0000000000000003 ffff91fde7012580 ffffffffbd0e7dc0 [ 9.551962] Call Trace: [ 9.556381] <IRQ> [ 9.558370] [<ffffffffbd329cd5>] ? dump_stack+0x5c/0x77 [ 9.567776] [<ffffffffbd0778a4>] ? __warn+0xc4/0xe0 [ 9.574863] [<ffffffffbd0f7270>] ? tick_sched_do_timer+0x30/0x30 [ 9.583059] [<ffffffffbd0e7dc0>] ? update_process_times+0x40/0x50 [ 9.591302] [<ffffffffbd0f6c80>] ? tick_sched_handle.isra.13+0x20/0x50 [ 9.600021] [<ffffffffbd0f72a8>] ? tick_sched_timer+0x38/0x70 [ 9.607872] [<ffffffffbd0e8a5c>] ? __hrtimer_run_queues+0xdc/0x250 [ 9.616158] [<ffffffffbd0e8ea9>] ? hrtimer_interrupt+0x99/0x190 [ 9.624100] [<ffffffffbd5fdd29>] ? smp_apic_timer_interrupt+0x39/0x50 [ 9.632618] [<ffffffffbd5fd042>] ? apic_timer_interrupt+0x82/0x90 [ 9.640700] <EOI> [ 9.642681] [<ffffffffbd17bd2e>] ? panic+0x1f2/0x236 [ 9.649615] [<ffffffffbd354ffc>] ? swiotlb_map_page+0x16c/0x190 [ 9.657526] [<ffffffffc0c1d3e5>] ? ath10k_wmi_event_service_ready_work+0x4d5/0x680 [ath10k_core] [ 9.670007] [<ffffffffbd09171b>] ? process_one_work+0x14b/0x410 [ 9.677874] [<ffffffffbd0921d5>] ? worker_thread+0x65/0x4a0 [ 9.685388] [<ffffffffbd092170>] ? rescuer_thread+0x340/0x340 [ 9.693016] [<ffffffffbd0974c0>] ? kthread+0xe0/0x100 [ 9.699914] [<ffffffffbd02476b>] ? __switch_to+0x2bb/0x700 [ 9.707224] [<ffffffffbd0973e0>] ? kthread_park+0x60/0x60 [ 9.714363] [<ffffffffbd5fb675>] ? ret_from_fork+0x25/0x30 [ 9.721589] ---[ end trace 6814c79dfe2a14da ]--- Tested-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Kind regards, Sven
Sven Eckelmann <sven.eckelmann@openmesh.com> writes: > On Montag, 1. Mai 2017 14:43:27 CEST Adrian Chadd wrote: >> This reverts b057886524be060021e3cfad0ba8458c850330cd in 2015 >> which converted this allocation from dma_map_coherent() to >> kzalloc() / dma_map_single(). >> >> The current problem manifests when using later model NICs with larger >> (>700KiB) scratch spaces in memory. Although the kzalloc call >> succeeds, the software IOMMU TLB code (via dma_map_single()) panics >> because it can't find 700KiB of linear physmem bounce buffers for DMA. >> Now, this is a bit of a silly failure mode for the dma map API, >> but it's what we currently have to play with. > [....] >> >> Signed-off-by: Adrian Chadd <adrian@FreeBSD.org> >> --- >> drivers/net/wireless/ath/ath10k/wmi.c | 35 ++++++++++------------------------- >> 1 file changed, 10 insertions(+), 25 deletions(-) > > Thanks for investigating this. This partial revert fixes following crash for > me with QCA99X0 on amd64 with SWIOMMU: > > [ 9.167281] DMA: Out of SW-IOMMU space for 689816 bytes at device 0000:02:00.0 > [ 9.174719] Kernel panic - not syncing: DMA: Random memory could be DMA read I recently got a Compex WLE1216V5-20 board and this patch also fixed a similar crash for me on my HP x86_64 test laptop: [ 840.491019] do_IRQ: 3.82 No irq handler for vector [ 840.965049] ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0 [ 841.094985] ath10k_pci 0000:02:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe [ 841.095096] ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1 [ 841.095603] ath10k_pci 0000:02:00.0: firmware ver 10.4-3.4-00082 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 f301de65 [ 842.518071] ath10k_pci 0000:02:00.0: board_file api 2 bmi_id 0:1 crc32 751efba1 [ 845.220186] ath10k_pci 0000:02:00.0: swiotlb buffer is full (sz: 1172264 bytes) [ 845.220478] ath10k_pci 0000:02:00.0: DMA: Out of SW-IOMMU space for 1172264 bytes [ 845.220506] Kernel panic - not syncing: DMA: Random memory could be DMA accessed#012[ 845.220506] [ 845.220532] CPU: 0 PID: 116 Comm: kworker/u16:4 Tainted: G E 4.11.0-wt-ath+ #336 [ 845.220554] Hardware name: Hewlett-Packard HP ProBook 6540b/1722, BIOS 68CDD Ver. F.04 01/27/2010 [ 845.220600] Workqueue: ath10k_aux_wq ath10k_wmi_event_service_ready_work [ath10k_core] [ 845.220626] Call Trace: [ 845.220656] dump_stack+0x85/0xc2 [ 845.220684] panic+0xda/0x228 [ 845.220719] swiotlb_full+0x88/0xa0 [ 845.220746] swiotlb_map_page+0x214/0x280 [ 845.220785] ath10k_wmi_event_service_ready_work+0x3f2/0x7d0 [ath10k_core] [ 845.220812] ? swiotlb_map_sg_attrs+0x140/0x140 [ 845.220858] process_one_work+0x23b/0x7d0 [ 845.221093] worker_thread+0x4e/0x4a0 [ 845.221126] kthread+0x117/0x150 [ 845.221149] ? process_one_work+0x7d0/0x7d0 [ 845.221175] ? kthread_create_on_node+0x40/0x40 [ 845.221205] ret_from_fork+0x31/0x40 [ 845.221314] Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Adrian Chadd <adrian@freebsd.org> writes: > This reverts b057886524be060021e3cfad0ba8458c850330cd in 2015 > which converted this allocation from dma_map_coherent() to > kzalloc() / dma_map_single(). > > The current problem manifests when using later model NICs with larger > (>700KiB) scratch spaces in memory. Although the kzalloc call > succeeds, the software IOMMU TLB code (via dma_map_single()) panics > because it can't find 700KiB of linear physmem bounce buffers for DMA. > Now, this is a bit of a silly failure mode for the dma map API, > but it's what we currently have to play with. > > In these cases, doing kzalloc() works fine, but the dma_map_single() > call fails. > > After chatting with Linus briefly about this, it indeed should be > using dma_alloc_coherent() for doing larger device memory allocation > that requires some kind of physical address mapping. > > You're not supposed to be using kzalloc and dma_map_* calls for > large memory regions, and I'm guessing not for long-held mappings > either. Typically dma mappings should be temporary for DMA, > not long held like these. > > Now, since hopefully the major annoying underlying problem has also been > addressed (ie, ath10k is no longer tears down all of these allocations > and reallocates them every time the vdevs are brought down) fragmentation > should stop being such a touchy issue. If it is though, using > dma_alloc_coherent() use gets us access to the CMB APIs too relatively > easily and ideally we would be allocating memory early in boot for > exactly these reasons. > > Signed-off-by: Adrian Chadd <adrian@FreeBSD.org> So there are now three positive test results (including mine) so I would like to queue this to 4.12 as an important bug fix. But I think all of them was with x86 and I'm worried if this patch creates problems with other architectures, especially on ARM? In the original commit Felix wrote that coherent memory is constrained some architectures. I would hate to go ping pong with this and reverting this patch soon after, instead I would prefer to find a proper solution which works for everyone. Felix, what do you think? Can someone test this patch on LEDE with the most problematic boards? commit b057886524be060021e3cfad0ba8458c850330cd Author: Felix Fietkau <nbd@openwrt.org> Date: Mon Nov 30 19:32:01 2015 +0100 ath10k: do not use coherent memory for allocated device memory chunks Coherent memory is more expensive to allocate (and constrained on some architectures where it has to be pre-allocated). It is also completely unnecessary, since the host has no reason to even access these allocated memory spaces Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Adrian Chadd <adrian@freebsd.org> wrote: > This reverts commit b057886524be ("ath10k: do not use coherent memory for > allocated device memory chunks") in 2015 which converted this allocation from > dma_map_coherent() to kzalloc() / dma_map_single(). > > The current problem manifests when using later model NICs with larger > (>700KiB) scratch spaces in memory. Although the kzalloc call > succeeds, the software IOMMU TLB code (via dma_map_single()) panics > because it can't find 700KiB of linear physmem bounce buffers for DMA. > Now, this is a bit of a silly failure mode for the dma map API, > but it's what we currently have to play with. > > In these cases, doing kzalloc() works fine, but the dma_map_single() > call fails. > > After chatting with Linus briefly about this, it indeed should be > using dma_alloc_coherent() for doing larger device memory allocation > that requires some kind of physical address mapping. > > You're not supposed to be using kzalloc and dma_map_* calls for > large memory regions, and I'm guessing not for long-held mappings > either. Typically dma mappings should be temporary for DMA, > not long held like these. > > Now, since hopefully the major annoying underlying problem has also been > addressed (ie, ath10k is no longer tears down all of these allocations > and reallocates them every time the vdevs are brought down) fragmentation > should stop being such a touchy issue. If it is though, using > dma_alloc_coherent() use gets us access to the CMB APIs too relatively > easily and ideally we would be allocating memory early in boot for > exactly these reasons. > > Signed-off-by: Adrian Chadd <adrian@FreeBSD.org> > Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Patch applied to ath-next branch of ath.git, thanks. 79e68821582d ath10k: go back to using dma_alloc_coherent() for firmware scratch memory
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c index 6afc8d2..cc89f53 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.c +++ b/drivers/net/wireless/ath/ath10k/wmi.c @@ -4482,31 +4482,17 @@ static int ath10k_wmi_alloc_chunk(struct ath10k *ar, u32 req_id, u32 num_units, u32 unit_len) { dma_addr_t paddr; - u32 pool_size = 0; + u32 pool_size; int idx = ar->wmi.num_mem_chunks; - void *vaddr = NULL; - - if (ar->wmi.num_mem_chunks == ARRAY_SIZE(ar->wmi.mem_chunks)) - return -ENOMEM; + void *vaddr; - while (!vaddr && num_units) { - pool_size = num_units * round_up(unit_len, 4); - if (!pool_size) - return -EINVAL; + pool_size = num_units * round_up(unit_len, 4); + vaddr = dma_alloc_coherent(ar->dev, pool_size, &paddr, GFP_KERNEL); - vaddr = kzalloc(pool_size, GFP_KERNEL | __GFP_NOWARN); - if (!vaddr) - num_units /= 2; - } - - if (!num_units) + if (!vaddr) return -ENOMEM; - paddr = dma_map_single(ar->dev, vaddr, pool_size, DMA_BIDIRECTIONAL); - if (dma_mapping_error(ar->dev, paddr)) { - kfree(vaddr); - return -ENOMEM; - } + memset(vaddr, 0, pool_size); ar->wmi.mem_chunks[idx].vaddr = vaddr; ar->wmi.mem_chunks[idx].paddr = paddr; @@ -8290,11 +8276,10 @@ void ath10k_wmi_free_host_mem(struct ath10k *ar) /* free the host memory chunks requested by firmware */ for (i = 0; i < ar->wmi.num_mem_chunks; i++) { - dma_unmap_single(ar->dev, - ar->wmi.mem_chunks[i].paddr, - ar->wmi.mem_chunks[i].len, - DMA_BIDIRECTIONAL); - kfree(ar->wmi.mem_chunks[i].vaddr); + dma_free_coherent(ar->dev, + ar->wmi.mem_chunks[i].len, + ar->wmi.mem_chunks[i].vaddr, + ar->wmi.mem_chunks[i].paddr); } ar->wmi.num_mem_chunks = 0;
This reverts b057886524be060021e3cfad0ba8458c850330cd in 2015 which converted this allocation from dma_map_coherent() to kzalloc() / dma_map_single(). The current problem manifests when using later model NICs with larger (>700KiB) scratch spaces in memory. Although the kzalloc call succeeds, the software IOMMU TLB code (via dma_map_single()) panics because it can't find 700KiB of linear physmem bounce buffers for DMA. Now, this is a bit of a silly failure mode for the dma map API, but it's what we currently have to play with. In these cases, doing kzalloc() works fine, but the dma_map_single() call fails. After chatting with Linus briefly about this, it indeed should be using dma_alloc_coherent() for doing larger device memory allocation that requires some kind of physical address mapping. You're not supposed to be using kzalloc and dma_map_* calls for large memory regions, and I'm guessing not for long-held mappings either. Typically dma mappings should be temporary for DMA, not long held like these. Now, since hopefully the major annoying underlying problem has also been addressed (ie, ath10k is no longer tears down all of these allocations and reallocates them every time the vdevs are brought down) fragmentation should stop being such a touchy issue. If it is though, using dma_alloc_coherent() use gets us access to the CMB APIs too relatively easily and ideally we would be allocating memory early in boot for exactly these reasons. Signed-off-by: Adrian Chadd <adrian@FreeBSD.org> --- drivers/net/wireless/ath/ath10k/wmi.c | 35 ++++++++++------------------------- 1 file changed, 10 insertions(+), 25 deletions(-)