
[v7,0/7] arm64: Default to 32-bit wide ZONE_DMA

Message ID 20201119175400.9995-1-nsaenzjulienne@suse.de
Series arm64: Default to 32-bit wide ZONE_DMA

Message

Nicolas Saenz Julienne Nov. 19, 2020, 5:53 p.m. UTC
Using two distinct DMA zones turned out to be problematic. Here's an
attempt to go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v6:
 - Update patch #1 so we reserve crashkernel before request_standard_resources()
 - Tested on top of Catalin's mem_init() patches.

Changes since v5:
 - Unify ACPI/DT functions

Changes since v4:
 - Fix of_dma_get_max_cpu_address() so it returns the last addressable
   address, not the limit

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c      | 22 +++++++++-------
 drivers/acpi/arm64/iort.c | 55 +++++++++++++++++++++++++++++++++++++++
 drivers/of/address.c      | 42 ++++++++++++++++++++++++++++++
 drivers/of/unittest.c     | 18 +++++++++++++
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h    | 20 --------------
 include/linux/of.h        |  7 +++++
 7 files changed, 139 insertions(+), 29 deletions(-)

Comments

Catalin Marinas Nov. 20, 2020, 11:39 a.m. UTC | #1
On Thu, 19 Nov 2020 18:53:52 +0100, Nicolas Saenz Julienne wrote:
> Using two distinct DMA zones turned out to be problematic. Here's an
> attempt to go back to a saner default.
> 
> I tested this on both a RPi4 and QEMU.

Applied to arm64 (for-next/zone-dma-default-32-bit), thanks!

[1/7] arm64: mm: Move reserve_crashkernel() into mem_init()
      https://git.kernel.org/arm64/c/0a30c53573b0
[2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
      https://git.kernel.org/arm64/c/9804f8c69b04
[3/7] of/address: Introduce of_dma_get_max_cpu_address()
      https://git.kernel.org/arm64/c/964db79d6c18
[4/7] of: unittest: Add test for of_dma_get_max_cpu_address()
      https://git.kernel.org/arm64/c/07d13a1d6120
[5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
      https://git.kernel.org/arm64/c/8424ecdde7df
[6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan
      https://git.kernel.org/arm64/c/2b8652936f0c
[7/7] mm: Remove examples from enum zone_type comment
      https://git.kernel.org/arm64/c/04435217f968

Matt Flax March 1, 2022, 3 a.m. UTC | #2
Hi All,

It seems that the ZONE_DMA changes have broken Rockchip RK3399 boards from v5.10.22 onwards.

It isn't clear what needs to be changed to get any of these boards up and running again. Any pointers on what to change?

An easy test for debugging is to run stress:

stress --cpu 4 --io 4 --vm 2 --vm-bytes 128M

stress: info: [255] dispatching hogs: 4 cpu, 4 io, 2 vm, 0 hdd
[    8.070280] SError Interrupt on CPU4, code 0xbf000000 -- SError
[    8.070286] CPU: 4 PID: 261 Comm: stress Not tainted 5.10.21 #1
[    8.070289] Hardware name: FriendlyElec NanoPi M4 (DT)
[    8.070293] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[    8.070296] pc : clear_page+0x14/0x28
[    8.070298] lr : clear_subpage+0x50/0x90
[    8.070302] sp : ffff800012abbc40
[    8.070305] x29: ffff800012abbc40 x28: ffff000000f68000 
[    8.070313] x27: 0000000000000000 x26: ffff000001f38e40 
[    8.070320] x25: ffff8000114fd000 x24: 0000000000000000 
[    8.070326] x23: 0000000000000000 x22: 0000000000001000 
[    8.070334] x21: 0000ffffa7e00000 x20: fffffe0000010000 
[    8.070341] x19: ffff000000f68000 x18: 0000000000000000 
[    8.070348] x17: 0000000000000000 x16: 0000000000000000 
[    8.070354] x15: 0000000000000002 x14: 0000000000000001 
[    8.070361] x13: 0000000000075879 x12: 00000000000000c0 
[    8.070368] x11: ffff80006c46a000 x10: 0000000000000200 
[    8.070374] x9 : 0000000000000000 x8 : 0000000000000010 
[    8.070381] x7 : ffff00007db800a0 x6 : ffff800011b899c0 
[    8.070387] x5 : 0000000000000000 x4 : ffff00007db800f7 
[    8.070394] x3 : 0000020000200000 x2 : 0000000000000004 
[    8.070401] x1 : 0000000000000040 x0 : ffff0000085ff4c0 
[    8.070409] Kernel panic - not syncing: Asynchronous SError Interrupt
[    8.070412] CPU: 4 PID: 261 Comm: stress Not tainted 5.10.21 #1
[    8.070415] Hardware name: FriendlyElec NanoPi M4 (DT)
[    8.070418] Call trace:
[    8.070420]  dump_backtrace+0x0/0x1b0
[    8.070423]  show_stack+0x18/0x70
[    8.070425]  dump_stack+0xd0/0x12c
[    8.070428]  panic+0x16c/0x334
[    8.070430]  nmi_panic+0x8c/0x90
[    8.070433]  arm64_serror_panic+0x78/0x84
[    8.070435]  do_serror+0x64/0x70
[    8.070437]  el1_error+0x88/0x108
[    8.070440]  clear_page+0x14/0x28
[    8.070443]  clear_huge_page+0x74/0x210
[    8.070445]  do_huge_pmd_anonymous_page+0x1b0/0x7c0
[    8.070448]  handle_mm_fault+0xdac/0x1290
[    8.070451]  do_page_fault+0x130/0x3a0
[    8.070453]  do_translation_fault+0xb0/0xc0
[    8.070456]  do_mem_abort+0x44/0xb0
[    8.070458]  el0_da+0x28/0x40
[    8.070461]  el0_sync_handler+0x168/0x1b0
[    8.070464]  el0_sync+0x174/0x180
[    8.070508] SError Interrupt on CPU0, code 0xbf000000 -- SError
[    8.070511] CPU: 0 PID: 258 Comm: stress Not tainted 5.10.21 #1
[    8.070515] Hardware name: FriendlyElec NanoPi M4 (DT)
[    8.070518] pstate: 80000000 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[    8.070520] pc : 0000aaaacec22e98
[    8.070523] lr : 0000aaaacec22d84
[    8.070525] sp : 0000ffffe67a8620
[    8.070528] x29: 0000ffffe67a8620 x28: 0000000000000003 
[    8.070534] x27: 0000aaaacec34000 x26: 0000ffffaeb42610 
[    8.070541] x25: 0000ffffa69af010 x24: 0000aaaacec23a98 
[    8.070547] x23: 0000aaaacec35010 x22: 0000aaaacec35000 
[    8.070554] x21: 0000000000001000 x20: ffffffffffffffff 
[    8.070560] x19: 0000000008000000 x18: 0000000000000000 
[    8.070567] x17: 0000000000000000 x16: 0000000000000000 
[    8.070573] x15: 0000000000000000 x14: 0000000000000000 
[    8.070580] x13: 0000000000008000 x12: 0000000000000000 
[    8.070587] x11: 0000000000000020 x10: 0000000000000030 
[    8.070593] x9 : 000000000000000a x8 : 00000000000000de 
[    8.070599] x7 : 0000000000200000 x6 : 000000000000021b 
[    8.070606] x5 : 0000000000000000 x4 : ffffffffffffffff 
[    8.070613] x3 : 0000000000000000 x2 : 0000ffffaeb47000 
[    8.070619] x1 : 000000000000005a x0 : 0000000000a58000 
[    8.070629] SMP: stopping secondary CPUs
[    8.070632] Kernel Offset: disabled
[    8.070634] CPU features: 0x0240022,6100600c
[    8.070637] Memory Limit: none

Robin Murphy March 1, 2022, 10:56 a.m. UTC | #3
Hi Matt,

On 2022-03-01 03:00, Matt Flax wrote:
> Hi All,
> 
> It seems that the ZONE_DMA changes have broken Rockchip RK3399 boards from v5.10.22 onwards.
> 
> It isn't clear what needs to be changed to get any of these boards up and running again. Any pointers on what to change?

Your firmware/bootloader setup is mismatched. If you're using the 
downstream Rockchip blob for BL31, you need to reserve or remove the 
memory range 0x8400000-0x9600000 to match the behaviour of the original 
Android BSP U-Boot. The downstream firmware firewalls this memory off 
for the Secure world such that any attempt to touch it from Linux 
results in a fatal SError fault as below. Any apparent correlation with 
the ZONE_DMA changes will simply be because they've affected the 
behaviour of the page allocator, such that it's more likely to reach 
into the affected range of memory.

Cheers,
Robin.