IMX8MM kernel panic on 5.5+ due to patch series 'Raspberry Pi 4 DMA addressing support'

Message ID CAJ+vNU0x+Dd67thRXABKG1AmJW6Babs_XE2hG01yuV3L9meuWA@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Tim Harvey April 22, 2020, 5:44 p.m. UTC
Greetings,

I'm seeing a kernel panic on an IMX8MM board using defconfig starting
with the patch series 'Raspberry Pi 4 DMA addressing support':

734f924 mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'
1a8e1ce arm64: use both ZONE_DMA and ZONE_DMA32
a573cdd arm64: rename variables used to calculate ZONE_DMA32's size
ae970dc arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()

Strangely I don't see this panic on an ARM64 OcteonTX CPU (thunderx)
with defconfig so perhaps this has to do with some dt thing?

I find that a573cdd ("arm64: rename variables used to calculate
ZONE_DMA32's size") breaks building arm64 defconfig: it renames
arm64_dma_phys_limit to arm64_dma32_phys_limit, but
arm64_dma_phys_limit is still used in arch/arm64/include/asm/processor.h.

The following patch resolves this build error and panic:

Anyone know why this isn't affecting all ARM64?

Here is the panic:
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.7.0-rc2-00001-gcd0ff1c-dirty (tharvey@tharvey) (gcc version 8.4.0 (Buildroot 2020.02.1), GNU ld (GNU Binutils) 2.32) #31 SMP PREEMPT Tue Apr 21 11:10:32 PDT 2020
[    0.000000] Machine model: Gateworks Venice i.MX8MM board
[    0.000000] earlycon: ec_imx6q0 at MMIO 0x0000000030890000 (options '115200')
[    0.000000] printk: bootconsole [ec_imx6q0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] NUMA: NODE_DATA [mem 0xbdbd6100-0xbdbd7fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000007fffffff]
[    0.000000]   DMA32    [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] On node 0 totalpages: 524288
[    0.000000]   DMA zone: 4096 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 262144 pages, LIFO batch:63
[    0.000000]   DMA32 zone: 4096 pages used for memmap
[    0.000000]   DMA32 zone: 262144 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.1
[    0.000000] percpu: Embedded 23 pages/cpu s53784 r8192 d32232 u94208
[    0.000000] pcpu-alloc: s53784 r8192 d32232 u94208 alloc=23*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: ARM erratum 845719
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] Speculative Store Bypass Disable mitigation not required
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 516096
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: console=ttymxc1,115200 earlycon=ec_imx6q,0x30890000,115200 debug
[    0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Unable to handle kernel paging request at virtual address ffff00003de10000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000047
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000047
[    0.000000]   CM = 0, WnR = 1
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000416e2000
[    0.000000] [ffff00003de10000] pgd=00000000bdff8003
[    0.000000] Unable to handle kernel paging request at virtual address ffff00007dff8000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000007
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000007
[    0.000000]   CM = 0, WnR = 0
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000416e2000
[    0.000000] [ffff00007dff8000] pgd=00000000bdff8003
[    0.000000] Unable to handle kernel paging request at virtual address ffff00007dff8008
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000007
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000007
[    0.000000]   CM = 0, WnR = 0
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000416e2000
[    0.000000] [ffff00007dff8008] pgd=00000000bdff8003
[    0.000000] Unable to handle kernel paging request at virtual address ffff00007dff8008
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000007
...
[    0.000000] Kernel panic - not syncing: kernel stack overflow
[    0.000000] ---[ end Kernel panic - not syncing: kernel stack overflow ]---

Best Regards,

Tim
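
The "Mem abort info" records in the log above can be decoded by hand. The following is a minimal sketch, not kernel code: the names esr_fields, decode_esr and print_esr are hypothetical, and the bit positions are the ESR_ELx data-abort layout as I understand it from the Arm architecture manual.

```c
#include <stdint.h>
#include <stdio.h>

/* Selected ESR_ELx fields for a data abort (illustrative struct). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit 25 */
    unsigned iss;   /* instruction specific syndrome, bits [24:0] */
    unsigned wnr;   /* write-not-read, ISS bit 6 */
    unsigned dfsc;  /* data fault status code, ISS bits [5:0] */
};

/* Split a raw 32-bit ESR value into the fields above. */
struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec = (esr >> 26) & 0x3f;
    f.il = (esr >> 25) & 0x1;
    f.iss = esr & 0x1ffffff;
    f.wnr = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}

/* Print the decoded fields on one line. */
void print_esr(uint32_t esr)
{
    struct esr_fields f = decode_esr(esr);

    printf("ESR 0x%08x: EC=0x%02x IL=%u ISS=0x%08x WnR=%u DFSC=0x%02x\n",
           (unsigned)esr, f.ec, f.il, f.iss, f.wnr, f.dfsc);
}
```

For the first abort (ESR = 0x96000047) this gives EC = 0x25, ISS = 0x47 and WnR = 1, matching the log: a write that faulted at the current exception level. DFSC = 0x07 is a level 3 translation fault. The later aborts (ESR = 0x96000007) differ only in WnR = 0, i.e. they are reads.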

Comments

Catalin Marinas April 23, 2020, 11:33 a.m. UTC | #1
On Wed, Apr 22, 2020 at 10:44:33AM -0700, Tim Harvey wrote:
> I'm seeing a kernel panic on an IMX8MM board using defconfig starting
> with the patch series 'Raspberry Pi 4 DMA addressing support':
> 
> 734f924 mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'
> 1a8e1ce arm64: use both ZONE_DMA and ZONE_DMA32
> a573cdd arm64: rename variables used to calculate ZONE_DMA32's size
> ae970dc arm64: mm: use arm64_dma_phys_limit instead of calling
> max_zone_dma_phys()
> 
> Strangely I don't see this panic on an ARM64 OcteonTX CPU (thunderx)
> with defconfig so perhaps this has to do with some dt thing?
> 
> I find that a573cdd ("arm64: rename variables used to calculate
> ZONE_DMA32's size") breaks building arm64 defconfig due to renaming of
> arm64_dma_phys_limit to arm64_dma32_phys_limit but
> arm64_dma_phys_limit still used in include/asm/processor.h
> 
> The following patch resolves this build error and panic:

So it means that commit 1a8e1ce causes the break for you. I haven't seen
this problem on any other platform yet. A wrong DT could as well cause
problems as this commit would change where some memory allocations come
from.

Can you run the kernel with memtest=1, just in case the DT is wrongly
pointing to some non-RAM areas?
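
To illustrate how commit 1a8e1ce changes where allocations come from, here is a rough user-space model of the zone limit calculation. It is a sketch under assumptions: model_max_zone_phys is a hypothetical name, the logic follows my reading of max_zone_phys() in arch/arm64/mm/init.c rather than copying it, and the 30-bit ZONE_DMA width is what I understand the series to use for the Raspberry Pi 4.

```c
#include <stdint.h>

/*
 * Model of the upper physical limit of a zone that is zone_bits wide.
 * dram_start is the first byte of RAM and dram_end is one past the
 * last byte, both as described by the device tree.
 */
uint64_t model_max_zone_phys(unsigned zone_bits, uint64_t dram_start,
                             uint64_t dram_end)
{
    /* Keep only the address bits at or above the zone width. */
    uint64_t offset = dram_start & ~((1ULL << zone_bits) - 1);
    uint64_t limit = offset + (1ULL << zone_bits);

    /* The zone cannot extend past the end of RAM. */
    return limit < dram_end ? limit : dram_end;
}
```

With dram_start = 0x40000000 and dram_end = 0xc0000000 (the node range in the boot log), the model returns 0x80000000 for 30 bits and 0xc0000000 for 32 bits, which agrees with the "Zone ranges" lines: DMA ends at 0x7fffffff and DMA32 at 0xbfffffff. Before the series there was a single 32-bit zone, so early allocations could land in different physical addresses.
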

Tim Harvey April 23, 2020, 3:40 p.m. UTC | #2
On Thu, Apr 23, 2020 at 4:33 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Apr 22, 2020 at 10:44:33AM -0700, Tim Harvey wrote:
> > I'm seeing a kernel panic on an IMX8MM board using defconfig starting
> > with the patch series 'Raspberry Pi 4 DMA addressing support':
> >
> > 734f924 mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'
> > 1a8e1ce arm64: use both ZONE_DMA and ZONE_DMA32
> > a573cdd arm64: rename variables used to calculate ZONE_DMA32's size
> > ae970dc arm64: mm: use arm64_dma_phys_limit instead of calling
> > max_zone_dma_phys()
> >
> > Strangely I don't see this panic on an ARM64 OcteonTX CPU (thunderx)
> > with defconfig so perhaps this has to do with some dt thing?
> >
> > I find that a573cdd ("arm64: rename variables used to calculate
> > ZONE_DMA32's size") breaks building arm64 defconfig due to renaming of
> > arm64_dma_phys_limit to arm64_dma32_phys_limit but
> > arm64_dma_phys_limit still used in include/asm/processor.h
> >
> > The following patch resolves this build error and panic:
>
> So it means that commit 1a8e1ce causes the break for you. I haven't seen
> this problem on any other platform yet. A wrong DT could as well cause
> problems as this commit would change where some memory allocations come
> from.
>
> Can you run the kernel with memtest=1, just in case the DT is wrongly
> pointing to some non-RAM areas?
>

Catalin,

Oh gosh... I had my mem size wrongly doubled and that was the cause of
the issue!

Thanks for the tip and sorry for the noise!

Tim
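
The doubled memory size can be cross-checked against the numbers in the boot log. The sketch below is illustrative only: range_bytes and linear_virt_to_phys are hypothetical helpers, and the linear-map formula together with the values passed for page_offset (0xffff000000000000) and memstart (0x40000000) are assumptions, not values printed in the log.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL  /* "4k pages" in the log */

/* Size in bytes of an inclusive [start, last] physical range. */
uint64_t range_bytes(uint64_t start, uint64_t last)
{
    return last - start + 1;
}

/* Number of 4 KiB pages in an inclusive [start, last] range. */
uint64_t range_pages(uint64_t start, uint64_t last)
{
    return range_bytes(start, last) / PAGE_SIZE_BYTES;
}

/*
 * Assumed linear-map translation: phys = virt - page_offset + memstart.
 * Both page_offset and memstart are supplied by the caller because
 * they are assumptions about this particular kernel configuration.
 */
uint64_t linear_virt_to_phys(uint64_t virt, uint64_t page_offset,
                             uint64_t memstart)
{
    return virt - page_offset + memstart;
}
```

The node range 0x40000000-0xbfffffff is 0x80000000 bytes (2 GiB), or 524288 pages, matching "totalpages" in the log. If that size was doubled, the board would have 1 GiB, ending at 0x7fffffff. Under the assumed translation, the faulting address ffff00007dff8000 corresponds to physical 0xbdff8000, which is the table address in the printed pgd entry and lies in the upper GiB that would not exist on such a board. That would be consistent with the repeated aborts, though the thread itself does not confirm this reading.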

Patch

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 5623685..9057495 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -90,8 +90,8 @@ 
                                        base)
 #endif /* CONFIG_ARM64_FORCE_52BIT */

-extern phys_addr_t arm64_dma_phys_limit;
-#define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)
+extern phys_addr_t arm64_dma32_phys_limit;
+#define ARCH_LOW_ADDRESS_LIMIT (arm64_dma32_phys_limit - 1)

 struct debug_info {
 #ifdef CONFIG_HAVE_HW_BREAKPOINT