crashkernel=512M is no longer working on this aarch64 server

Message ID 1A7E2E89-34DB-41A0-BBA2-323073A7E298@gmx.us (mailing list archive)
State New, archived

Commit Message

Qian Cai Nov. 11, 2018, 4:41 a.m. UTC
It was broken somewhere between b00d209241ff and 3541833fd1f2.

[    0.000000] cannot allocate crashkernel (size:0x20000000)

Where a good one looks like this,

[    0.000000] crashkernel reserved: 0x0000000008600000 - 0x0000000028600000 (512 MB)
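
As a quick sanity check on the two log lines above, the following minimal sketch confirms that the failed request (size:0x20000000) and the reserved range from the good boot describe the same 512 MiB. The helper names are made up for illustration; only the numbers come from the logs.

```python
MIB = 1 << 20

# Values copied from the two boot log lines quoted above.
REQUESTED_SIZE = 0x20000000
GOOD_START = 0x0000000008600000
GOOD_END = 0x0000000028600000  # printed as the exclusive end of the range


def size_in_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / MIB


def range_size(start: int, end: int) -> int:
    """Size of a half-open [start, end) physical range."""
    return end - start


if __name__ == "__main__":
    print(size_in_mib(REQUESTED_SIZE))                           # 512.0
    print(range_size(GOOD_START, GOOD_END) == REQUESTED_SIZE)    # True
    print(GOOD_START % (2 * MIB))                                # 0
```

The last line only observes that the good base address happens to be a multiple of 2 MiB; whether the kernel requires that alignment is not stated in this thread.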

Some commits look more suspicious than others.

      mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
      mm: introduce mm_[p4d|pud|pmd]_folded
      mm: make the __PAGETABLE_PxD_FOLDED defines non-empty

# diff -u ../iomem.good.txt ../iomem.bad.txt 

The memory map looks like this,

[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000398D0014 000024 (v02 HISI  )
[    0.000000] ACPI: XSDT 0x00000000398C00E8 000064 (v01 HISI   HIP07    00000000      01000013)
[    0.000000] ACPI: FACP 0x0000000039770000 000114 (v06 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: DSDT 0x0000000039730000 00691A (v02 HISI   HIP07    00000000 INTL 20170728)
[    0.000000] ACPI: MCFG 0x00000000397C0000 0000AC (v01 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: SLIT 0x00000000397B0000 00003C (v01 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: SRAT 0x00000000397A0000 000578 (v03 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: DBG2 0x0000000039790000 00005A (v00 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: GTDT 0x0000000039760000 00007C (v02 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: APIC 0x0000000039750000 0014E4 (v04 HISI   HIP07    00000000 INTL 20151124)
[    0.000000] ACPI: IORT 0x0000000039740000 000554 (v00 HISI   HIP07    00000000 INTL 20170728)
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x1800000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x17ffffffff]
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x9000000000-0x97ffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x8800000000-0x8fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x17fbffe5c0-0x17fbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe5c0-0x1ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x8ffbffe5c0-0x8ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x97fadce5c0-0x97fadcffff]
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000000000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000097fbffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000003965ffff]
[    0.000000]   node   0: [mem 0x0000000039660000-0x00000000396fffff]
[    0.000000]   node   0: [mem 0x0000000039700000-0x000000003977ffff]
[    0.000000]   node   0: [mem 0x0000000039780000-0x000000003978ffff]
[    0.000000]   node   0: [mem 0x0000000039790000-0x00000000397cffff]
[    0.000000]   node   0: [mem 0x00000000397d0000-0x00000000398bffff]
[    0.000000]   node   0: [mem 0x00000000398c0000-0x00000000398dffff]
[    0.000000]   node   0: [mem 0x00000000398e0000-0x0000000039d5ffff]
[    0.000000]   node   0: [mem 0x0000000039d60000-0x000000003ed4ffff]
[    0.000000]   node   0: [mem 0x000000003ed50000-0x000000003ed7ffff]
[    0.000000]   node   0: [mem 0x000000003ed80000-0x000000003fbfffff]
[    0.000000]   node   0: [mem 0x0000001040000000-0x00000017fbffffff]
[    0.000000]   node   1: [mem 0x0000001800000000-0x0000001ffbffffff]
[    0.000000]   node   2: [mem 0x0000008800000000-0x0000008ffbffffff]
[    0.000000]   node   3: [mem 0x0000009000000000-0x00000097fbffffff]

Comments

Martin Schwidefsky Nov. 11, 2018, 11:35 a.m. UTC | #1
On Sat, 10 Nov 2018 23:41:34 -0500
Qian Cai <cai@gmx.us> wrote:

> It was broken somewhere between b00d209241ff and 3541833fd1f2.
> 
> [    0.000000] cannot allocate crashkernel (size:0x20000000)
> 
> Where a good one looks like this,
> 
> [    0.000000] crashkernel reserved: 0x0000000008600000 - 0x0000000028600000 (512 MB)
> 
> Some commits look more suspicious than others.
> 
>       mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
>       mm: introduce mm_[p4d|pud|pmd]_folded
>       mm: make the __PAGETABLE_PxD_FOLDED defines non-empty

The intent of these three patches is to add extra checks to the
pgtable_bytes accounting function. If applied incorrectly the expected
result would be warnings like this:
  BUG: non-zero pgtables_bytes on freeing mm: 16384

The change Linus worried about affects the __PAGETABLE_PxD_FOLDED defines.
These defines are used with #ifdef, #ifndef, and __is_defined() for the
new mm_p?d_folded() macros. I cannot see how this would make a difference
for your iomem setup.

> # diff -u ../iomem.good.txt ../iomem.bad.txt 
> --- ../iomem.good.txt	2018-11-10 22:28:20.092614398 -0500
> +++ ../iomem.bad.txt	2018-11-10 20:39:54.930294479 -0500
> @@ -1,9 +1,8 @@
>  00000000-3965ffff : System RAM
>    00080000-018cffff : Kernel code
> -  018d0000-020affff : reserved
> -  020b0000-045affff : Kernel data
> -  08600000-285fffff : Crash kernel
> -  28730000-2d5affff : reserved
> +  018d0000-0762ffff : reserved
> +  07630000-09b2ffff : Kernel data
> +  231b0000-2802ffff : reserved
>    30ec0000-30ecffff : reserved
>    35660000-3965ffff : reserved
>  39660000-396fffff : reserved
> @@ -127,7 +126,7 @@
>    7c5200000-7c520ffff : 0004:48:00.0
>  1040000000-17fbffffff : System RAM
>    13fbfd0000-13fdfdffff : reserved
> -  16fba80000-17fbfdffff : reserved
> +  16fafd0000-17fbfdffff : reserved
>    17fbfe0000-17fbffffff : reserved
>  1800000000-1ffbffffff : System RAM
>    1bfbff0000-1bfdfeffff : reserved

The easiest way to verify whether the three commits have something to do with your
problem is to revert them and run your test. Can you do that, please?

Qian Cai Nov. 11, 2018, 1:36 p.m. UTC | #2
> On Nov 11, 2018, at 6:35 AM, Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> The easiest way to verify whether the three commits have something to do with your
> problem is to revert them and run your test. Can you do that, please?
Yes, you are right. Those commits have nothing to do with the problem. I should
have realized it earlier, as those deal with virtual memory rather than physical
memory. Sorry for the noise.

It turned out I had wrongly assumed that with kmemleak disabled by default,
no memory would be reserved for kmemleak at all, which is not the case.

CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=600000
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y

Even without kmemleak=on in the kernel cmdline, it still reserves early log memory,
which leaves not enough memory for crashkernel.

Since there seems to be no way to turn kmemleak on later after boot, is there any
reason for the current behavior?

Martin Schwidefsky Nov. 12, 2018, 6:01 a.m. UTC | #3
On Sun, 11 Nov 2018 08:36:09 -0500
Qian Cai <cai@gmx.us> wrote:

> Yes, you are right. Those commits have nothing to do with the problem. I should
> have realized it earlier, as those deal with virtual memory rather than physical
> memory. Sorry for the noise.
> 
> It turned out I had wrongly assumed that with kmemleak disabled by default,
> no memory would be reserved for kmemleak at all, which is not the case.
> 
> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=600000
> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
> 
> Even without kmemleak=on in the kernel cmdline, it still reserves early log memory,
> which leaves not enough memory for crashkernel.
> 
> Since there seems to be no way to turn kmemleak on later after boot, is there any
> reason for the current behavior?

Well, it seems you do have CONFIG_DEBUG_KMEMLEAK=y in your config. The code
contains data structures for the case that you want to use the kmemleak checker.
The presence of these structures will change the sizes. The last commit touching
the 'early_log' buffer dates from 2009, with this change:

@@ -232,8 +232,9 @@ struct early_log {
 };
 
 /* early logging buffer and current position */
-static struct early_log early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE];
-static int crt_early_log;
+static struct early_log
+       early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata;
+static int crt_early_log __initdata;
 
 static void kmemleak_disable(void);
 
The current behavior is imho nothing new.
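
To get a feel for how large a statically sized early_log array could be, here is a rough sketch. The field list of EarlyLogSketch and the MAX_TRACE value of 16 are assumptions made for illustration (the thread does not show the struct definition), and pointers and longs are modeled as 64-bit to mimic an aarch64 target. Only the entry count of 600000 comes from the reported config.

```python
import ctypes

MAX_TRACE = 16  # assumed stack trace depth; not stated in the thread


class EarlyLogSketch(ctypes.Structure):
    """Hypothetical stand-in for one early_log entry on a 64-bit target."""
    _fields_ = [
        ("op_type", ctypes.c_int),
        ("min_count", ctypes.c_int),
        ("ptr", ctypes.c_uint64),                 # pointer modeled as 64-bit
        ("size", ctypes.c_uint64),                # size_t modeled as 64-bit
        ("trace", ctypes.c_uint64 * MAX_TRACE),   # unsigned long[MAX_TRACE]
        ("trace_len", ctypes.c_uint),
    ]


def early_log_bytes(entries: int) -> int:
    """Total bytes taken by a static array of `entries` log records."""
    return entries * ctypes.sizeof(EarlyLogSketch)


if __name__ == "__main__":
    entries = 600000  # CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE from the report
    per_entry = ctypes.sizeof(EarlyLogSketch)
    total_mib = early_log_bytes(entries) / (1 << 20)
    print(f"{per_entry} bytes per entry, {total_mib:.1f} MiB in total")
```

Under these assumptions the array comes to somewhere around 90 MiB on a typical 64-bit host. That would be consistent with, though not proof of, the growth of the reserved range right after the kernel code in the iomem diff (018d0000-020affff in the good boot versus 018d0000-0762ffff in the bad one, roughly 8 MiB versus 93 MiB).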

Would it be possible to disable CONFIG_DEBUG_KMEMLEAK for your kdump kernel?
That seems like the simplest solution.

Qian Cai Nov. 12, 2018, 12:29 p.m. UTC | #4
On 11/12/18 at 1:01 AM, Martin Schwidefsky wrote:

> Well, it seems you do have CONFIG_DEBUG_KMEMLEAK=y in your config. The code
> contains data structures for the case that you want to use the kmemleak checker.
> The presence of these structures will change the sizes. The last commit touching
> the 'early_log' buffer dates from 2009, with this change:
> 
> @@ -232,8 +232,9 @@ struct early_log {
>  };
>  
>  /* early logging buffer and current position */
> -static struct early_log early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE];
> -static int crt_early_log;
> +static struct early_log
> +       early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata;
> +static int crt_early_log __initdata;
>  
>  static void kmemleak_disable(void);
>  
> The current behavior is imho nothing new.
> 
> Would it be possible to disable CONFIG_DEBUG_KMEMLEAK for your kdump kernel?
> That seems like the simplest solution.
Ah, okay. Those are static memory allocations, made
regardless of the kmemleak runtime setting.

The problem is that kmemleak then has to be disabled entirely
and the first kernel recompiled as well, since the
crashkernel reservation happens in the first kernel.

Hence, the flexibility to enable kmemleak at boot time
is lost as well. I can live with it, although it does
not seem ideal.

Patch

--- ../iomem.good.txt	2018-11-10 22:28:20.092614398 -0500
+++ ../iomem.bad.txt	2018-11-10 20:39:54.930294479 -0500
@@ -1,9 +1,8 @@ 
 00000000-3965ffff : System RAM
   00080000-018cffff : Kernel code
-  018d0000-020affff : reserved
-  020b0000-045affff : Kernel data
-  08600000-285fffff : Crash kernel
-  28730000-2d5affff : reserved
+  018d0000-0762ffff : reserved
+  07630000-09b2ffff : Kernel data
+  231b0000-2802ffff : reserved
   30ec0000-30ecffff : reserved
   35660000-3965ffff : reserved
 39660000-396fffff : reserved
@@ -127,7 +126,7 @@ 
   7c5200000-7c520ffff : 0004:48:00.0
 1040000000-17fbffffff : System RAM
   13fbfd0000-13fdfdffff : reserved
-  16fba80000-17fbfdffff : reserved
+  16fafd0000-17fbfdffff : reserved
   17fbfe0000-17fbffffff : reserved
 1800000000-1ffbffffff : System RAM
   1bfbff0000-1bfdfeffff : reserved
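
The diff above can be turned into a small gap analysis of the first System RAM range (00000000-3965ffff). The sketch below is an illustration, not the kernel's algorithm: it assumes the listed child ranges are the only occupied areas, that the reservation must come from this low range, and that the allocator picks the highest 2 MiB-aligned base that fits. None of these assumptions is stated in the thread, and the function names are made up.

```python
SZ_2M = 2 * 1024 * 1024
CRASH_SIZE = 0x20000000          # from "size:0x20000000" in the boot log
RAM = (0x00000000, 0x3965ffff)   # first "System RAM" range, inclusive

# Occupied child ranges (inclusive) taken from the diff. The "Crash kernel"
# entry is left out of GOOD to model the state before the reservation.
GOOD = [
    (0x00080000, 0x018cffff),  # Kernel code
    (0x018d0000, 0x020affff),  # reserved
    (0x020b0000, 0x045affff),  # Kernel data
    (0x28730000, 0x2d5affff),  # reserved
    (0x30ec0000, 0x30ecffff),  # reserved
    (0x35660000, 0x3965ffff),  # reserved
]
BAD = [
    (0x00080000, 0x018cffff),  # Kernel code
    (0x018d0000, 0x0762ffff),  # reserved
    (0x07630000, 0x09b2ffff),  # Kernel data
    (0x231b0000, 0x2802ffff),  # reserved
    (0x30ec0000, 0x30ecffff),  # reserved
    (0x35660000, 0x3965ffff),  # reserved
]


def free_gaps(ram, used):
    """Return half-open [start, end) gaps of `ram` not covered by `used`."""
    gaps, cursor = [], ram[0]
    for start, end in sorted(used):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end + 1)
    if cursor <= ram[1]:
        gaps.append((cursor, ram[1] + 1))
    return gaps


def place_top_down(gaps, size, align):
    """Highest `align`-aligned base where `size` bytes fit, or None."""
    for start, end in sorted(gaps, reverse=True):
        base = (end - size) // align * align
        if base >= start:
            return base
    return None


if __name__ == "__main__":
    good = place_top_down(free_gaps(RAM, GOOD), CRASH_SIZE, SZ_2M)
    bad = place_top_down(free_gaps(RAM, BAD), CRASH_SIZE, SZ_2M)
    print(hex(good))  # 0x8600000
    print(bad)        # None
    largest = max(end - start for start, end in free_gaps(RAM, BAD))
    print(largest / (1 << 20))  # 406.5
```

Under these assumptions the good layout yields the same base as the "crashkernel reserved" log line, while in the bad layout the largest free gap is about 406.5 MiB, which is too small for a 512 MiB reservation.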