diff mbox series

[v3,2/3] crash: Fix x86_32 crash memory reserve dead loop bug at high

Message ID 20240718035444.2977105-3-ruanjinjie@huawei.com (mailing list archive)
State Superseded
Headers show
Series crash: Fix x86_32 memory reserve dead loop bug | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-2-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-2-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-2-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-2-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-2-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-2-test-6 success .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-2-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-2-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-2-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-2-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-2-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-2-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Jinjie Ruan July 18, 2024, 3:54 a.m. UTC
On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will
also cause system stall as below:

	ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b]
	ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7]
	ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f]
	ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb]
	ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3]
	ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b]
	143MB HIGHMEM available.
	879MB LOWMEM available.
	  mapped low ram: 0 - 36ffe000
	  low ram: 0 - 36ffe000
	  (stall here)

The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX
on x86_32, the first "low" crash kernel memory reservation for 512M fails,
then it go into the "retry" loop and never came out as below (consider
CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M):

-> reserve_crashkernel_generic() and high is false
   -> alloc at [0, 0x20000000] fail
      -> alloc at [0x20000000, 0x20000000] fail and repeatedly
      (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).

Fix it by skipping meaningless calls of memblock_phys_alloc_range() with
`start = end`

After this patch, the retry dead loop is avoided and print below info:
	cannot allocate crashkernel (size:0x20000000)

And apply generic crashkernel reservation to 32bit system will be ready.

Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
v3:
- Fix it as Baoquan suggested.
- Update the commit message.
---
 kernel/crash_reserve.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Baoquan He July 18, 2024, 11:14 a.m. UTC | #1
On 07/18/24 at 11:54am, Jinjie Ruan wrote:

I don't fully catch the subject, what does the 'dead loop bug at high'
mean?

> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will
> also cause system stall as below:
> 
> 	ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b]
> 	ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7]
> 	ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f]
> 	ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb]
> 	ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3]
> 	ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b]
> 	143MB HIGHMEM available.
> 	879MB LOWMEM available.
> 	  mapped low ram: 0 - 36ffe000
> 	  low ram: 0 - 36ffe000
> 	  (stall here)
> 
> The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX
> on x86_32, the first "low" crash kernel memory reservation for 512M fails,
> then it go into the "retry" loop and never came out as below (consider
> CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M):
> 
> -> reserve_crashkernel_generic() and high is false
>    -> alloc at [0, 0x20000000] fail
>       -> alloc at [0x20000000, 0x20000000] fail and repeatedly
>       (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
> 
> Fix it by skipping meaningless calls of memblock_phys_alloc_range() with
> `start = end`
> 
> After this patch, the retry dead loop is avoided and print below info:
> 	cannot allocate crashkernel (size:0x20000000)
> 
> And apply generic crashkernel reservation to 32bit system will be ready.
> 
> Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code")
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Tested-by: Jinjie Ruan <ruanjinjie@huawei.com>

Also the tag issues, please update.

Other than above concerns, the patch looks good to me.

> ---
> v3:
> - Fix it as Baoquan suggested.
> - Update the commit message.
> ---
>  kernel/crash_reserve.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
> index c5213f123e19..dacc268429e2 100644
> --- a/kernel/crash_reserve.c
> +++ b/kernel/crash_reserve.c
> @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline,
>  			search_end = CRASH_ADDR_HIGH_MAX;
>  			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
> -			goto retry;
> +			if (search_base != search_end)
> +				goto retry;
>  		}
>  
>  		/*
> -- 
> 2.34.1
>
Jinjie Ruan July 18, 2024, 12:10 p.m. UTC | #2
On 2024/7/18 19:14, Baoquan He wrote:
> On 07/18/24 at 11:54am, Jinjie Ruan wrote:
> 
> I don't fully catch the subject, what does the 'dead loop bug at high'
> mean?

It means alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] repeatedly
which corresponds to the crashkernel parameter of the "high".

> 
>> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will
>> also cause system stall as below:
>>
>> 	ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b]
>> 	ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7]
>> 	ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f]
>> 	ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb]
>> 	ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3]
>> 	ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b]
>> 	143MB HIGHMEM available.
>> 	879MB LOWMEM available.
>> 	  mapped low ram: 0 - 36ffe000
>> 	  low ram: 0 - 36ffe000
>> 	  (stall here)
>>
>> The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX
>> on x86_32, the first "low" crash kernel memory reservation for 512M fails,
>> then it go into the "retry" loop and never came out as below (consider
>> CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M):
>>
>> -> reserve_crashkernel_generic() and high is false
>>    -> alloc at [0, 0x20000000] fail
>>       -> alloc at [0x20000000, 0x20000000] fail and repeatedly
>>       (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
>>
>> Fix it by skipping meaningless calls of memblock_phys_alloc_range() with
>> `start = end`
>>
>> After this patch, the retry dead loop is avoided and print below info:
>> 	cannot allocate crashkernel (size:0x20000000)
>>
>> And apply generic crashkernel reservation to 32bit system will be ready.
>>
>> Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code")
>> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> Tested-by: Jinjie Ruan <ruanjinjie@huawei.com>
> 
> Also the tag issues, please update.
> 
> Other than above concerns, the patch looks good to me.

Thank you for your review, I'll fix it.

> 
>> ---
>> v3:
>> - Fix it as Baoquan suggested.
>> - Update the commit message.
>> ---
>>  kernel/crash_reserve.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
>> index c5213f123e19..dacc268429e2 100644
>> --- a/kernel/crash_reserve.c
>> +++ b/kernel/crash_reserve.c
>> @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline,
>>  			search_end = CRASH_ADDR_HIGH_MAX;
>>  			search_base = CRASH_ADDR_LOW_MAX;
>>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>> -			goto retry;
>> +			if (search_base != search_end)
>> +				goto retry;
>>  		}
>>  
>>  		/*
>> -- 
>> 2.34.1
>>
> 
>
Baoquan He July 18, 2024, 2:42 p.m. UTC | #3
On 07/18/24 at 08:10pm, Jinjie Ruan wrote:
> 
> 
> On 2024/7/18 19:14, Baoquan He wrote:
> > On 07/18/24 at 11:54am, Jinjie Ruan wrote:
> > 
> > I don't fully catch the subject, what does the 'dead loop bug at high'
> > mean?
> 
> It means alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] repeatedly
> which corresponds to the crashkernel parameter of the "high".

That may mislead people to think it's a crashkernel=,high setting and
the corresponding issue.

Maybe "crash: Fix x86_32 crashkernel reservation dead loop" is good
enough.
diff mbox series

Patch

diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
index c5213f123e19..dacc268429e2 100644
--- a/kernel/crash_reserve.c
+++ b/kernel/crash_reserve.c
@@ -414,7 +414,8 @@  void __init reserve_crashkernel_generic(char *cmdline,
 			search_end = CRASH_ADDR_HIGH_MAX;
 			search_base = CRASH_ADDR_LOW_MAX;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
-			goto retry;
+			if (search_base != search_end)
+				goto retry;
 		}
 
 		/*