Message ID | 20240718035444.2977105-3-ruanjinjie@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | crash: Fix x86_32 memory reserve dead loop bug | expand |
On 07/18/24 at 11:54am, Jinjie Ruan wrote: I don't fully catch the subject, what does the 'dead loop bug at high' mean? > On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will > also cause system stall as below: > > ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] > ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] > ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f] > ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] > ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] > ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] > 143MB HIGHMEM available. > 879MB LOWMEM available. > mapped low ram: 0 - 36ffe000 > low ram: 0 - 36ffe000 > (stall here) > > The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX > on x86_32, the first "low" crash kernel memory reservation for 512M fails, > then it go into the "retry" loop and never came out as below (consider > CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M): > > -> reserve_crashkernel_generic() and high is false > -> alloc at [0, 0x20000000] fail > -> alloc at [0x20000000, 0x20000000] fail and repeatedly > (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). > > Fix it by skipping meaningless calls of memblock_phys_alloc_range() with > `start = end` > > After this patch, the retry dead loop is avoided and print below info: > cannot allocate crashkernel (size:0x20000000) > > And apply generic crashkernel reservation to 32bit system will be ready. > > Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code") > Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> > Signed-off-by: Baoquan He <bhe@redhat.com> > Tested-by: Jinjie Ruan <ruanjinjie@huawei.com> Also the tag issues, please update. Other than above concerns, the patch looks good to me. > --- > v3: > - Fix it as Baoquan suggested. > - Update the commit message. > --- > kernel/crash_reserve.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c > index c5213f123e19..dacc268429e2 100644 > --- a/kernel/crash_reserve.c > +++ b/kernel/crash_reserve.c > @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, > search_end = CRASH_ADDR_HIGH_MAX; > search_base = CRASH_ADDR_LOW_MAX; > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > - goto retry; > + if (search_base != search_end) > + goto retry; > } > > /* > -- > 2.34.1 >
On 2024/7/18 19:14, Baoquan He wrote: > On 07/18/24 at 11:54am, Jinjie Ruan wrote: > > I don't fully catch the subject, what does the 'dead loop bug at high' > mean? It means alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] repeatedly which corresponds to the crashkernel parameter of the "high". > >> On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=512M" will >> also cause system stall as below: >> >> ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b] >> ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7] >> ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f] >> ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb] >> ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3] >> ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b] >> 143MB HIGHMEM available. >> 879MB LOWMEM available. >> mapped low ram: 0 - 36ffe000 >> low ram: 0 - 36ffe000 >> (stall here) >> >> The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX >> on x86_32, the first "low" crash kernel memory reservation for 512M fails, >> then it go into the "retry" loop and never came out as below (consider >> CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX = 512M): >> >> -> reserve_crashkernel_generic() and high is false >> -> alloc at [0, 0x20000000] fail >> -> alloc at [0x20000000, 0x20000000] fail and repeatedly >> (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). >> >> Fix it by skipping meaningless calls of memblock_phys_alloc_range() with >> `start = end` >> >> After this patch, the retry dead loop is avoided and print below info: >> cannot allocate crashkernel (size:0x20000000) >> >> And apply generic crashkernel reservation to 32bit system will be ready. >> >> Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code") >> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> >> Signed-off-by: Baoquan He <bhe@redhat.com> >> Tested-by: Jinjie Ruan <ruanjinjie@huawei.com> > > Also the tag issues, please update. > > Other than above concerns, the patch looks good to me. Thank you for your review, I'll fix it. > >> --- >> v3: >> - Fix it as Baoquan suggested. >> - Update the commit message. >> --- >> kernel/crash_reserve.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c >> index c5213f123e19..dacc268429e2 100644 >> --- a/kernel/crash_reserve.c >> +++ b/kernel/crash_reserve.c >> @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, >> search_end = CRASH_ADDR_HIGH_MAX; >> search_base = CRASH_ADDR_LOW_MAX; >> crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; >> - goto retry; >> + if (search_base != search_end) >> + goto retry; >> } >> >> /* >> -- >> 2.34.1 >> > >
On 07/18/24 at 08:10pm, Jinjie Ruan wrote: > > > On 2024/7/18 19:14, Baoquan He wrote: > > On 07/18/24 at 11:54am, Jinjie Ruan wrote: > > > > I don't fully catch the subject, what does the 'dead loop bug at high' > > mean? > > It means alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] repeatedly > which corresponds to the crashkernel parameter of the "high". That may mislead people to think it's a crashkernel=,high setting and the corresponding issue. Maybe "crash: Fix x86_32 crashkernel reservation dead loop" is good enough.
diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index c5213f123e19..dacc268429e2 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -414,7 +414,8 @@ void __init reserve_crashkernel_generic(char *cmdline, search_end = CRASH_ADDR_HIGH_MAX; search_base = CRASH_ADDR_LOW_MAX; crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; - goto retry; + if (search_base != search_end) + goto retry; } /*