
[v5] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

Message ID 20230407022419.19412-1-bhe@redhat.com (mailing list archive)
State New, archived

Commit Message

Baoquan He April 7, 2023, 2:24 a.m. UTC
On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
a suitable memory region top down. If the 'xM' of crashkernel high memory
is reserved from high memory successfully, it will try to reserve
crashkernel low memory later accordingly. Otherwise, it will try to search
the low memory area for a suitable 'xM' region. Please see the details in
Documentation/admin-guide/kernel-parameters.txt.

However, we observed an unexpected case where a reserved region crosses the
high and low memory boundary. E.g. on a system with 4G as the low memory end,
if the user adds kernel parameters like 'crashkernel=512M,high', the running
kernel could end up with [4G-126M, 4G+386M] and [1G, 1G+128M] regions.
A crashkernel high region crossing the low and high memory boundary brings
the following issues:

1) For crashkernel=x,high, if the crashkernel high region crosses the
low and high memory boundary, the user will see two memory regions in
low memory and one memory region in high memory. The two crashkernel
low memory regions are confusing, as shown in the above example.

2) If people explicitly specify "crashkernel=x,high crashkernel=y,low"
and y <= 128M, when the crashkernel high region crosses the low and high
memory boundary and the part of the crashkernel high reservation below
the boundary is bigger than y, the expected crashkernel low reservation
will be skipped. But the expected crashkernel high reservation is shrunk
and could not satisfy the user space requirement.

3) The boundary-crossing behaviour of the crashkernel high reservation is
different from the x86 arch. On x86_64, the low memory end is fixed at 4G,
and the memory near 4G is reserved by the system, e.g. for mapping firmware
and PCI mapping, so a crashkernel reservation crossing the boundary never
happens. From the distros' point of view, this brings inconsistency and
confusion. Users need to dig into x86 and arm64 system details to find out why.

For the kernel itself, the impact of issue 3) could be slight, while issues
1) and 2) have actual impact because they bring obscure semantics and
behaviour to the crashkernel=,high reservation.
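
The boundary-crossing case described above can be worked through numerically.
What follows is a minimal user-space sketch, not kernel code. The helper
names (split_at_boundary, old_wants_low, new_wants_low) are hypothetical; the
two predicates only mirror the crash_base comparisons that appear in the last
hunk of this patch, before and after the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical helper type: bytes of a region below and above a boundary. */
struct split {
	uint64_t below;
	uint64_t above;
};

/* Split [base, base + size) at 'boundary'. */
static struct split split_at_boundary(uint64_t base, uint64_t size,
				      uint64_t boundary)
{
	struct split s = { 0, 0 };
	uint64_t end = base + size;

	assert(size > 0 && end > base);	/* no empty region, no wrap-around */
	if (end <= boundary) {
		s.below = size;
	} else if (base >= boundary) {
		s.above = size;
	} else {
		s.below = boundary - base;
		s.above = end - boundary;
	}
	return s;
}

/* Pre-patch condition: extra low region only if base > low_max - low_size. */
static bool old_wants_low(uint64_t crash_base, uint64_t low_max,
			  uint64_t low_size)
{
	return low_size && crash_base > low_max - low_size;
}

/* Post-patch condition: extra low region whenever base >= low_max. */
static bool new_wants_low(uint64_t crash_base, uint64_t low_max,
			  uint64_t low_size)
{
	return low_size && crash_base >= low_max;
}
```

With base = 4G-126M, size = 512M and a 4G boundary, the split is 126M below
and 386M above. Assuming crashkernel=100M,low, old_wants_low() is false for
that base, so the low reservation is skipped although only 386M sits in high
memory, which is issue 2). Once the high region starts at or above 4G,
new_wants_low() is true.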

So, for crashkernel=xM,high, first search for a suitable region only in
high memory. If that fails, try reserving a suitable region only in low
memory. This way, the crashkernel high region will only exist in high
memory, and the crashkernel low region will only exist in low memory. The
reservation behaviour for crashkernel=,high becomes clearer and simpler.

Note: RPi4 has different zone ranges than normal systems. Its DMA zone is
0~1G, and its DMA32 zone is 1G~4G if CONFIG_ZONE_DMA|DMA32 are enabled,
which is the default. The low memory end is 1G so that it is usable by all
devices, and high memory starts at 1G. However, to be consistent with
normal arm64 systems, its low memory end is still 1G, while crashkernel
high memory is reserved from 4G if crashkernel=size,high is specified.
This removes confusion.

With above change applied, summary of arm64 crashkernel reservation range:
1)
RPi4(zone DMA:0~1G; DMA32:1G~4G):
 crashkernel=size
  0~1G: low memory | 1G~top: high memory

 crashkernel=size,high
  0~1G: low memory | 4G~top: high memory

2)
Other normal system:
 crashkernel=size
 crashkernel=size,high
  0~4G: low memory | 4G~top: high memory

3)
Systems w/o zone DMA|DMA32
 crashkernel=size
 crashkernel=size,high
  0~top: low memory
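
The summary above can be sketched as a table of search windows. This is a
user-space illustration under stated assumptions, not the kernel code:
crash_search_plan() and its struct types are hypothetical names, low_max
stands in for CRASH_ADDR_LOW_MAX (arm64_dma_phys_limit), and high_max stands
in for CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1). It only lists the first and the
fallback [start, end) ranges and does not model the retry loop itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)
#define SZ_4G (4ULL << 30)

/* [start, end) range handed to the allocator for one attempt. */
struct window {
	uint64_t start;
	uint64_t end;
};

struct plan {
	struct window first;
	struct window fallback;
	bool has_fallback;
};

/* fixed_base == 0 means that no "@offset" was given. */
static struct plan crash_search_plan(bool high, uint64_t fixed_base,
				     uint64_t size, uint64_t low_max,
				     uint64_t high_max)
{
	struct plan p = { { 0, low_max }, { 0, 0 }, false };

	assert(size > 0);
	if (fixed_base) {
		/* crashkernel=size@offset: exactly one window, no retry */
		p.first = (struct window){ fixed_base, fixed_base + size };
	} else if (high) {
		/* crashkernel=size,high: above 4G first, then low memory */
		p.first = (struct window){ SZ_4G, high_max };
		p.fallback = (struct window){ 0, low_max };
		p.has_fallback = true;
	} else {
		/* crashkernel=size: low memory first, then above low_max */
		p.fallback = (struct window){ low_max, high_max };
		p.has_fallback = true;
	}
	return p;
}
```

For example, with low_max = 1G (the RPi4 case) and high = true, the first
window starts at 4G and the fallback window is 0~1G. With low_max = 4G and
plain crashkernel=size, the first window is 0~4G and the fallback starts at
4G. The value used for high_max is up to the caller; the real value depends
on PHYS_MASK.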

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/arm64/mm/init.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

Comments

Baoquan He April 7, 2023, 2:32 a.m. UTC | #1
On 04/07/23 at 10:24am, Baoquan He wrote:
......
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 66e70ca47680..307263c01292 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
>  
>  #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
>  #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
> +#define CRASH_HIGH_SEARCH_BASE		SZ_4G
>  
>  #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
>  
> @@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
>   */
>  static void __init reserve_crashkernel(void)
>  {
> -	unsigned long long crash_base, crash_size;
> -	unsigned long long crash_low_size = 0;
> +	unsigned long long crash_base, crash_size, search_base;
>  	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> +	unsigned long long crash_low_size = 0;
>  	char *cmdline = boot_command_line;
> -	int ret;
>  	bool fixed_base = false;
> +	bool high = false;
> +	int ret;
>  
>  	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
>  		return;
> @@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
>  		else if (ret)
>  			return;
>  
> +		search_base = CRASH_HIGH_SEARCH_BASE;

Here, I am unsure whether a conditional check is needed, as below. On a
special system where both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
are disabled, there's only low memory, which means its arm64_dma_phys_limit
equals (PHYS_MASK + 1). In this case, whatever the crashkernel= value is,
it can search the whole system memory for an available crashkernel region.
Maybe it's fine as is, since it's not a big deal; the memory region can be
found anyway.

  		crash_max = CRASH_ADDR_HIGH_MAX;
		if (crash_max != CRASH_ADDR_LOW_MAX)
			search_base = CRASH_HIGH_SEARCH_BASE;

> +		high = true;
>  	} else if (ret || !crash_size) {
>  		/* The specified value is invalid */
>  		return;
> @@ -140,31 +144,51 @@ static void __init reserve_crashkernel(void)
>  	/* User specifies base address explicitly. */
>  	if (crash_base) {
>  		fixed_base = true;
> +		search_base = crash_base;
>  		crash_max = crash_base + crash_size;
>  	}
>  
>  retry:
>  	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> -					       crash_base, crash_max);
> +					       search_base, crash_max);
>  	if (!crash_base) {
>  		/*
> -		 * If the first attempt was for low memory, fall back to
> -		 * high memory, the minimum required low memory will be
> -		 * reserved later.
> +		 * For crashkernel=size[KMG]@offset[KMG], print out failure
> +		 * message if can't reserve the specified region.
>  		 */
> -		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
> +		if (fixed_base) {
> +			pr_warn("crashkernel reservation failed - memory is in use.\n");
> +			return;
> +		}
> +
> +		/*
> +		 * For crashkernel=size[KMG], if the first attempt was for
> +		 * low memory, fall back to high memory, the minimum required
> +		 * low memory will be reserved later.
> +		 */
> +		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
>  			crash_max = CRASH_ADDR_HIGH_MAX;
> +			search_base = CRASH_ADDR_LOW_MAX;
>  			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
>  			goto retry;
>  		}
>  
> +		/*
> +		 * For crashkernel=size[KMG],high, if the first attempt was
> +		 * for high memory, fall back to low memory.
> +		 */
> +		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
> +			crash_max = CRASH_ADDR_LOW_MAX;
> +			search_base = 0;
> +			goto retry;
> +		}
>  		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  			crash_size);
>  		return;
>  	}
>  
> -	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
> -	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
> +	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
> +	     reserve_crashkernel_low(crash_low_size)) {
>  		memblock_phys_free(crash_base, crash_size);
>  		return;
>  	}
> -- 
> 2.34.1
>
Catalin Marinas April 12, 2023, 11:51 a.m. UTC | #2
On Fri, Apr 07, 2023 at 10:32:38AM +0800, Baoquan He wrote:
> On 04/07/23 at 10:24am, Baoquan He wrote:
> ......
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 66e70ca47680..307263c01292 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
> >  
> >  #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
> >  #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
> > +#define CRASH_HIGH_SEARCH_BASE		SZ_4G
> >  
> >  #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
> >  
> > @@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
> >   */
> >  static void __init reserve_crashkernel(void)
> >  {
> > -	unsigned long long crash_base, crash_size;
> > -	unsigned long long crash_low_size = 0;
> > +	unsigned long long crash_base, crash_size, search_base;
> >  	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > +	unsigned long long crash_low_size = 0;
> >  	char *cmdline = boot_command_line;
> > -	int ret;
> >  	bool fixed_base = false;
> > +	bool high = false;
> > +	int ret;
> >  
> >  	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
> >  		return;
> > @@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
> >  		else if (ret)
> >  			return;
> >  
> > +		search_base = CRASH_HIGH_SEARCH_BASE;
> 
> Here, I am hesitant if a conditional check is needed as below. On
> special system where both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 
> are disabled, there's only low memory, means its arm64_dma_phys_limit
> equals to (PHYS_MASK + 1). In this case, whatever the crashkernel= is,
> it can search the whole system memory for available crashkernel region.
> Maybe it's fine since it's not big deal, the memory region can be found
> anyway.
> 
>   		crash_max = CRASH_ADDR_HIGH_MAX;
> 		if (crash_max != CRASH_ADDR_LOW_MAX)
> 			search_base = CRASH_HIGH_SEARCH_BASE;

Does x86 do anything different here or they just can't disable
ZONE_DMA32? I'd be tempted to instead define CRASH_ADDR_LOW_MAX as
min(SZ_4G, arm64_dma_phys_limit) so that the crashkernel=,high semantics
are still preserved irrespective of how the kernel was built.

There's also the difference between the current kernel and the kdump
kernel. I don't think there's a strong requirement that they have the
same config options, in which case it may be safer to just honour the 4G
boundary.

Otherwise the patch looks fine. Whether you want to add the min limit
above:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Baoquan He April 13, 2023, 7:45 a.m. UTC | #3
On 04/12/23 at 12:51pm, Catalin Marinas wrote:
> On Fri, Apr 07, 2023 at 10:32:38AM +0800, Baoquan He wrote:
> > On 04/07/23 at 10:24am, Baoquan He wrote:
> > ......
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 66e70ca47680..307263c01292 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -69,6 +69,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
> > >  
> > >  #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
> > >  #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
> > > +#define CRASH_HIGH_SEARCH_BASE		SZ_4G
> > >  
> > >  #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
> > >  
> > > @@ -101,12 +102,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
> > >   */
> > >  static void __init reserve_crashkernel(void)
> > >  {
> > > -	unsigned long long crash_base, crash_size;
> > > -	unsigned long long crash_low_size = 0;
> > > +	unsigned long long crash_base, crash_size, search_base;
> > >  	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > > +	unsigned long long crash_low_size = 0;
> > >  	char *cmdline = boot_command_line;
> > > -	int ret;
> > >  	bool fixed_base = false;
> > > +	bool high = false;
> > > +	int ret;
> > >  
> > >  	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
> > >  		return;
> > > @@ -129,7 +131,9 @@ static void __init reserve_crashkernel(void)
> > >  		else if (ret)
> > >  			return;
> > >  
> > > +		search_base = CRASH_HIGH_SEARCH_BASE;
> > 
> > Here, I am hesitant if a conditional check is needed as below. On
> > special system where both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 
> > are disabled, there's only low memory, means its arm64_dma_phys_limit
> > equals to (PHYS_MASK + 1). In this case, whatever the crashkernel= is,
> > it can search the whole system memory for available crashkernel region.
> > Maybe it's fine since it's not big deal, the memory region can be found
> > anyway.
> > 
> >   		crash_max = CRASH_ADDR_HIGH_MAX;
> > 		if (crash_max != CRASH_ADDR_LOW_MAX)
> > 			search_base = CRASH_HIGH_SEARCH_BASE;
> 
> Does x86 do anything different here or they just can't disable
> ZONE_DMA32? I'd be tempted to instead define CRASH_ADDR_LOW_MAX as
> min(SZ_4G, arm64_dma_phys_limit) so that the crashkernel=,high semantics
> are still preserved irrespective of how the kernel was built.

x86 has both ZONE_DMA and ZONE_DMA32 by default, and hardcodes the zone
upper limits. I don't think it's easy to disable ZONE_DMA32; otherwise,
devices could only grab memory from the DMA zone, which is only 16MB, and
the rest of memory would go into the normal zone. Please see the code
snippet below.

arch/x86/include/asm/dma.h:
/* 16MB ISA DMA zone */
#define MAX_DMA_PFN   ((16UL * 1024 * 1024) >> PAGE_SHIFT) 

/* 4GB broken PCI/AGP hardware bus master zone */
#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))

arch/x86/mm/init.c
zone_sizes_init()

About the CRASH_ADDR_LOW_MAX definition: with min(SZ_4G,
arm64_dma_phys_limit), it's similar to defining it as
arm64_dma_phys_limit directly.

That's because arm64_dma_phys_limit takes one of three values:
1) 1G on RPi4
2) 4G on normal systems
3) (PHYS_MASK + 1) on special systems w/o zone DMA and DMA32

For the first two types, min(SZ_4G, arm64_dma_phys_limit) is
arm64_dma_phys_limit. For the 3rd one, CRASH_ADDR_LOW_MAX becomes 4G, which
makes a type 3) system unable to reserve memory across 4G.
However, on a type 3) system all memory is low memory, so 4G should
not be a boundary. I tried the code change with min(SZ_4G,
arm64_dma_phys_limit); if we can accept this, the v4 patch looks more
appropriate, except that RPi4 has inconsistent behaviour when
crashkernel=,high is specified.
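
The three cases can be checked with a small user-space sketch.
capped_low_max() is a hypothetical name for the min(SZ_4G,
arm64_dma_phys_limit) alternative discussed here; it is not code from the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)
#define SZ_4G (4ULL << 30)

/* Hypothetical helper: the alternative CRASH_ADDR_LOW_MAX discussed above. */
static uint64_t capped_low_max(uint64_t dma_phys_limit)
{
	assert(dma_phys_limit > 0);
	return dma_phys_limit < SZ_4G ? dma_phys_limit : SZ_4G;
}
```

For a 1G or 4G limit the result equals the limit itself, so nothing changes
for types 1) and 2). For any limit above 4G, as on a type 3) system, the
result is capped to 4G, which is the artificial boundary mentioned above.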

> 
> There's also the difference between the current kernel and the kdump
> kernel. I don't think there's a strong requirement that they have the
> same config options, in which case it may be safer to just honour the 4G
> boundary.

Yes. In principle, the kdump kernel doesn't have to be the same as the
current kernel. However, in reality, we usually use the same kernel for
both. In our distros, we do that by default, while users can choose a
different kernel as the kdump kernel.

But we would not suggest that users take a kernel with different config
options as the kdump kernel. E.g. on RPi4, the current kernel has ZONE_DMA
and ZONE_DMA32 enabled, so it has the DMA zone under 1G and the DMA32 zone
under 4G. If we take a kernel without ZONE_DMA and ZONE_DMA32 set as the
kdump kernel, it's very likely not to work because a PCI device could get
memory above 1G, or even above 4G.

> 
> Otherwise the patch looks fine. Whether you want to add the min limit
> above:

I am OK with this version, or the version with min(SZ_4G,
arm64_dma_phys_limit), or v4. Please help point out if I got your idea
correctly. Thanks a lot.
Catalin Marinas April 13, 2023, 2:30 p.m. UTC | #4
On Thu, Apr 13, 2023 at 03:45:50PM +0800, Baoquan He wrote:
> I am OK with this version, or the version with min(SZ_4G,
> arm64_dma_phys_limit), or v4. Please help point out if I got your idea
> correctly. Thanks a lot.

I think we should stick to this patch. The disabling of the ZONE_DMA(32)
is fairly specialised and you are right that we should not introduce an
artificial 4GB crashkernel boundary on such systems. The slight
confusion may be that ,high triggers a search above 4GB where there's
not such boundary but this would match the documentation anyway.
Leizhen (ThunderTown) April 14, 2023, 2:27 a.m. UTC | #5
On 2023/4/13 22:30, Catalin Marinas wrote:
> On Thu, Apr 13, 2023 at 03:45:50PM +0800, Baoquan He wrote:
>> I am OK with this version, or the version with min(SZ_4G,
>> arm64_dma_phys_limit), or v4. Please help point out if I got your idea
>> correctly. Thanks a lot.
> 
> I think we should stick to this patch. The disabling of the ZONE_DMA(32)
> is fairly specialised and you are right that we should not introduce an
> artificial 4GB crashkernel boundary on such systems. The slight
> confusion may be that ,high triggers a search above 4GB where there's
> not such boundary but this would match the documentation anyway.
> 

Agreed, so careful.
Baoquan He April 14, 2023, 9:49 a.m. UTC | #6
On 04/13/23 at 03:30pm, Catalin Marinas wrote:
> On Thu, Apr 13, 2023 at 03:45:50PM +0800, Baoquan He wrote:
> > I am OK with this version, or the version with min(SZ_4G,
> > arm64_dma_phys_limit), or v4. Please help point out if I got your idea
> > correctly. Thanks a lot.
> 
> I think we should stick to this patch. The disabling of the ZONE_DMA(32)
> is fairly specialised and you are right that we should not introduce an
> artificial 4GB crashkernel boundary on such systems. The slight
> confusion may be that ,high triggers a search above 4GB where there's
> not such boundary but this would match the documentation anyway.

OK, sounds good to me. Thanks a lot for all the careful reviewing and
helpful suggestions.
Will Deacon April 14, 2023, 2:34 p.m. UTC | #7
Hi,

On Fri, Apr 07, 2023 at 10:24:19AM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accordingly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.

[...]

>  arch/arm64/mm/init.c | 44 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 10 deletions(-)

I tried to apply this, but smatch is unhappy with the result:

  | arch/arm64/mm/init.c:153 reserve_crashkernel() error: uninitialized symbol 'search_base'.

I _think_ this is a false positive, but I must say that the control flow
in reserve_crashkernel() is extremely hard to follow so I couldn't be
sure. If the static checker is struggling, then so will humans!

Ideally, this would all be restructured to make it easier to follow,
but in the short term we need something to squash the warning.

Cheers,

Will
Baoquan He April 15, 2023, 12:43 a.m. UTC | #8
On 04/14/23 at 03:34pm, Will Deacon wrote:
> Hi,
> 
> On Fri, Apr 07, 2023 at 10:24:19AM +0800, Baoquan He wrote:
> > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> > suitable memory region top down. If the 'xM' of crashkernel high memory
> > is reserved from high memory successfully, it will try to reserve
> > crashkernel low memory later accordingly. Otherwise, it will try to search
> > low memory area for the 'xM' suitable region. Please see the details in
> > Documentation/admin-guide/kernel-parameters.txt.
> 
> [...]
> 
> >  arch/arm64/mm/init.c | 44 ++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 34 insertions(+), 10 deletions(-)
> 
> I tried to apply this, but smatch is unhappy with the result:
> 
>   | arch/arm64/mm/init.c:153 reserve_crashkernel() error: uninitialized symbol 'search_base'.
> 
> I _think_ this is a false positive, but I must say that the control flow
> in reserve_crashkernel() is extremely hard to follow so I couldn't be
> sure. If the static checker is struggling, then so will humans!
> 
> Ideally, this would all be restructured to make it easier to follow,
> but in the short term we need something to squash the warning.

Sorry about that, I didn't run a static checker. We should initialize
search_base as below to fix the warning. The code below can be folded into
this v5 patch, or I can post v6 with Catalin's Reviewed-by tag.

Yes, about restructuring, I can think about it later. The added corner-case
handling specific to arm64 makes the flow a little harder to
follow. I will consider whether adding a document for arm64 would be better.

From 3575571ab9614c31f30933148a8693924a30321c Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@redhat.com>
Date: Sat, 15 Apr 2023 08:35:08 +0800
Subject: [PATCH] arm64: kdump: fix warning reported by static checker
Content-type: text/plain

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 13750b0548da..bfc117cefcd5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -128,9 +128,9 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size, search_base;
+	unsigned long long crash_low_size = 0, search_base = 0;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
-	unsigned long long crash_low_size = 0;
+	unsigned long long crash_base, crash_size;
 	char *cmdline = boot_command_line;
 	bool fixed_base = false;
 	bool high = false;
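
Reading only the hunks quoted in this thread, the reason for the smatch
report can be sketched as follows. This is a user-space model, not kernel
code; search_base_assigned_v5() is a hypothetical name, and the two
assignments it models are the ones the v5 patch adds before the retry:
label.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the v5 control flow up to the first allocation attempt:
 * returns true if search_base has been assigned by then.
 */
static bool search_base_assigned_v5(bool high, bool fixed_base)
{
	bool assigned = false;

	/* documented syntax is crashkernel=size[KMG],high, without @offset */
	assert(!(high && fixed_base));
	if (high)
		assigned = true;	/* search_base = CRASH_HIGH_SEARCH_BASE */
	if (fixed_base)
		assigned = true;	/* search_base = crash_base */
	return assigned;
}
```

For plain crashkernel=size (neither ,high nor @offset) the model returns
false, which suggests that the first memblock_phys_alloc_range() call could
see an unassigned search_base in v5. Initializing it to 0, as in the fix
above, appears to give that path the same start value as before the patch,
when crash_base (0 without @offset) was passed.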
Baoquan He May 16, 2023, 3 a.m. UTC | #9
Hi Will,

On 04/14/23 at 03:34pm, Will Deacon wrote:
> Hi,
> 
> On Fri, Apr 07, 2023 at 10:24:19AM +0800, Baoquan He wrote:
> > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> > suitable memory region top down. If the 'xM' of crashkernel high memory
> > is reserved from high memory successfully, it will try to reserve
> > crashkernel low memory later accordingly. Otherwise, it will try to search
> > low memory area for the 'xM' suitable region. Please see the details in
> > Documentation/admin-guide/kernel-parameters.txt.
> 
> [...]
> 
> >  arch/arm64/mm/init.c | 44 ++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 34 insertions(+), 10 deletions(-)
> 
> I tried to apply this, but smatch is unhappy with the result:
> 
>   | arch/arm64/mm/init.c:153 reserve_crashkernel() error: uninitialized symbol 'search_base'.
> 
> I _think_ this is a false positive, but I must say that the control flow
> in reserve_crashkernel() is extremely hard to follow so I couldn't be
> sure. If the static checker is struggling, then so will humans!
> 
> Ideally, this would all be restructured to make it easier to follow,
> but in the short term we need something to squash the warning.

I tried to refactor the code as you suggested, but it doesn't seem easy
to do. The complexity comes from several cases which need to be handled.
I tried my best to write a document covering the things I think are
important for understanding the code. Please help check whether it helps,
or whether just keeping the current code is fine.

Thanks
Baoquan

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 66e70ca47680..307263c01292 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -69,6 +69,7 @@  phys_addr_t __ro_after_init arm64_dma_phys_limit;
 
 #define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
 #define CRASH_ADDR_HIGH_MAX		(PHYS_MASK + 1)
+#define CRASH_HIGH_SEARCH_BASE		SZ_4G
 
 #define DEFAULT_CRASH_KERNEL_LOW_SIZE	(128UL << 20)
 
@@ -101,12 +102,13 @@  static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size;
-	unsigned long long crash_low_size = 0;
+	unsigned long long crash_base, crash_size, search_base;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+	unsigned long long crash_low_size = 0;
 	char *cmdline = boot_command_line;
-	int ret;
 	bool fixed_base = false;
+	bool high = false;
+	int ret;
 
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
@@ -129,7 +131,9 @@  static void __init reserve_crashkernel(void)
 		else if (ret)
 			return;
 
+		search_base = CRASH_HIGH_SEARCH_BASE;
 		crash_max = CRASH_ADDR_HIGH_MAX;
+		high = true;
 	} else if (ret || !crash_size) {
 		/* The specified value is invalid */
 		return;
@@ -140,31 +144,51 @@  static void __init reserve_crashkernel(void)
 	/* User specifies base address explicitly. */
 	if (crash_base) {
 		fixed_base = true;
+		search_base = crash_base;
 		crash_max = crash_base + crash_size;
 	}
 
 retry:
 	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
+					       search_base, crash_max);
 	if (!crash_base) {
 		/*
-		 * If the first attempt was for low memory, fall back to
-		 * high memory, the minimum required low memory will be
-		 * reserved later.
+		 * For crashkernel=size[KMG]@offset[KMG], print out failure
+		 * message if can't reserve the specified region.
 		 */
-		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+		if (fixed_base) {
+			pr_warn("crashkernel reservation failed - memory is in use.\n");
+			return;
+		}
+
+		/*
+		 * For crashkernel=size[KMG], if the first attempt was for
+		 * low memory, fall back to high memory, the minimum required
+		 * low memory will be reserved later.
+		 */
+		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
+			search_base = CRASH_ADDR_LOW_MAX;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 			goto retry;
 		}
 
+		/*
+		 * For crashkernel=size[KMG],high, if the first attempt was
+		 * for high memory, fall back to low memory.
+		 */
+		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
+			crash_max = CRASH_ADDR_LOW_MAX;
+			search_base = 0;
+			goto retry;
+		}
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
 	}
 
-	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
-	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
+	     reserve_crashkernel_low(crash_low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}