diff mbox series

[v10,4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

Message ID 20200703035816.31289-5-chenzhou10@huawei.com (mailing list archive)
State New, archived
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Commit Message

chenzhou July 3, 2020, 3:58 a.m. UTC
commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
broken the arm64 kdump. If the memory reserved for crash dump kernel
falled in ZONE_DMA32, the devices in crash dump kernel need to use
ZONE_DMA will alloc fail.

This patch addressed the above issue based on "reserving crashkernel
above 4G". Originally, we reserve low memory below 4G, and now just need
to adjust memory limit to arm64_dma_phys_limit in reserve_crashkernel_low
if ZONE_DMA is enabled. That is, if there are devices need to use ZONE_DMA
in crash dump kernel, it is a good choice to use parameters
"crashkernel=X crashkernel=Y,low".

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 kernel/crash_core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Catalin Marinas July 27, 2020, 5:30 p.m. UTC | #1
On Fri, Jul 03, 2020 at 11:58:15AM +0800, Chen Zhou wrote:
> commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
> broken the arm64 kdump. If the memory reserved for crash dump kernel
> falled in ZONE_DMA32, the devices in crash dump kernel need to use
> ZONE_DMA will alloc fail.
> 
> This patch addressed the above issue based on "reserving crashkernel
> above 4G". Originally, we reserve low memory below 4G, and now just need
> to adjust memory limit to arm64_dma_phys_limit in reserve_crashkernel_low
> if ZONE_DMA is enabled. That is, if there are devices need to use ZONE_DMA
> in crash dump kernel, it is a good choice to use parameters
> "crashkernel=X crashkernel=Y,low".
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> ---
>  kernel/crash_core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a7580d291c37..e8ecbbc761a3 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
>  	unsigned long long base, low_base = 0, low_size = 0;
>  	unsigned long total_low_mem;
>  	int ret;
> +	phys_addr_t crash_max = 1ULL << 32;
>  
>  	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>  
> @@ -352,7 +353,11 @@ int __init reserve_crashkernel_low(void)
>  			return 0;
>  	}
>  
> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> +#ifdef CONFIG_ARM64
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		crash_max = arm64_dma_phys_limit;
> +#endif
> +	low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
>  	if (!low_base) {
>  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>  		       (unsigned long)(low_size >> 20));

Given the number of #ifdefs we end up with in this function, I think
it's better to simply copy to the code to arch/arm64 and tailor it
accordingly.

Anyway, there are two series solving slightly different issues with
kdump reservations:

1. This series which relaxes the crashkernel= allocation to go anywhere
   in the accessible space while having a dedicated crashkernel=X,low
   option for ZONE_DMA.

2. Bhupesh's series [1] forcing crashkernel=X allocations only from
   ZONE_DMA.

For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
Existing crashkernel= uses may no longer work, depending on where the
allocation falls. Option (2) above is a quick fix assuming that the
crashkernel reservation is small enough. What's a typical crashkernel
option here? That series is probably more prone to reservation failures.

Option (1), i.e. this series, doesn't solve the problem raised by
Bhupesh unless one uses the crashkernel=X,low argument. It can actually
make it worse even for ZONE_DMA32 since the allocation can go above 4G
(assuming that we change the ZONE_DMA configuration to only limit it to
1GB on RPi4).

I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
allocations. If this is too small for typical kdump, we can look into
expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
list). In addition, if Chen thinks allocations above 4G are still needed
or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
",high" option to explicitly require such access.

[1] http://lists.infradead.org/pipermail/kexec/2020-July/020777.html
chenzhou July 29, 2020, 3:52 a.m. UTC | #2
Hi  Catalin,


On 2020/7/28 1:30, Catalin Marinas wrote:
> On Fri, Jul 03, 2020 at 11:58:15AM +0800, Chen Zhou wrote:
>> commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
>> broken the arm64 kdump. If the memory reserved for crash dump kernel
>> falled in ZONE_DMA32, the devices in crash dump kernel need to use
>> ZONE_DMA will alloc fail.
>>
>> This patch addressed the above issue based on "reserving crashkernel
>> above 4G". Originally, we reserve low memory below 4G, and now just need
>> to adjust memory limit to arm64_dma_phys_limit in reserve_crashkernel_low
>> if ZONE_DMA is enabled. That is, if there are devices need to use ZONE_DMA
>> in crash dump kernel, it is a good choice to use parameters
>> "crashkernel=X crashkernel=Y,low".
>>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> ---
>>  kernel/crash_core.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index a7580d291c37..e8ecbbc761a3 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
>>  	unsigned long long base, low_base = 0, low_size = 0;
>>  	unsigned long total_low_mem;
>>  	int ret;
>> +	phys_addr_t crash_max = 1ULL << 32;
>>  
>>  	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>  
>> @@ -352,7 +353,11 @@ int __init reserve_crashkernel_low(void)
>>  			return 0;
>>  	}
>>  
>> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>> +#ifdef CONFIG_ARM64
>> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +		crash_max = arm64_dma_phys_limit;
>> +#endif
>> +	low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
>>  	if (!low_base) {
>>  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>  		       (unsigned long)(low_size >> 20));
> Given the number of #ifdefs we end up with in this function, I think
> it's better to simply copy to the code to arch/arm64 and tailor it
> accordingly.
>
> Anyway, there are two series solving slightly different issues with
> kdump reservations:
>
> 1. This series which relaxes the crashkernel= allocation to go anywhere
>    in the accessible space while having a dedicated crashkernel=X,low
>    option for ZONE_DMA.
>
> 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
>    ZONE_DMA.
>
> For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
> Existing crashkernel= uses may no longer work, depending on where the
> allocation falls. Option (2) above is a quick fix assuming that the
> crashkernel reservation is small enough. What's a typical crashkernel
> option here? That series is probably more prone to reservation failures.
>
> Option (1), i.e. this series, doesn't solve the problem raised by
> Bhupesh unless one uses the crashkernel=X,low argument. It can actually
> make it worse even for ZONE_DMA32 since the allocation can go above 4G
> (assuming that we change the ZONE_DMA configuration to only limit it to
> 1GB on RPi4).
>
> I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
> allocations. If this is too small for typical kdump, we can look into
> expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
> list). In addition, if Chen thinks allocations above 4G are still needed
> or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
> ",high" option to explicitly require such access.
Thanks for your reply and exhaustive explanation.

In our ARM servers, we need to to reserve a large chunk for kdump(512M or 1G),
there is no enough low memory. So we proposed this patch series
"support reserving crashkernel above 4G on arm64 kdump" In April 2019.

I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier versions.
Suggested by James, to simplify, we call reserve_crashkernel_low() at the beginning of
reserve_crashkernel() and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low()
allocated something.
That is, just the parameter "crashkernel=X,low" is ok and i deleted "crashkernel=X,high".

After the ZONE_DMA introduced in December 2019, the issue occurred as you said above.
In fact, we didn't have RPi4 machine. Originally, i suggested to fix this based on this patch series
and used the dedicated option.

According to your clarify, for typical kdump, there are other solutions. In this case,
"keep the crashkernel= behaviour to ZONE_DMA allocations" looks much better.

How about like this:
1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel= behaviour to ZONE_DMA allocations.
2. For this patch series, make the reserve_crashkernel_low() to ZONE_DMA allocations.

Thanks,
Chen Zhou
> [1] http://lists.infradead.org/pipermail/kexec/2020-July/020777.html
>
Catalin Marinas July 29, 2020, 11:58 a.m. UTC | #3
Hi Chen,

On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
> On 2020/7/28 1:30, Catalin Marinas wrote:
> > Anyway, there are two series solving slightly different issues with
> > kdump reservations:
> >
> > 1. This series which relaxes the crashkernel= allocation to go anywhere
> >    in the accessible space while having a dedicated crashkernel=X,low
> >    option for ZONE_DMA.
> >
> > 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
> >    ZONE_DMA.
> >
> > For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
> > Existing crashkernel= uses may no longer work, depending on where the
> > allocation falls. Option (2) above is a quick fix assuming that the
> > crashkernel reservation is small enough. What's a typical crashkernel
> > option here? That series is probably more prone to reservation failures.
> >
> > Option (1), i.e. this series, doesn't solve the problem raised by
> > Bhupesh unless one uses the crashkernel=X,low argument. It can actually
> > make it worse even for ZONE_DMA32 since the allocation can go above 4G
> > (assuming that we change the ZONE_DMA configuration to only limit it to
> > 1GB on RPi4).
> >
> > I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
> > allocations. If this is too small for typical kdump, we can look into
> > expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
> > list). In addition, if Chen thinks allocations above 4G are still needed
> > or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
> > ",high" option to explicitly require such access.
> 
> Thanks for your reply and exhaustive explanation.
> 
> In our ARM servers, we need to to reserve a large chunk for kdump(512M
> or 1G), there is no enough low memory. So we proposed this patch
> series "support reserving crashkernel above 4G on arm64 kdump" In
> April 2019.

Trying to go through the discussions last year, hopefully things get
clearer.

So prior to the ZONE_DMA change, you still couldn't reserve 1G in the
first 4GB? It shouldn't be sparsely populated during early boot.

> I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier versions.
> Suggested by James, to simplify, we call reserve_crashkernel_low() at the beginning of
> reserve_crashkernel() and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low()
> allocated something.
> That is, just the parameter "crashkernel=X,low" is ok and i deleted "crashkernel=X,high".

The problem I see is that with your patches we diverge from x86
behaviour (and the arm64 behaviour prior to the ZONE_DMA reduction) as
we now require that crashkernel=X,low is always passed if you want
something in ZONE_DMA (and you do want, otherwise the crashdump kernel
fails to boot).

My main requirement is that crashkernel=X, without any suffix, still
works which I don't think is guaranteed with your patches (well,
ignoring RPi4 ZONE_DMA). Bhupesh's series is a quick fix but doesn't
solve your large allocation requirements (that may have worked prior to
the ZONE_DMA change).

> After the ZONE_DMA introduced in December 2019, the issue occurred as
> you said above. In fact, we didn't have RPi4 machine.

You don't even need to have a RPi4 machine, ZONE_DMA has been set to 1GB
unconditionally. And while we could move it back to 4GB on non-RPi4
hardware, I'd rather have a solution that fixes kdump for RPi4 as well.

> Originally, i suggested to fix this based on this patch series and
> used the dedicated option.
> 
> According to your clarify, for typical kdump, there are other
> solutions. In this case, "keep the crashkernel= behaviour to ZONE_DMA
> allocations" looks much better.
> 
> How about like this:
> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
>    behaviour to ZONE_DMA allocations.
> 2. For this patch series, make the reserve_crashkernel_low() to
>    ZONE_DMA allocations.

So you mean rebasing your series on top of Bhupesh's? I guess you can
combine the two, I really don't care which way as long as we fix both
issues and agree on the crashkernel= semantics. I think with some tweaks
we can go with your series alone.

IIUC from the x86 code (especially the part you #ifdef'ed out for
arm64), if ",low" is not passed (so just standard crashkernel=X), it
still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
rest can go in a high region. Why can't we do something similar on
arm64? Of course, you can keep the ",low" argument for explicit
allocation but I don't want to mandate it.

So with an implicit ZONE_DMA allocation similar to the x86 one, we
probably don't need Bhupesh's series at all. In addition, we can limit
crashkernel= to the first 4G with a fall-back to high like x86 (not sure
if memblock_find_in_range() is guaranteed to search in ascending order).
I don't think we need an explicit ",high" annotation.

So with the above, just a crashkernel=1G gives you at least 256MB in
ZONE_DMA followed by the rest anywhere, with a preference for
ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
mostly intact from x86 (less #ifdef's).

Do I miss anything?
chenzhou July 29, 2020, 2:14 p.m. UTC | #4
Hi Catalin,

On 2020/7/29 19:58, Catalin Marinas wrote:
> Hi Chen,
>
> On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
>> On 2020/7/28 1:30, Catalin Marinas wrote:
>>> Anyway, there are two series solving slightly different issues with
>>> kdump reservations:
>>>
>>> 1. This series which relaxes the crashkernel= allocation to go anywhere
>>>    in the accessible space while having a dedicated crashkernel=X,low
>>>    option for ZONE_DMA.
>>>
>>> 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
>>>    ZONE_DMA.
>>>
>>> For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
>>> Existing crashkernel= uses may no longer work, depending on where the
>>> allocation falls. Option (2) above is a quick fix assuming that the
>>> crashkernel reservation is small enough. What's a typical crashkernel
>>> option here? That series is probably more prone to reservation failures.
>>>
>>> Option (1), i.e. this series, doesn't solve the problem raised by
>>> Bhupesh unless one uses the crashkernel=X,low argument. It can actually
>>> make it worse even for ZONE_DMA32 since the allocation can go above 4G
>>> (assuming that we change the ZONE_DMA configuration to only limit it to
>>> 1GB on RPi4).
>>>
>>> I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
>>> allocations. If this is too small for typical kdump, we can look into
>>> expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
>>> list). In addition, if Chen thinks allocations above 4G are still needed
>>> or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
>>> ",high" option to explicitly require such access.
>> Thanks for your reply and exhaustive explanation.
>>
>> In our ARM servers, we need to to reserve a large chunk for kdump(512M
>> or 1G), there is no enough low memory. So we proposed this patch
>> series "support reserving crashkernel above 4G on arm64 kdump" In
>> April 2019.
> Trying to go through the discussions last year, hopefully things get
> clearer.
>
> So prior to the ZONE_DMA change, you still couldn't reserve 1G in the
> first 4GB? It shouldn't be sparsely populated during early boot.
Yes, we prior to the ZONE_DMA change, you still couldn't reserve 1G/512M in the first 4GB.
The memory reported by the bios may be splitted by some "reserved" entries.
Like this:
...
2f126000-2fbfffff : reserved
2fc00000-396affff : System RAM
  30de8000-30de9fff : reserved
  30dec000-30decfff : reserved
  30df2000-30df2fff : reserved
  30e20000-30e4ffff : reserved
  39620000-3968ffff : reserved
396b0000-3974ffff : reserved
39750000-397affff : System RAM
397b0000-398fffff : reserved
39900000-3990ffff : System RAM
  39900000-3990ffff : reserved
...
>
>> I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier versions.
>> Suggested by James, to simplify, we call reserve_crashkernel_low() at the beginning of
>> reserve_crashkernel() and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low()
>> allocated something.
>> That is, just the parameter "crashkernel=X,low" is ok and i deleted "crashkernel=X,high".
> The problem I see is that with your patches we diverge from x86
> behaviour (and the arm64 behaviour prior to the ZONE_DMA reduction) as
> we now require that crashkernel=X,low is always passed if you want
> something in ZONE_DMA (and you do want, otherwise the crashdump kernel
> fails to boot).
>
> My main requirement is that crashkernel=X, without any suffix, still
> works which I don't think is guaranteed with your patches (well,
> ignoring RPi4 ZONE_DMA). Bhupesh's series is a quick fix but doesn't
> solve your large allocation requirements (that may have worked prior to
> the ZONE_DMA change).
The main purpose of this series is to solve the large allocation requirements.
Before the DMA_ZONE, both the original crashkernel=X and large allocation with my  patches
work well.

With the DMA_ZONE, both the original crashkernel=X and large allocation with my  patches
may fail to boot. Both need to think about the DMA_ZONE.

>
>> After the ZONE_DMA introduced in December 2019, the issue occurred as
>> you said above. In fact, we didn't have RPi4 machine.
> You don't even need to have a RPi4 machine, ZONE_DMA has been set to 1GB
> unconditionally. And while we could move it back to 4GB on non-RPi4
> hardware, I'd rather have a solution that fixes kdump for RPi4 as well.
>
>> Originally, i suggested to fix this based on this patch series and
>> used the dedicated option.
>>
>> According to your clarify, for typical kdump, there are other
>> solutions. In this case, "keep the crashkernel= behaviour to ZONE_DMA
>> allocations" looks much better.
>>
>> How about like this:
>> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
>>    behaviour to ZONE_DMA allocations.
>> 2. For this patch series, make the reserve_crashkernel_low() to
>>    ZONE_DMA allocations.
> So you mean rebasing your series on top of Bhupesh's? I guess you can
> combine the two, I really don't care which way as long as we fix both
> issues and agree on the crashkernel= semantics. I think with some tweaks
> we can go with your series alone.
>
> IIUC from the x86 code (especially the part you #ifdef'ed out for
> arm64), if ",low" is not passed (so just standard crashkernel=X), it
> still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
> rest can go in a high region. Why can't we do something similar on
> arm64? Of course, you can keep the ",low" argument for explicit
> allocation but I don't want to mandate it.
It is a good idea to combine the two.

For parameter crashkernel=X, we do like this:
1. allocate some low memory in ZONE_DMA(or ZONE_DMA32 if CONFIG_ZONE_DMA=n)
2. allocate X size memory in a high region

",low" argument can be used to specify the low memory.

Do i understand correctly?
>
> So with an implicit ZONE_DMA allocation similar to the x86 one, we
> probably don't need Bhupesh's series at all. In addition, we can limit
> crashkernel= to the first 4G with a fall-back to high like x86 (not sure
> if memblock_find_in_range() is guaranteed to search in ascending order).
> I don't think we need an explicit ",high" annotation.
>
> So with the above, just a crashkernel=1G gives you at least 256MB in
> ZONE_DMA followed by the rest anywhere, with a preference for
> ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
> mostly intact from x86 (less #ifdef's).
>
> Do I miss anything?
Yes. We can let crashkernel=X  try to reserve low memory and fall back to use high memory
if failing to find a low range.

About the function reserve_crashkernel_low(), if we put it in arch/arm64, there is some common
code with x86_64. Some suggestions about this?

Thanks,
Chen Zhou
>
Catalin Marinas July 29, 2020, 3:20 p.m. UTC | #5
On Wed, Jul 29, 2020 at 10:14:32PM +0800, chenzhou wrote:
> On 2020/7/29 19:58, Catalin Marinas wrote:
> > On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
> >> How about like this:
> >> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
> >>    behaviour to ZONE_DMA allocations.
> >> 2. For this patch series, make the reserve_crashkernel_low() to
> >>    ZONE_DMA allocations.
> > 
> > So you mean rebasing your series on top of Bhupesh's? I guess you can
> > combine the two, I really don't care which way as long as we fix both
> > issues and agree on the crashkernel= semantics. I think with some tweaks
> > we can go with your series alone.
> >
> > IIUC from the x86 code (especially the part you #ifdef'ed out for
> > arm64), if ",low" is not passed (so just standard crashkernel=X), it
> > still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
> > rest can go in a high region. Why can't we do something similar on
> > arm64? Of course, you can keep the ",low" argument for explicit
> > allocation but I don't want to mandate it.
> 
> It is a good idea to combine the two.
> 
> For parameter crashkernel=X, we do like this:
> 1. allocate some low memory in ZONE_DMA(or ZONE_DMA32 if CONFIG_ZONE_DMA=n)
> 2. allocate X size memory in a high region
> 
> ",low" argument can be used to specify the low memory.
> 
> Do i understand correctly?

Yes, although we could follow the x86 approach:

1. Try low (ZONE_DMA for arm64) allocation, fallback to high allocation
   if it fails.

2. If crash_base is outside ZONE_DMA, call reserve_crashkernel_low()
   which either honours the ,low option or allocates some small amount
   in ZONE_DMA.

If at some point we have platforms failing step 2, we'll look at
changing ZONE_DMA to the full 4GB on non-RPi4 platforms.

It looks to me like x86 ignores the ,low option if the first step
managed to get some low memory. Shall we do the same on arm64?

> > So with an implicit ZONE_DMA allocation similar to the x86 one, we
> > probably don't need Bhupesh's series at all. In addition, we can limit
> > crashkernel= to the first 4G with a fall-back to high like x86 (not sure
> > if memblock_find_in_range() is guaranteed to search in ascending order).
> > I don't think we need an explicit ",high" annotation.
> >
> > So with the above, just a crashkernel=1G gives you at least 256MB in
> > ZONE_DMA followed by the rest anywhere, with a preference for
> > ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
> > mostly intact from x86 (less #ifdef's).
> 
> Yes. We can let crashkernel=X  try to reserve low memory and fall back to use high memory
> if failing to find a low range.

The only question is whether we need to preserve some more ZONE_DMA on
the current system. If for example we pass a crashkernel=512M and some
cma=, we may end up with very little free memory in ZONE_DMA. That's
mostly an issue for RPi4 since other platforms would work with
ZONE_DMA32. We could add a threshold and go for high allocation directly
if the required size is too large.

> About the function reserve_crashkernel_low(), if we put it in arch/arm64, there is some common
> code with x86_64. Some suggestions about this?

If we can use this function almost intact, just move it in a common
place. But if it gets sprinkled with #ifdef CONFIG_ARM64, I'd rather
duplicate it. I'd still prefer to move it to a common place if possible.

You can go a step further and also move the x86 reserve_crashkernel() to
common code. I don't think there a significant difference between arm64
and x86 here. You'd have to define arch-specific specific
CRASH_ADDR_LOW_MAX etc.

Also patches moving code should not have any functional change. The
CRASH_ALIGN change from 16M to 2M on x86 should be a separate patch as
it needs to be acked by the x86 maintainers (IIRC, Ingo only acked the
function move if there was no functional change; CRASH_ALIGN is used for
the start address, not just alignment, on x86).
chenzhou July 30, 2020, 8:22 a.m. UTC | #6
Hi Catalin,


On 2020/7/29 23:20, Catalin Marinas wrote:
> On Wed, Jul 29, 2020 at 10:14:32PM +0800, chenzhou wrote:
>> On 2020/7/29 19:58, Catalin Marinas wrote:
>>> On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
>>>> How about like this:
>>>> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
>>>>    behaviour to ZONE_DMA allocations.
>>>> 2. For this patch series, make the reserve_crashkernel_low() to
>>>>    ZONE_DMA allocations.
>>> So you mean rebasing your series on top of Bhupesh's? I guess you can
>>> combine the two, I really don't care which way as long as we fix both
>>> issues and agree on the crashkernel= semantics. I think with some tweaks
>>> we can go with your series alone.
>>>
>>> IIUC from the x86 code (especially the part you #ifdef'ed out for
>>> arm64), if ",low" is not passed (so just standard crashkernel=X), it
>>> still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
>>> rest can go in a high region. Why can't we do something similar on
>>> arm64? Of course, you can keep the ",low" argument for explicit
>>> allocation but I don't want to mandate it.
>> It is a good idea to combine the two.
>>
>> For parameter crashkernel=X, we do like this:
>> 1. allocate some low memory in ZONE_DMA(or ZONE_DMA32 if CONFIG_ZONE_DMA=n)
>> 2. allocate X size memory in a high region
>>
>> ",low" argument can be used to specify the low memory.
>>
>> Do i understand correctly?
> Yes, although we could follow the x86 approach:
>
> 1. Try low (ZONE_DMA for arm64) allocation, fallback to high allocation
>    if it fails.
>
> 2. If crash_base is outside ZONE_DMA, call reserve_crashkernel_low()
>    which either honours the ,low option or allocates some small amount
>    in ZONE_DMA.
>
> If at some point we have platforms failing step 2, we'll look at
> changing ZONE_DMA to the full 4GB on non-RPi4 platforms.
>
> It looks to me like x86 ignores the ,low option if the first step
> managed to get some low memory. Shall we do the same on arm64?
Yes, we could do like this.
>
>>> So with an implicit ZONE_DMA allocation similar to the x86 one, we
>>> probably don't need Bhupesh's series at all. In addition, we can limit
>>> crashkernel= to the first 4G with a fall-back to high like x86 (not sure
>>> if memblock_find_in_range() is guaranteed to search in ascending order).
>>> I don't think we need an explicit ",high" annotation.
>>>
>>> So with the above, just a crashkernel=1G gives you at least 256MB in
>>> ZONE_DMA followed by the rest anywhere, with a preference for
>>> ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
>>> mostly intact from x86 (less #ifdef's).
>> Yes. We can let crashkernel=X  try to reserve low memory and fall back to use high memory
>> if failing to find a low range.
> The only question is whether we need to preserve some more ZONE_DMA on
> the current system. If for example we pass a crashkernel=512M and some
> cma=, we may end up with very little free memory in ZONE_DMA. That's
> mostly an issue for RPi4 since other platforms would work with
> ZONE_DMA32. We could add a threshold and go for high allocation directly
> if the required size is too large.
Ok.  I will think about the threshold in the next version and make the value be 1/2 or 1/3 of the ZONE_DMA.
>
>> About the function reserve_crashkernel_low(), if we put it in arch/arm64, there is some common
>> code with x86_64. Some suggestions about this?
> If we can use this function almost intact, just move it in a common
> place. But if it gets sprinkled with #ifdef CONFIG_ARM64, I'd rather
> duplicate it. I'd still prefer to move it to a common place if possible.
>
> You can go a step further and also move the x86 reserve_crashkernel() to
> common code. I don't think there a significant difference between arm64
> and x86 here. You'd have to define arch-specific specific
> CRASH_ADDR_LOW_MAX etc.
I will take these into account and send the next version recently.
>
> Also patches moving code should not have any functional change. The
> CRASH_ALIGN change from 16M to 2M on x86 should be a separate patch as
> it needs to be acked by the x86 maintainers (IIRC, Ingo only acked the
> function move if there was no functional change; CRASH_ALIGN is used for
> the start address, not just alignment, on x86).
>
Thanks,
Chen Zhou
diff mbox series

Patch

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a7580d291c37..e8ecbbc761a3 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -320,6 +320,7 @@  int __init reserve_crashkernel_low(void)
 	unsigned long long base, low_base = 0, low_size = 0;
 	unsigned long total_low_mem;
 	int ret;
+	phys_addr_t crash_max = 1ULL << 32;
 
 	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
 
@@ -352,7 +353,11 @@  int __init reserve_crashkernel_low(void)
 			return 0;
 	}
 
-	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+#ifdef CONFIG_ARM64
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		crash_max = arm64_dma_phys_limit;
+#endif
+	low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));