diff mbox series

[v22,5/9] arm64: kdump: Reimplement crashkernel=X

Message ID 20220414115720.1887-6-thunder.leizhen@huawei.com (mailing list archive)
State New, archived
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Commit Message

Zhen Lei April 14, 2022, 11:57 a.m. UTC
From: Chen Zhou <chenzhou10@huawei.com>

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will fail to boot because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and fall back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Co-developed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm64/kernel/machine_kexec.c      |   9 +-
 arch/arm64/kernel/machine_kexec_file.c |  12 ++-
 arch/arm64/mm/init.c                   | 116 +++++++++++++++++++++++--
 3 files changed, 124 insertions(+), 13 deletions(-)

Comments

Catalin Marinas April 26, 2022, 6:02 p.m. UTC | #1
On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
>   * This function reserves memory area given in "crashkernel=" kernel command
>   * line parameter. The memory reserved is used by dump capture kernel when
>   * primary kernel is crashing.
> + *
> + * NOTE: Reservation of crashkernel,low is special since its existence
> + * is not independent, need rely on the existence of crashkernel,high.
> + * Here, four cases of crashkernel low memory reservation are summarized:
> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
> + *    memory takes Y;
> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> + *    take the default crashkernel low memory size;
> + * 3) crashkernel=X is specified, while fallback to get a memory region
> + *    in high memory, take the default crashkernel low memory size;
> + * 4) crashkernel='invalid value',low is specified, failed the whole
> + *    crashkernel reservation and bail out.

Following the x86 behaviour made sense when we were tried to get that
code generic. Now that we moved the logic under arch/arm64, we can
diverge a bit. I lost track of the original (v1/v2) proposal but I
wonder whether we still need the fallback to high for crashkernel=Y.
Maybe simpler, no fallbacks:

	crashkernel=Y - keep the current behaviour, ignore high,low
	crashkernel=Y,high - allocate above ZONE_DMA
	crashkernel=Y,low - allocate within ZONE_DMA

From your proposal, the difference is that the Y,high option won't
have any default ZONE_DMA fallback, one would have to explicitly pass
the Y,low option if needed.

Just a thought, maybe it makes the code simpler. But I'm open to
discussion if there are good arguments for the proposed (x86-like)
behaviour. One argument could be for crashkernel=Y to fall back to high
if distros don't want to bother with high/low settings.

Another thing I may have asked in the past, what happens if we run a new
kernel with these patches with old kexec user tools. I suspect the
crashkernel=Y with the fallback to high will confuse the tools.

BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the
crashkernel above 4G. Let's get the crashkernel reservations sorted
first, it's been around for too long.

Thanks.
Zhen Lei April 27, 2022, 6:54 a.m. UTC | #2
On 2022/4/27 2:02, Catalin Marinas wrote:
> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>>  /*
>>   * reserve_crashkernel() - reserves memory for crash kernel
>>   *
>>   * This function reserves memory area given in "crashkernel=" kernel command
>>   * line parameter. The memory reserved is used by dump capture kernel when
>>   * primary kernel is crashing.
>> + *
>> + * NOTE: Reservation of crashkernel,low is special since its existence
>> + * is not independent, need rely on the existence of crashkernel,high.
>> + * Here, four cases of crashkernel low memory reservation are summarized:
>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
>> + *    memory takes Y;
>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
>> + *    take the default crashkernel low memory size;
>> + * 3) crashkernel=X is specified, while fallback to get a memory region
>> + *    in high memory, take the default crashkernel low memory size;
>> + * 4) crashkernel='invalid value',low is specified, failed the whole
>> + *    crashkernel reservation and bail out.
> 
> Following the x86 behaviour made sense when we were tried to get that
> code generic. Now that we moved the logic under arch/arm64, we can
> diverge a bit. I lost track of the original (v1/v2) proposal but I
> wonder whether we still need the fallback to high for crashkernel=Y.

I don't think anyone has raised this demand yet! If it weren't for the
fact that crashkernel=X appeared earlier, it would probably have been
enough for a combination of crashkernel=X,high and crashkernel=Y,low.

In fact, I also tend not to support "fallback to high for crashkernel=Y".
I took over this from Chen Zhou. In the absence of any objection, I had
to inherit. Now that you've brought it up, I'm happy to delete it.
Supporting this feature complicates the code logic a lot. The point is,
it's not fully backwards compatible yet. For example, someone may want
crashkernel=3G to report failure, but the the new support make it work.

> Maybe simpler, no fallbacks:
> 
> 	crashkernel=Y - keep the current behaviour, ignore high,low
> 	crashkernel=Y,high - allocate above ZONE_DMA
> 	crashkernel=Y,low - allocate within ZONE_DMA
> 
>>From your proposal, the difference is that the Y,high option won't
> have any default ZONE_DMA fallback, one would have to explicitly pass
> the Y,low option if needed.

I agree with you. Now we don't need code generic, so there is no need to
carry the historical burden of other ARCHs. arm64 does not need to delve
into that empirical value(the default size of crash low memory).

> 
> Just a thought, maybe it makes the code simpler. But I'm open to
> discussion if there are good arguments for the proposed (x86-like)
> behaviour. One argument could be for crashkernel=Y to fall back to high
> if distros don't want to bother with high/low settings.

I think distros should take precedence over "crashkernel=Y,high". After all,
ZONE_DMA memory is more valuable than high memory.


> 
> Another thing I may have asked in the past, what happens if we run a new
> kernel with these patches with old kexec user tools. I suspect the
> crashkernel=Y with the fallback to high will confuse the tools.

If crashkernel=Y can reserve the memory in Zone_DMA successfully, the old
kexec works well. But if crashkernel=Y fallback to high memory, the second
kernel will boot failed, because the old kexec can only use dtb to pass the
high memory range to the second kernel. In comparison, if no fallback, we can
see crash memory reservation failure in the first kernel, so we have a chance
to adjust Y.

Currently, the new kexec tool will pick the last memory range(sorted by address
in ascending order) to store Image,dtb,initrd.


> 
> BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the
> crashkernel above 4G. Let's get the crashkernel reservations sorted
> first, it's been around for too long.

OK, thank you. That's a good suggestion.

> 
> Thanks.
>
Catalin Marinas April 27, 2022, 12:32 p.m. UTC | #3
On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 2:02, Catalin Marinas wrote:
> > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
> >>  /*
> >>   * reserve_crashkernel() - reserves memory for crash kernel
> >>   *
> >>   * This function reserves memory area given in "crashkernel=" kernel command
> >>   * line parameter. The memory reserved is used by dump capture kernel when
> >>   * primary kernel is crashing.
> >> + *
> >> + * NOTE: Reservation of crashkernel,low is special since its existence
> >> + * is not independent, need rely on the existence of crashkernel,high.
> >> + * Here, four cases of crashkernel low memory reservation are summarized:
> >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
> >> + *    memory takes Y;
> >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> >> + *    take the default crashkernel low memory size;
> >> + * 3) crashkernel=X is specified, while fallback to get a memory region
> >> + *    in high memory, take the default crashkernel low memory size;
> >> + * 4) crashkernel='invalid value',low is specified, failed the whole
> >> + *    crashkernel reservation and bail out.
> > 
> > Following the x86 behaviour made sense when we were tried to get that
> > code generic. Now that we moved the logic under arch/arm64, we can
> > diverge a bit. I lost track of the original (v1/v2) proposal but I
> > wonder whether we still need the fallback to high for crashkernel=Y.
> 
> I don't think anyone has raised this demand yet! If it weren't for the
> fact that crashkernel=X appeared earlier, it would probably have been
> enough for a combination of crashkernel=X,high and crashkernel=Y,low.
> 
> In fact, I also tend not to support "fallback to high for crashkernel=Y".
> I took over this from Chen Zhou. In the absence of any objection, I had
> to inherit. Now that you've brought it up, I'm happy to delete it.
> Supporting this feature complicates the code logic a lot. The point is,
> it's not fully backwards compatible yet. For example, someone may want
> crashkernel=3G to report failure, but the the new support make it work.

BTW, prior to v20, this patch had this line:

	crashk_low_res.name = "Crash kernel (low)";

I can't find it anymore. Do the kexec tools need to distinguish between
low and high or they can cope with multiple "Crash kernel" entries?

> > Maybe simpler, no fallbacks:
> > 
> > 	crashkernel=Y - keep the current behaviour, ignore high,low
> > 	crashkernel=Y,high - allocate above ZONE_DMA
> > 	crashkernel=Y,low - allocate within ZONE_DMA
> > 
> > From your proposal, the difference is that the Y,high option won't
> > have any default ZONE_DMA fallback, one would have to explicitly pass
> > the Y,low option if needed.
> 
> I agree with you. Now we don't need code generic, so there is no need to
> carry the historical burden of other ARCHs. arm64 does not need to delve
> into that empirical value(the default size of crash low memory).
> 
> > Just a thought, maybe it makes the code simpler. But I'm open to
> > discussion if there are good arguments for the proposed (x86-like)
> > behaviour. One argument could be for crashkernel=Y to fall back to high
> > if distros don't want to bother with high/low settings.
> 
> I think distros should take precedence over "crashkernel=Y,high". After all,
> ZONE_DMA memory is more valuable than high memory.

My point is whether an admin configuring the kernel command line needs
to know the layout of ZONE_DMA etc. to figure out how much to pass in
high and low. The fallbacks in this case have some value but they also
complicate the code logic. The 4GB limit does not always make sense
either for some platforms (RPi4 has a ZONE_DMA limit of 1GB).

I think one could always pass a default command line like:

	crashkernel=1G,high crashkernel=128M,low

without much knowledge of the SoC memory layout.

Another option is to only introduce crashkernel=Y,low and, when that is
passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
'high' option at all:

	crashkernel=1G				- all within ZONE_DMA
	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
						  1G above ZONE_DMA

If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
the 'low' option.
Zhen Lei April 27, 2022, 1:49 p.m. UTC | #4
On 2022/4/27 20:32, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/27 2:02, Catalin Marinas wrote:
>>> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>>>>  /*
>>>>   * reserve_crashkernel() - reserves memory for crash kernel
>>>>   *
>>>>   * This function reserves memory area given in "crashkernel=" kernel command
>>>>   * line parameter. The memory reserved is used by dump capture kernel when
>>>>   * primary kernel is crashing.
>>>> + *
>>>> + * NOTE: Reservation of crashkernel,low is special since its existence
>>>> + * is not independent, need rely on the existence of crashkernel,high.
>>>> + * Here, four cases of crashkernel low memory reservation are summarized:
>>>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
>>>> + *    memory takes Y;
>>>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
>>>> + *    take the default crashkernel low memory size;
>>>> + * 3) crashkernel=X is specified, while fallback to get a memory region
>>>> + *    in high memory, take the default crashkernel low memory size;
>>>> + * 4) crashkernel='invalid value',low is specified, failed the whole
>>>> + *    crashkernel reservation and bail out.
>>>
>>> Following the x86 behaviour made sense when we were tried to get that
>>> code generic. Now that we moved the logic under arch/arm64, we can
>>> diverge a bit. I lost track of the original (v1/v2) proposal but I
>>> wonder whether we still need the fallback to high for crashkernel=Y.
>>
>> I don't think anyone has raised this demand yet! If it weren't for the
>> fact that crashkernel=X appeared earlier, it would probably have been
>> enough for a combination of crashkernel=X,high and crashkernel=Y,low.
>>
>> In fact, I also tend not to support "fallback to high for crashkernel=Y".
>> I took over this from Chen Zhou. In the absence of any objection, I had
>> to inherit. Now that you've brought it up, I'm happy to delete it.
>> Supporting this feature complicates the code logic a lot. The point is,
>> it's not fully backwards compatible yet. For example, someone may want
>> crashkernel=3G to report failure, but the the new support make it work.
> 
> BTW, prior to v20, this patch had this line:
> 
> 	crashk_low_res.name = "Crash kernel (low)";
> 
> I can't find it anymore. Do the kexec tools need to distinguish between
> low and high or they can cope with multiple "Crash kernel" entries?

Yes, I've updated the kexec tool patch based on Borislav Petkov's comments
to keep it the same as x86. And this patch has been merged into kexec v2.0.24.
b5a34a2 arm64: support more than one crash kernel regions

The kexec tool will first sorted all "Crash kernel" memory range in ascending
order, then choose the last one (the memory range with the highest address) to
store the second kernel's Image,dtb,initrd.

> 
>>> Maybe simpler, no fallbacks:
>>>
>>> 	crashkernel=Y - keep the current behaviour, ignore high,low
>>> 	crashkernel=Y,high - allocate above ZONE_DMA
>>> 	crashkernel=Y,low - allocate within ZONE_DMA
>>>
>>> From your proposal, the difference is that the Y,high option won't
>>> have any default ZONE_DMA fallback, one would have to explicitly pass
>>> the Y,low option if needed.
>>
>> I agree with you. Now we don't need code generic, so there is no need to
>> carry the historical burden of other ARCHs. arm64 does not need to delve
>> into that empirical value(the default size of crash low memory).
>>
>>> Just a thought, maybe it makes the code simpler. But I'm open to
>>> discussion if there are good arguments for the proposed (x86-like)
>>> behaviour. One argument could be for crashkernel=Y to fall back to high
>>> if distros don't want to bother with high/low settings.
>>
>> I think distros should take precedence over "crashkernel=Y,high". After all,
>> ZONE_DMA memory is more valuable than high memory.
> 
> My point is whether an admin configuring the kernel command line needs
> to know the layout of ZONE_DMA etc. to figure out how much to pass in

No need.

> high and low. The fallbacks in this case have some value but they also
> complicate the code logic. The 4GB limit does not always make sense
> either for some platforms (RPi4 has a ZONE_DMA limit of 1GB).
> 
> I think one could always pass a default command line like:
> 
> 	crashkernel=1G,high crashkernel=128M,low
> 
> without much knowledge of the SoC memory layout.

Yes, that's what the end result is. The user specify crashkernel=128M,low
and the implementation ensure the 128M low memory is allocated from DMA zone.
We use arm64_dma_phys_limit as the upper limit for crash low memory.

+#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
+       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                               crash_base, crash_max);

> 
> Another option is to only introduce crashkernel=Y,low and, when that is
> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> 'high' option at all:
> 
> 	crashkernel=1G				- all within ZONE_DMA
> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> 						  1G above ZONE_DMA
> 
> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> the 'low' option.

I think although the code is hard to make generic, the interface is better to
be relatively uniform. A user might have to maintain both x86 and arm64, and
so on. It's not a good thing that the difference is too big.

A well had already been dug by the forefathers, and it seemed unnecessary to
dig another well beside it if there was no particular advantage.

>
Catalin Marinas April 27, 2022, 4:04 p.m. UTC | #5
On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 20:32, Catalin Marinas wrote:
> > I think one could always pass a default command line like:
> > 
> > 	crashkernel=1G,high crashkernel=128M,low
> > 
> > without much knowledge of the SoC memory layout.
> 
> Yes, that's what the end result is. The user specify crashkernel=128M,low
> and the implementation ensure the 128M low memory is allocated from DMA zone.
> We use arm64_dma_phys_limit as the upper limit for crash low memory.
> 
> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>                                                crash_base, crash_max);
> 
> > Another option is to only introduce crashkernel=Y,low and, when that is
> > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > 'high' option at all:
> > 
> > 	crashkernel=1G				- all within ZONE_DMA
> > 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> > 						  1G above ZONE_DMA
> > 
> > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > the 'low' option.
> 
> I think although the code is hard to make generic, the interface is better to
> be relatively uniform. A user might have to maintain both x86 and arm64, and
> so on. It's not a good thing that the difference is too big.

There will be some difference as the 4G limit doesn't always hold for
arm64 (though it's true in most cases). Anyway, we can probably simplify
things a bit while following the documented behaviour:

	crashkernel=Y		- current behaviour within ZONE_DMA
	crashkernel=Y,high	- allocate from above ZONE_DMA
	crashkernel=Y,low	- allocate within ZONE_DMA

There is no fallback from crashkernel=Y.

The question is whether we still want a default low allocation if
crashkernel=Y,low is missing but 'high' is present. If we add this, I
think we'd be consistent with kernel-parameters.txt for the 'low'
description. A default 'low' is probably not that bad but I'm tempted to
always mandate both 'high' and 'low'.
Zhen Lei April 28, 2022, 2:22 a.m. UTC | #6
On 2022/4/28 0:04, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>> I think one could always pass a default command line like:
>>>
>>> 	crashkernel=1G,high crashkernel=128M,low
>>>
>>> without much knowledge of the SoC memory layout.
>>
>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>
>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>                                                crash_base, crash_max);
>>
>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>> 'high' option at all:
>>>
>>> 	crashkernel=1G				- all within ZONE_DMA
>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>> 						  1G above ZONE_DMA
>>>
>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>> the 'low' option.
>>
>> I think although the code is hard to make generic, the interface is better to
>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>> so on. It's not a good thing that the difference is too big.
> 
> There will be some difference as the 4G limit doesn't always hold for
> arm64 (though it's true in most cases). Anyway, we can probably simplify
> things a bit while following the documented behaviour:
> 
> 	crashkernel=Y		- current behaviour within ZONE_DMA
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> There is no fallback from crashkernel=Y.

Yes, I followed your guidelines yesterday to modify the code. Now the code flow
is much clearer.

> 
> The question is whether we still want a default low allocation if
> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> think we'd be consistent with kernel-parameters.txt for the 'low'
> description. A default 'low' is probably not that bad but I'm tempted to
> always mandate both 'high' and 'low'.

Yes, I agree with you. Because the situation is complicated, the default value
is hard to be accurate. It's better to let the user configure it according to
the actual situation, they're also programmers.

Whether mandate both 'high' and 'low', or allow only 'high' like x86(but the default
value becomes zero). I prefer the latter. The size of 'low' maybe zero, for example,
SMMU is enabled on the second kernel. If only high memory is required, only that
high memory needs to be configured, seems more reasonable.

>
Baoquan He April 28, 2022, 3:40 a.m. UTC | #7
Hi Catalin, Zhen Lei,

On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> > On 2022/4/27 20:32, Catalin Marinas wrote:
> > > I think one could always pass a default command line like:
> > > 
> > > 	crashkernel=1G,high crashkernel=128M,low
> > > 
> > > without much knowledge of the SoC memory layout.
> > 
> > Yes, that's what the end result is. The user specify crashkernel=128M,low
> > and the implementation ensure the 128M low memory is allocated from DMA zone.
> > We use arm64_dma_phys_limit as the upper limit for crash low memory.
> > 
> > +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> > +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >                                                crash_base, crash_max);
> > 
> > > Another option is to only introduce crashkernel=Y,low and, when that is
> > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > > 'high' option at all:
> > > 
> > > 	crashkernel=1G				- all within ZONE_DMA
> > > 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> > > 						  1G above ZONE_DMA
> > > 
> > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > > the 'low' option.
> > 
> > I think although the code is hard to make generic, the interface is better to
> > be relatively uniform. A user might have to maintain both x86 and arm64, and
> > so on. It's not a good thing that the difference is too big.
> 
> There will be some difference as the 4G limit doesn't always hold for
> arm64 (though it's true in most cases). Anyway, we can probably simplify
> things a bit while following the documented behaviour:
> 
> 	crashkernel=Y		- current behaviour within ZONE_DMA
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> There is no fallback from crashkernel=Y.
> 
> The question is whether we still want a default low allocation if
> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> think we'd be consistent with kernel-parameters.txt for the 'low'
> description. A default 'low' is probably not that bad but I'm tempted to
> always mandate both 'high' and 'low'.

Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
about this version. And I have the same concerns about them which comes
from below points:
1) we may need to take best effort to keep ,high, ,low behaviour
consistent on all ARCHes. Otherwise user/admin may be confused when they
deploy/configure kdump on different machines of different ARCHes in the
same LAB. I think we should try to avoid the confusion.
2) Fallback behaviour is important to our distros. The reason is we will
provide default value with crashkernel=xxxM along kernel of distros. In
this case, we hope the reservation will succeed by all means. The ,high
and ,low is an option if customer likes to take with expertise.

After going through arm64 memory init code, I got below summary about
arm64_dma_phys_limit which is the first zone's upper limit. I think we
can make use of it to facilitate to simplify code.
================================================================================
                        DMA                      DMA32                    NORMAL
1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
2)Normal machine        0~4G                     0                        (above 4G)
3)Special machine       (above 4G)~MAX
4)No DMA|DMA32                                                            (above 4G)~MAX

-------------------------------------------
                      arm64_dma_phys_limit
1)Raspberry Pi4         1G                     
2)Normal machine        4G                     
3)Special machine       MAX
4)No DMA|DMA32          MAX

Note: 3)Special machine means the machine's starting physical address is above 4G.
WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
IOMMU hardware supporting.
===================================================================================

I made a draft patch based on this patchset, please feel free to check and
see if it's OK, or anything missing or wrongly understood. I removed
reserve_crashkernel_high() and only keep reserve_crashkernel() and
reserve_crashkernel_low() as the v21 did.

Thanks
Baoquan
Baoquan He April 28, 2022, 3:52 a.m. UTC | #8
On 04/28/22 at 11:40am, Baoquan He wrote:
> Hi Catalin, Zhen Lei,
> 
> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> > > On 2022/4/27 20:32, Catalin Marinas wrote:
> > > > I think one could always pass a default command line like:
> > > > 
> > > > 	crashkernel=1G,high crashkernel=128M,low
> > > > 
> > > > without much knowledge of the SoC memory layout.
> > > 
> > > Yes, that's what the end result is. The user specify crashkernel=128M,low
> > > and the implementation ensure the 128M low memory is allocated from DMA zone.
> > > We use arm64_dma_phys_limit as the upper limit for crash low memory.
> > > 
> > > +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> > > +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > > +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> > >                                                crash_base, crash_max);
> > > 
> > > > Another option is to only introduce crashkernel=Y,low and, when that is
> > > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > > > 'high' option at all:
> > > > 
> > > > 	crashkernel=1G				- all within ZONE_DMA
> > > > 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> > > > 						  1G above ZONE_DMA
> > > > 
> > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > > > the 'low' option.
> > > 
> > > I think although the code is hard to make generic, the interface is better to
> > > be relatively uniform. A user might have to maintain both x86 and arm64, and
> > > so on. It's not a good thing that the difference is too big.
> > 
> > There will be some difference as the 4G limit doesn't always hold for
> > arm64 (though it's true in most cases). Anyway, we can probably simplify
> > things a bit while following the documented behaviour:
> > 
> > 	crashkernel=Y		- current behaviour within ZONE_DMA
> > 	crashkernel=Y,high	- allocate from above ZONE_DMA
> > 	crashkernel=Y,low	- allocate within ZONE_DMA
> > 
> > There is no fallback from crashkernel=Y.
> > 
> > The question is whether we still want a default low allocation if
> > crashkernel=Y,low is missing but 'high' is present. If we add this, I
> > think we'd be consistent with kernel-parameters.txt for the 'low'
> > description. A default 'low' is probably not that bad but I'm tempted to
> > always mandate both 'high' and 'low'.
> 
> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> about this version. And I have the same concerns about them which comes
> from below points:
> 1) we may need to take best effort to keep ,high, ,low behaviour
> consistent on all ARCHes. Otherwise user/admin may be confused when they
> deploy/configure kdump on different machines of different ARCHes in the
> same LAB. I think we should try to avoid the confusion.
> 2) Fallback behaviour is important to our distros. The reason is we will
> provide default value with crashkernel=xxxM along kernel of distros. In
> this case, we hope the reservation will succeed by all means. The ,high
> and ,low is an option if customer likes to take with expertise.
> 
> After going through arm64 memory init code, I got below summary about
> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> can make use of it to facilitate to simplify code.
> ================================================================================
>                         DMA                      DMA32                    NORMAL
> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> 2)Normal machine        0~4G                     0                        (above 4G)
> 3)Special machine       (above 4G)~MAX
> 4)No DMA|DMA32                                                            (above 4G)~MAX
> 
> -------------------------------------------
>                       arm64_dma_phys_limit
> 1)Raspberry Pi4         1G                     
> 2)Normal machine        4G                     
> 3)Special machine       MAX
> 4)No DMA|DMA32          MAX
> 
> Note: 3)Special machine means the machine's starting physical address is above 4G.
> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
> IOMMU hardware supporting.
> ===================================================================================
> 
> I made a draft patch based on this patchset, please feel free to check and
> see if it's OK, or anything missing or wrongly understood. I removed
> reserve_crashkernel_high() and only keep reserve_crashkernel() and
> reserve_crashkernel_low() as the v21 did.

Sorry, forgot attaching the draft patch.

By the way, we can also have a simple version with basic ,high, ,low
support, no fallback. We can add fallback and other optimization later.
This can be plan B.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4a8200f29b35..aa1fbea47c46 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -84,11 +84,7 @@ EXPORT_SYMBOL(memstart_addr);
  * Note: Page-granularity mappings are necessary for crash kernel memory
  * range for shrinking its size via /sys/kernel/kexec_crash_size interface.
  */
-#if IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)
 phys_addr_t __ro_after_init arm64_dma_phys_limit;
-#else
-phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
-#endif
 
 bool crash_low_mem_page_map __initdata;
 static bool crash_high_mem_reserved __initdata;
@@ -132,63 +128,6 @@ static int __init reserve_crashkernel_low(unsigned long long low_size)
 	return 0;
 }
 
-static void __init reserve_crashkernel_high(void)
-{
-	unsigned long long crash_base, crash_size;
-	char *cmdline = boot_command_line;
-	int ret;
-
-	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
-		return;
-
-	/* crashkernel=X[@offset] */
-	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
-				&crash_size, &crash_base);
-	if (ret || !crash_size) {
-		ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
-		if (ret || !crash_size)
-			return;
-	} else if (!crash_base) {
-		crash_low_mem_page_map = true;
-	}
-
-	crash_size = PAGE_ALIGN(crash_size);
-
-	/*
-	 * For the case crashkernel=X, may fall back to reserve memory above
-	 * 4G, make reservations here in advance. It will be released later if
-	 * the region is successfully reserved under 4G.
-	 */
-	if (!crash_base) {
-		crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-						       crash_base, CRASH_ADDR_HIGH_MAX);
-		if (!crash_base)
-			return;
-
-		crash_high_mem_reserved = true;
-	}
-
-	/* Mark the memory range that requires page-level mappings */
-	crashk_res.start = crash_base;
-	crashk_res.end   = crash_base + crash_size - 1;
-}
-
-static void __init hand_over_reserved_high_mem(void)
-{
-	crashk_res_high.start = crashk_res.start;
-	crashk_res_high.end   = crashk_res.end;
-
-	crashk_res.start = 0;
-	crashk_res.end   = 0;
-}
-
-static void __init take_reserved_high_mem(unsigned long long *crash_base,
-					  unsigned long long *crash_size)
-{
-	*crash_base = crashk_res_high.start;
-	*crash_size = resource_size(&crashk_res_high);
-}
-
 static void __init free_reserved_high_mem(void)
 {
 	memblock_phys_free(crashk_res_high.start, resource_size(&crashk_res_high));
@@ -225,7 +164,8 @@ static void __init reserve_crashkernel(void)
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
 
-	hand_over_reserved_high_mem();
+	if (crashk_res.end)
+		return;
 
 	/* crashkernel=X[@offset] */
 	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
@@ -245,11 +185,6 @@ static void __init reserve_crashkernel(void)
 
 		high = true;
 		crash_max = CRASH_ADDR_HIGH_MAX;
-
-		if (crash_high_mem_reserved) {
-			take_reserved_high_mem(&crash_base, &crash_size);
-			goto reserve_low;
-		}
 	}
 
 	fixed_base = !!crash_base;
@@ -267,12 +202,8 @@ static void __init reserve_crashkernel(void)
 		 * to high memory, the minimum required low memory will be
 		 * reserved later.
 		 */
-		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
-			if (crash_high_mem_reserved) {
-				take_reserved_high_mem(&crash_base, &crash_size);
-				goto reserve_low;
-			}
-
+		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX
+		    && crash_max <CRASH_ADDR_HIGH_MAX)) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
 			goto retry;
 		}
@@ -289,7 +220,7 @@ static void __init reserve_crashkernel(void)
 	 * description of "crashkernel=X,high" option, add below 'high'
 	 * condition to make sure the crash low memory will be reserved.
 	 */
-	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX ) {
 reserve_low:
 		/* case #3 of crashkernel,low reservation */
 		if (!high)
@@ -299,14 +230,7 @@ static void __init reserve_crashkernel(void)
 			memblock_phys_free(crash_base, crash_size);
 			return;
 		}
-	} else if (crash_high_mem_reserved) {
-		/*
-		 * The crash memory is successfully allocated under 4G, and the
-		 * previously reserved high memory is no longer required.
-		 */
-		free_reserved_high_mem();
 	}
-
 	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
 		crash_base, crash_base + crash_size, crash_size >> 20);
 
@@ -520,10 +444,10 @@ void __init arm64_memblock_init(void)
 
 	early_init_fdt_scan_reserved_mem();
 
+	arm64_dma_phys_limit = memblock_end_of_DRAM();
+
 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
 		reserve_crashkernel();
-	else
-		reserve_crashkernel_high();
 
 	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 }
Zhen Lei April 28, 2022, 9:33 a.m. UTC | #9
On 2022/4/28 11:52, Baoquan He wrote:
> On 04/28/22 at 11:40am, Baoquan He wrote:
>> Hi Catalin, Zhen Lei,
>>
>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>> I think one could always pass a default command line like:
>>>>>
>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>
>>>>> without much knowledge of the SoC memory layout.
>>>>
>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>
>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>                                                crash_base, crash_max);
>>>>
>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>> 'high' option at all:
>>>>>
>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>> 						  1G above ZONE_DMA
>>>>>
>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>> the 'low' option.
>>>>
>>>> I think although the code is hard to make generic, the interface is better to
>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>> so on. It's not a good thing that the difference is too big.
>>>
>>> There will be some difference as the 4G limit doesn't always hold for
>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>> things a bit while following the documented behaviour:
>>>
>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>
>>> There is no fallback from crashkernel=Y.
>>>
>>> The question is whether we still want a default low allocation if
>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>> description. A default 'low' is probably not that bad but I'm tempted to
>>> always mandate both 'high' and 'low'.
>>
>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>> about this version. And I have the same concerns about them which comes
>> from below points:
>> 1) we may need to take best effort to keep ,high, ,low behaviour
>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>> deploy/configure kdump on different machines of different ARCHes in the
>> same LAB. I think we should try to avoid the confusion.

Yes, but for someone who is configuring crashkernel= for the first time, he
needs to read doc to understand how to configure it. The doc can show the
recommended default value of 'low' size.

After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.


+                * -swiotlb size: user-specified with swiotlb= or default.
-               low_size = swiotlb_size_or_default() + (8UL<<20);
+               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);

That means all ARCHs can explicit configure crashkernel=256M,low, instead of
omitting it. This may be another way to avoid confusion. It's not hard for
programmer-turned-user/admin. However, this requires us to forgo backward
compatibility with the default size of 'low'.


>> 2) Fallback behaviour is important to our distros. The reason is we will
>> provide default value with crashkernel=xxxM along kernel of distros. In
>> this case, we hope the reservation will succeed by all means. The ,high
>> and ,low is an option if customer likes to take with expertise.

OK, I got it.

>>
>> After going through arm64 memory init code, I got below summary about
>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>> can make use of it to facilitate to simplify code.
>> ================================================================================
>>                         DMA                      DMA32                    NORMAL
>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>> 2)Normal machine        0~4G                     0                        (above 4G)
>> 3)Special machine       (above 4G)~MAX
>> 4)No DMA|DMA32                                                            (above 4G)~MAX

arm64_memblock_init()
	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
paging_init()                                       |
	map_mem()                                   |
unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
bootmem_init();                                     |      |
	zone_sizes_init()                           |      |
		of_dma_get_max_cpu_address          |  ----|
		//Update arm64_dma_phys_limit       |  ----|
	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
request_standard_resources()

>>
>> -------------------------------------------
>>                       arm64_dma_phys_limit
>> 1)Raspberry Pi4         1G                     
>> 2)Normal machine        4G                     
>> 3)Special machine       MAX
>> 4)No DMA|DMA32          MAX
>>
>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>> IOMMU hardware supporting.
>> ===================================================================================
>>
>> I made a draft patch based on this patchset, please feel free to check and
>> see if it's OK, or anything missing or wrongly understood. I removed
>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>> reserve_crashkernel_low() as the v21 did.
> 
> Sorry, forgot attaching the draft patch.
> 
> By the way, we can also have a simple version with basic ,high, ,low
> support, no fallback. We can add fallback and other optimization later.
> This can be plan B.

Yes, That's what Catalin suggested also.

Hi, Baoquan He:
  Without optimization, the whole Patch 3-4 and 6-7 can be dropped.

Process after abstraction:
	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
		reserve_crashkernel()
		//block mapping
	} else {
		//page mapping
		reserve_crashkernel()
	}

------------ Simplified real-world process ---------
arm64_memblock_init()
	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
		reserve_crashkernel()
paging_init()
	map_mem()
		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
			//block mapping
		else
			//page mapping
unflatten_device_tree or ACPI
bootmem_init();
	zone_sizes_init()
		of_dma_get_max_cpu_address
		//Update arm64_dma_phys_limit
	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
		reserve_crashkernel()


> 
>
Baoquan He April 29, 2022, 3:24 a.m. UTC | #10
On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/4/28 11:52, Baoquan He wrote:
> > On 04/28/22 at 11:40am, Baoquan He wrote:
> >> Hi Catalin, Zhen Lei,
> >>
> >> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> >>>> On 2022/4/27 20:32, Catalin Marinas wrote:
> >>>>> I think one could always pass a default command line like:
> >>>>>
> >>>>> 	crashkernel=1G,high crashkernel=128M,low
> >>>>>
> >>>>> without much knowledge of the SoC memory layout.
> >>>>
> >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
> >>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
> >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
> >>>>
> >>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> >>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> >>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >>>>                                                crash_base, crash_max);
> >>>>
> >>>>> Another option is to only introduce crashkernel=Y,low and, when that is
> >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> >>>>> 'high' option at all:
> >>>>>
> >>>>> 	crashkernel=1G				- all within ZONE_DMA
> >>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> >>>>> 						  1G above ZONE_DMA
> >>>>>
> >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> >>>>> the 'low' option.
> >>>>
> >>>> I think although the code is hard to make generic, the interface is better to
> >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
> >>>> so on. It's not a good thing that the difference is too big.
> >>>
> >>> There will be some difference as the 4G limit doesn't always hold for
> >>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> >>> things a bit while following the documented behaviour:
> >>>
> >>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> >>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> >>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> >>>
> >>> There is no fallback from crashkernel=Y.
> >>>
> >>> The question is whether we still want a default low allocation if
> >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> >>> think we'd be consistent with kernel-parameters.txt for the 'low'
> >>> description. A default 'low' is probably not that bad but I'm tempted to
> >>> always mandate both 'high' and 'low'.
> >>
> >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> >> about this version. And I have the same concerns about them which comes
> >> from below points:
> >> 1) we may need to take best effort to keep ,high, ,low behaviour
> >> consistent on all ARCHes. Otherwise user/admin may be confused when they
> >> deploy/configure kdump on different machines of different ARCHes in the
> >> same LAB. I think we should try to avoid the confusion.
> 
> Yes, but for someone who is configuring crashkernel= for the first time, he
> needs to read doc to understand how to configure it. The doc can show the
> recommended default value of 'low' size.
> 
> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
> 
> 
> +                * -swiotlb size: user-specified with swiotlb= or default.
> -               low_size = swiotlb_size_or_default() + (8UL<<20);
> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
> 
> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
> omitting it. This may be another way to avoid confusion. It's not hard for
> programmer-turned-user/admin. However, this requires us to forgo backward
> compatibility with the default size of 'low'.

We can make ,high and ,low simpler at first as they are alternative. If
possible, we can also simplify the ,high ,low implementation on x86_64
if it truly brings better archievement on arm64.

> 
> 
> >> 2) Fallback behaviour is important to our distros. The reason is we will
> >> provide default value with crashkernel=xxxM along kernel of distros. In
> >> this case, we hope the reservation will succeed by all means. The ,high
> >> and ,low is an option if customer likes to take with expertise.
> 
> OK, I got it.
> 
> >>
> >> After going through arm64 memory init code, I got below summary about
> >> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> >> can make use of it to facilitate to simplify code.
> >> ================================================================================
> >>                         DMA                      DMA32                    NORMAL
> >> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> >> 2)Normal machine        0~4G                     0                        (above 4G)
> >> 3)Special machine       (above 4G)~MAX
> >> 4)No DMA|DMA32                                                            (above 4G)~MAX
> 
> arm64_memblock_init()
> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
We don't need different code for this place of reservation as you are
doing in this patchset, since arm64_dma_phys_limit is initialized as 
below. In fact, in arm64_memblock_init(), we have made memblock ready,
we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
memblock_start_of_DRAM() is bigger than 4G, we possibly can call
reserve_crashkernel() here too.

phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;

> paging_init()                                       |
> 	map_mem()                                   |
> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
> bootmem_init();                                     |      |
> 	zone_sizes_init()                           |      |
> 		of_dma_get_max_cpu_address          |  ----|
> 		//Update arm64_dma_phys_limit       |  ----|
> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
> request_standard_resources()

Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
about BLOCKING may not be right since you assume the 4G boundary, while
forgetting Raspberry Pi4 on which 1G is the boundary of low memory and
high memory. So separating out BLOCKING optimization can let us focus on
the crashkernel,high support.

> 
> >>
> >> -------------------------------------------
> >>                       arm64_dma_phys_limit
> >> 1)Raspberry Pi4         1G                     
> >> 2)Normal machine        4G                     
> >> 3)Special machine       MAX
> >> 4)No DMA|DMA32          MAX
> >>
> >> Note: 3)Special machine means the machine's starting physical address is above 4G.
> >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
> >> IOMMU hardware supporting.
> >> ===================================================================================
> >>
> >> I made a draft patch based on this patchset, please feel free to check and
> >> see if it's OK, or anything missing or wrongly understood. I removed
> >> reserve_crashkernel_high() and only keep reserve_crashkernel() and
> >> reserve_crashkernel_low() as the v21 did.
> > 
> > Sorry, forgot attaching the draft patch.
> > 
> > By the way, we can also have a simple version with basic ,high, ,low
> > support, no fallback. We can add fallback and other optimization later.
> > This can be plan B.
> 
> Yes, That's what Catalin suggested also.
> 
> Hi, Baoquan He:
>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
> 
> Process after abstraction:
> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
> 		reserve_crashkernel()
> 		//block mapping
> 	} else {
> 		//page mapping
> 		reserve_crashkernel()
> 	}
> 
> ------------ Simplified real-world process ---------
Yeah, this looks clearer. I would like to see a version with them.

> arm64_memblock_init()
        Before reserve_crashkernel(), we can update arm64_dma_phys_limit
as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
memblock_start_of_DRAM() is bigger than 4G.
        Then we go with:
        if (!arm64_dma_phys_limit)
		reserve_crashkernel();

Just personal opinion, please check if it's appropriate to handle case
3) which has physical starting memory above 4G here. 

> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> 		reserve_crashkernel()
           
> paging_init()
> 	map_mem()
> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> 			//block mapping
> 		else
> 			//page mapping
> unflatten_device_tree or ACPI
> bootmem_init();
> 	zone_sizes_init()
> 		of_dma_get_max_cpu_address
> 		//Update arm64_dma_phys_limit
> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
> 		reserve_crashkernel()

The rest sounds good with optimization code split out.
Zhen Lei April 29, 2022, 8:02 a.m. UTC | #11
On 2022/4/29 11:24, Baoquan He wrote:
> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/4/28 11:52, Baoquan He wrote:
>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>> Hi Catalin, Zhen Lei,
>>>>
>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>>>> I think one could always pass a default command line like:
>>>>>>>
>>>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>>>
>>>>>>> without much knowledge of the SoC memory layout.
>>>>>>
>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>>>
>>>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>>>                                                crash_base, crash_max);
>>>>>>
>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>>>> 'high' option at all:
>>>>>>>
>>>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>>>> 						  1G above ZONE_DMA
>>>>>>>
>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>>>> the 'low' option.
>>>>>>
>>>>>> I think although the code is hard to make generic, the interface is better to
>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>>>> so on. It's not a good thing that the difference is too big.
>>>>>
>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>> things a bit while following the documented behaviour:
>>>>>
>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>>>
>>>>> There is no fallback from crashkernel=Y.
>>>>>
>>>>> The question is whether we still want a default low allocation if
>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>>>> description. A default 'low' is probably not that bad but I'm tempted to
>>>>> always mandate both 'high' and 'low'.
>>>>
>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>> about this version. And I have the same concerns about them which comes
>>>> from below points:
>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>> same LAB. I think we should try to avoid the confusion.
>>
>> Yes, but for someone who is configuring crashkernel= for the first time, he
>> needs to read doc to understand how to configure it. The doc can show the
>> recommended default value of 'low' size.
>>
>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
>>
>>
>> +                * -swiotlb size: user-specified with swiotlb= or default.
>> -               low_size = swiotlb_size_or_default() + (8UL<<20);
>> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
>>
>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
>> omitting it. This may be another way to avoid confusion. It's not hard for
>> programmer-turned-user/admin. However, this requires us to forgo backward
>> compatibility with the default size of 'low'.
> 
> We can make ,high and ,low simpler at first as they are alternative. If
> possible, we can also simplify the ,high ,low implementation on x86_64
> if it truly brings better archievement on arm64.

OK, I plan to remove optimization, fallback and default low size, to follow the
suggestion of Catalin first. But there's one minor point of contention.

1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
   or
      Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.

I prefer 2), how about you?

> 
>>
>>
>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>> and ,low is an option if customer likes to take with expertise.
>>
>> OK, I got it.
>>
>>>>
>>>> After going through arm64 memory init code, I got below summary about
>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>> can make use of it to facilitate to simplify code.
>>>> ================================================================================
>>>>                         DMA                      DMA32                    NORMAL
>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>> 3)Special machine       (above 4G)~MAX
>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>
>> arm64_memblock_init()
>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> We don't need different code for this place of reservation as you are
> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> reserve_crashkernel() here too.

Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing
32-bit devices to access high memory without SMMU is: Set a fixed value for the upper
32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G).

> 
> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
> 
>> paging_init()                                       |
>> 	map_mem()                                   |
>> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
>> bootmem_init();                                     |      |
>> 	zone_sizes_init()                           |      |
>> 		of_dma_get_max_cpu_address          |  ----|
>> 		//Update arm64_dma_phys_limit       |  ----|
>> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
>> request_standard_resources()
> 
> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
> about BLOCKING may not be right since you assume the 4G boundary, while
> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and

No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory
under 4G and block mapping for memory above 4G. But still try to reserve crash
low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized,
the memory 1-4G are mapped as page.

> high memory. So separating out BLOCKING optimization can let us focus on
> the crashkernel,high support.
> 
>>
>>>>
>>>> -------------------------------------------
>>>>                       arm64_dma_phys_limit
>>>> 1)Raspberry Pi4         1G                     
>>>> 2)Normal machine        4G                     
>>>> 3)Special machine       MAX
>>>> 4)No DMA|DMA32          MAX
>>>>
>>>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>>>> IOMMU hardware supporting.
>>>> ===================================================================================
>>>>
>>>> I made a draft patch based on this patchset, please feel free to check and
>>>> see if it's OK, or anything missing or wrongly understood. I removed
>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>>>> reserve_crashkernel_low() as the v21 did.
>>>
>>> Sorry, forgot attaching the draft patch.
>>>
>>> By the way, we can also have a simple version with basic ,high, ,low
>>> support, no fallback. We can add fallback and other optimization later.
>>> This can be plan B.
>>
>> Yes, That's what Catalin suggested also.
>>
>> Hi, Baoquan He:
>>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
>>
>> Process after abstraction:
>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
>> 		reserve_crashkernel()
>> 		//block mapping
>> 	} else {
>> 		//page mapping
>> 		reserve_crashkernel()
>> 	}
>>
>> ------------ Simplified real-world process ---------
> Yeah, this looks clearer. I would like to see a version with them.
> 
>> arm64_memblock_init()
>         Before reserve_crashkernel(), we can update arm64_dma_phys_limit
> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
> memblock_start_of_DRAM() is bigger than 4G.
>         Then we go with:
>         if (!arm64_dma_phys_limit)
> 		reserve_crashkernel();
> 
> Just personal opinion, please check if it's appropriate to handle case
> 3) which has physical starting memory above 4G here. 

OK, I'll write it down and consider it in the future optimization.

> 
>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>> 		reserve_crashkernel()
>            
>> paging_init()
>> 	map_mem()
>> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>> 			//block mapping
>> 		else
>> 			//page mapping
>> unflatten_device_tree or ACPI
>> bootmem_init();
>> 	zone_sizes_init()
>> 		of_dma_get_max_cpu_address
>> 		//Update arm64_dma_phys_limit
>> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
>> 		reserve_crashkernel()
> 
> The rest sounds good with optimization code split out.
> 
> .
>
Zhen Lei April 29, 2022, 8:25 a.m. UTC | #12
On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/4/29 11:24, Baoquan He wrote:
>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/4/28 11:52, Baoquan He wrote:
>>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>>> Hi Catalin, Zhen Lei,
>>>>>
>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>>>>> I think one could always pass a default command line like:
>>>>>>>>
>>>>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>>>>
>>>>>>>> without much knowledge of the SoC memory layout.
>>>>>>>
>>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>>>>
>>>>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>>>>                                                crash_base, crash_max);
>>>>>>>
>>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>>>>> 'high' option at all:
>>>>>>>>
>>>>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>>>>> 						  1G above ZONE_DMA
>>>>>>>>
>>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>>>>> the 'low' option.
>>>>>>>
>>>>>>> I think although the code is hard to make generic, the interface is better to
>>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>>>>> so on. It's not a good thing that the difference is too big.
>>>>>>
>>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>>> things a bit while following the documented behaviour:
>>>>>>
>>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>>>>
>>>>>> There is no fallback from crashkernel=Y.
>>>>>>
>>>>>> The question is whether we still want a default low allocation if
>>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>>>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>>>>> description. A default 'low' is probably not that bad but I'm tempted to
>>>>>> always mandate both 'high' and 'low'.
>>>>>
>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>>> about this version. And I have the same concerns about them which comes
>>>>> from below points:
>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>>> same LAB. I think we should try to avoid the confusion.
>>>
>>> Yes, but for someone who is configuring crashkernel= for the first time, he
>>> needs to read doc to understand how to configure it. The doc can show the
>>> recommended default value of 'low' size.
>>>
>>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
>>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
>>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
>>>
>>>
>>> +                * -swiotlb size: user-specified with swiotlb= or default.
>>> -               low_size = swiotlb_size_or_default() + (8UL<<20);
>>> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
>>>
>>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
>>> omitting it. This may be another way to avoid confusion. It's not hard for
>>> programmer-turned-user/admin. However, this requires us to forgo backward
>>> compatibility with the default size of 'low'.
>>
>> We can make ,high and ,low simpler at first as they are alternative. If
>> possible, we can also simplify the ,high ,low implementation on x86_64
>> if it truly brings better archievement on arm64.
> 
> OK, I plan to remove optimization, fallback and default low size, to follow the
> suggestion of Catalin first. But there's one minor point of contention.
> 
> 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
>    or
>       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> 
> I prefer 2), how about you?
> 
>>
>>>
>>>
>>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>>> and ,low is an option if customer likes to take with expertise.
>>>
>>> OK, I got it.
>>>
>>>>>
>>>>> After going through arm64 memory init code, I got below summary about
>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>>> can make use of it to facilitate to simplify code.
>>>>> ================================================================================
>>>>>                         DMA                      DMA32                    NORMAL
>>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>>> 3)Special machine       (above 4G)~MAX
>>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>>
>>> arm64_memblock_init()
>>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
>> We don't need different code for this place of reservation as you are
>> doing in this patchset, since arm64_dma_phys_limit is initialized as 
>> below. In fact, in arm64_memblock_init(), we have made memblock ready,
>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
>> reserve_crashkernel() here too.
> 
> Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing
> 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper
> 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G).

I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation")

    Currently, the kernel assumes that if RAM starts above 32-bit (or
    zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
    such constrained devices have a hardwired DMA offset. In practice, we
    haven't noticed any such hardware so let's assume that we can expand
    ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
    ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
    zone_bits.

So your suggestion is feasible.

> 
>>
>> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
>>
>>> paging_init()                                       |
>>> 	map_mem()                                   |
>>> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
>>> bootmem_init();                                     |      |
>>> 	zone_sizes_init()                           |      |
>>> 		of_dma_get_max_cpu_address          |  ----|
>>> 		//Update arm64_dma_phys_limit       |  ----|
>>> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
>>> request_standard_resources()
>>
>> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
>> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
>> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
>> about BLOCKING may not be right since you assume the 4G boundary, while
>> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and
> 
> No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory
> under 4G and block mapping for memory above 4G. But still try to reserve crash
> low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized,
> the memory 1-4G are mapped as page.
> 
>> high memory. So separating out BLOCKING optimization can let us focus on
>> the crashkernel,high support.
>>
>>>
>>>>>
>>>>> -------------------------------------------
>>>>>                       arm64_dma_phys_limit
>>>>> 1)Raspberry Pi4         1G                     
>>>>> 2)Normal machine        4G                     
>>>>> 3)Special machine       MAX
>>>>> 4)No DMA|DMA32          MAX
>>>>>
>>>>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>>>>> IOMMU hardware supporting.
>>>>> ===================================================================================
>>>>>
>>>>> I made a draft patch based on this patchset, please feel free to check and
>>>>> see if it's OK, or anything missing or wrongly understood. I removed
>>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>>>>> reserve_crashkernel_low() as the v21 did.
>>>>
>>>> Sorry, forgot attaching the draft patch.
>>>>
>>>> By the way, we can also have a simple version with basic ,high, ,low
>>>> support, no fallback. We can add fallback and other optimization later.
>>>> This can be plan B.
>>>
>>> Yes, That's what Catalin suggested also.
>>>
>>> Hi, Baoquan He:
>>>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
>>>
>>> Process after abstraction:
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
>>> 		reserve_crashkernel()
>>> 		//block mapping
>>> 	} else {
>>> 		//page mapping
>>> 		reserve_crashkernel()
>>> 	}
>>>
>>> ------------ Simplified real-world process ---------
>> Yeah, this looks clearer. I would like to see a version with them.
>>
>>> arm64_memblock_init()
>>         Before reserve_crashkernel(), we can update arm64_dma_phys_limit
>> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
>> memblock_start_of_DRAM() is bigger than 4G.
>>         Then we go with:
>>         if (!arm64_dma_phys_limit)
>> 		reserve_crashkernel();
>>
>> Just personal opinion, please check if it's appropriate to handle case
>> 3) which has physical starting memory above 4G here. 
> 
> OK, I'll write it down and consider it in the future optimization.
> 
>>
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>            
>>> paging_init()
>>> 	map_mem()
>>> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 			//block mapping
>>> 		else
>>> 			//page mapping
>>> unflatten_device_tree or ACPI
>>> bootmem_init();
>>> 	zone_sizes_init()
>>> 		of_dma_get_max_cpu_address
>>> 		//Update arm64_dma_phys_limit
>>> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>
>> The rest sounds good with optimization code split out.
>>
>> .
>>
>
Catalin Marinas May 3, 2022, 10 p.m. UTC | #13
On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> > On 2022/4/29 11:24, Baoquan He wrote:
> >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> >>> On 2022/4/28 11:52, Baoquan He wrote:
> >>>> On 04/28/22 at 11:40am, Baoquan He wrote:
> >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> >>>>>> There will be some difference as the 4G limit doesn't always hold for
> >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> >>>>>> things a bit while following the documented behaviour:
> >>>>>>
> >>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> >>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> >>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
[...]
> >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> >>>>> about this version. And I have the same concerns about them which comes
> >>>>> from below points:
> >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
> >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
> >>>>> deploy/configure kdump on different machines of different ARCHes in the
> >>>>> same LAB. I think we should try to avoid the confusion.

I guess by all arches you mean just x86 here. Since the code is not
generic, all arches do their own stuff.

> > OK, I plan to remove optimization, fallback and default low size, to follow the
> > suggestion of Catalin first. But there's one minor point of contention.
> > 
> > 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> > 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
> >    or
> >       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> > 
> > I prefer 2), how about you?

(2) works for me as well. We keep these simple as "expert" options and
allow crashkernel= to fall back to 'high' if not sufficient memory in
ZONE_DMA. That would be a slight change from the current behaviour but,
as Zhen Lei said, with the old tools it's just moving the error around,
the crashkernel wouldn't be available in either case.

> >>>>> 2) Fallback behaviour is important to our distros. The reason is we will
> >>>>> provide default value with crashkernel=xxxM along kernel of distros. In
> >>>>> this case, we hope the reservation will succeed by all means. The ,high
> >>>>> and ,low is an option if customer likes to take with expertise.

OK, that's good feedback.

So, to recap, IIUC you are fine with:

	crashkernel=Y		- allocate within ZONE_DMA with fallback
				  above with a default in ZONE_DMA (like
				  x86, 256M or swiotlb size)
	crashkernel=Y,high	- allocate from above ZONE_DMA
	crashkernel=Y,low	- allocate within ZONE_DMA

'crashkernel' overrides the high and low while the latter two can be
passed independently.

> >>>>> After going through arm64 memory init code, I got below summary about
> >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> >>>>> can make use of it to facilitate to simplify code.
> >>>>> ================================================================================
> >>>>>                         DMA                      DMA32                    NORMAL
> >>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> >>>>> 2)Normal machine        0~4G                     0                        (above 4G)
> >>>>> 3)Special machine       (above 4G)~MAX
> >>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
> >>>
> >>> arm64_memblock_init()
> >>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> >> We don't need different code for this place of reservation as you are
> >> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> >> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> >> reserve_crashkernel() here too.
> > 
> > Yes. Maybe all the devices in this environment are 64-bit. One way I
> > know of allowing 32-bit devices to access high memory without SMMU
> > is: Set a fixed value for the upper 32 bits. In this case, the DMA
> > zone should be [phys_start, phys_start + 4G).

We decided that this case doesn't really exists for arm64 platforms (no
need for special ZONE_DMA).

> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
> offsets in the max_zone_phys() calculation")
> 
>     Currently, the kernel assumes that if RAM starts above 32-bit (or
>     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
>     such constrained devices have a hardwired DMA offset. In practice, we
>     haven't noticed any such hardware so let's assume that we can expand
>     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
>     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
>     zone_bits.

I think the above log is slightly confusing. If the DRAM starts above
4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
case we get incorrect firmware tables, otherwise we could have extended
ZONE_DMA to end of DRAM.

Zhen Lei, if we agreed on the crashkernel behaviour, could you please
post a series that does the above parsing allocation? Ignore the
optimisations, we can look at them afterwards.

Thanks.
Zhen Lei May 5, 2022, 2:13 a.m. UTC | #14
On 2022/5/4 6:00, Catalin Marinas wrote:
> On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
>>> On 2022/4/29 11:24, Baoquan He wrote:
>>>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>>>> On 2022/4/28 11:52, Baoquan He wrote:
>>>>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>>>>> things a bit while following the documented behaviour:
>>>>>>>>
>>>>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> [...]
>>>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>>>>> about this version. And I have the same concerns about them which comes
>>>>>>> from below points:
>>>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>>>>> same LAB. I think we should try to avoid the confusion.
> 
> I guess by all arches you mean just x86 here. Since the code is not
> generic, all arches do their own stuff.
> 
>>> OK, I plan to remove optimization, fallback and default low size, to follow the
>>> suggestion of Catalin first. But there's one minor point of contention.
>>>
>>> 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
>>> 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
>>>    or
>>>       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
>>>
>>> I prefer 2), how about you?
> 
> (2) works for me as well. We keep these simple as "expert" options and

Okay, so I'll follow 2) to update the code.

> allow crashkernel= to fall back to 'high' if not sufficient memory in
> ZONE_DMA. That would be a slight change from the current behaviour but,
> as Zhen Lei said, with the old tools it's just moving the error around,
> the crashkernel wouldn't be available in either case.
> 
>>>>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>>>>> and ,low is an option if customer likes to take with expertise.
> 
> OK, that's good feedback.
> 
> So, to recap, IIUC you are fine with:
> 
> 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> 				  above with a default in ZONE_DMA (like
> 				  x86, 256M or swiotlb size)
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> 'crashkernel' overrides the high and low while the latter two can be
> passed independently.
> 
>>>>>>> After going through arm64 memory init code, I got below summary about
>>>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>>>>> can make use of it to facilitate to simplify code.
>>>>>>> ================================================================================
>>>>>>>                         DMA                      DMA32                    NORMAL
>>>>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>>>>> 3)Special machine       (above 4G)~MAX
>>>>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>>>>
>>>>> arm64_memblock_init()
>>>>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
>>>> We don't need different code for this place of reservation as you are
>>>> doing in this patchset, since arm64_dma_phys_limit is initialized as 
>>>> below. In fact, in arm64_memblock_init(), we have made memblock ready,
>>>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
>>>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
>>>> reserve_crashkernel() here too.
>>>
>>> Yes. Maybe all the devices in this environment are 64-bit. One way I
>>> know of allowing 32-bit devices to access high memory without SMMU
>>> is: Set a fixed value for the upper 32 bits. In this case, the DMA
>>> zone should be [phys_start, phys_start + 4G).
> 
> We decided that this case doesn't really exists for arm64 platforms (no
> need for special ZONE_DMA).
> 
>> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
>> offsets in the max_zone_phys() calculation")
>>
>>     Currently, the kernel assumes that if RAM starts above 32-bit (or
>>     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
>>     such constrained devices have a hardwired DMA offset. In practice, we
>>     haven't noticed any such hardware so let's assume that we can expand
>>     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
>>     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
>>     zone_bits.
> 
> I think the above log is slightly confusing. If the DRAM starts above
> 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
> above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
> ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
> case we get incorrect firmware tables, otherwise we could have extended
> ZONE_DMA to end of DRAM.
> 
> Zhen Lei, if we agreed on the crashkernel behaviour, could you please
> post a series that does the above parsing allocation? Ignore the
> optimisations, we can look at them afterwards.

OK, I've changed the code before the festival, and I'll test it today.

> 
> Thanks.
>
Baoquan He May 5, 2022, 3 a.m. UTC | #15
On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
> > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> > > On 2022/4/29 11:24, Baoquan He wrote:
> > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> > >>> On 2022/4/28 11:52, Baoquan He wrote:
> > >>>> On 04/28/22 at 11:40am, Baoquan He wrote:
> > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> > >>>>>> There will be some difference as the 4G limit doesn't always hold for
> > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> > >>>>>> things a bit while following the documented behaviour:
> > >>>>>>
> > >>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> > >>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> > >>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> [...]
> > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> > >>>>> about this version. And I have the same concerns about them which comes
> > >>>>> from below points:
> > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
> > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
> > >>>>> deploy/configure kdump on different machines of different ARCHes in the
> > >>>>> same LAB. I think we should try to avoid the confusion.
> 
> I guess by all arches you mean just x86 here. Since the code is not
> generic, all arches do their own stuff.

Right. Since currently only x86 has crashkernel,high|low support. From
the distros and customer's point of view, we would like to see the same
feature has the same or similar behaviour. This will ease operation and
maintaining. E.g on the cloud platform, the base of it could be any
ARCH, x86, arm64. The inconsistent behaviour could cause confusion.
Certainly, the underlying implementation may be different.

Surely, if arm64 has its own manner because of reasons, we can
add document to note that.

> 
> > > OK, I plan to remove optimization, fallback and default low size, to follow the
> > > suggestion of Catalin first. But there's one minor point of contention.
> > > 
> > > 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> > > 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
> > >    or
> > >       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> > > 
> > > I prefer 2), how about you?
> 
> (2) works for me as well. We keep these simple as "expert" options and
> allow crashkernel= to fall back to 'high' if not sufficient memory in
> ZONE_DMA. That would be a slight change from the current behaviour but,
> as Zhen Lei said, with the old tools it's just moving the error around,
> the crashkernel wouldn't be available in either case.
> 
> > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will
> > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In
> > >>>>> this case, we hope the reservation will succeed by all means. The ,high
> > >>>>> and ,low is an option if customer likes to take with expertise.
> 
> OK, that's good feedback.
> 
> So, to recap, IIUC you are fine with:
> 
> 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> 				  above with a default in ZONE_DMA (like
> 				  x86, 256M or swiotlb size)
                                  
        Ack to this one. 


> 	crashkernel=Y,high	- allocate from above ZONE_DMA
                                  
        Not exactly. If there's only ZONE_DMA, crashkernel,high will
        be reserved in ZONE_DMA, and crashkernel,low will be ignored.
        Other than this, ack.

> 	crashkernel=Y,low	- allocate within ZONE_DMA

        Ack to this one.
> 
> 'crashkernel' overrides the high and low while the latter two can be
> passed independently.

        crashkernel=,high can be passed independently, then a crashkernel=,low
        is needed implicitly. If people don't want crashkernel=,low
        explicitly, crashkernel=0,low need be specified.

        An independent crashkernel=,low makes no sense. Crashkernel=,low
        should be paird with crashkernel=,high.
        
        My personal opinion according to the existed senmantics on x86.
        Otherwise, the guidance of crashkernel= |,high|,low reservation
        will be complicated to write.

> 
> > >>>>> After going through arm64 memory init code, I got below summary about
> > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> > >>>>> can make use of it to facilitate to simplify code.
> > >>>>> ================================================================================
> > >>>>>                         DMA                      DMA32                    NORMAL
> > >>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> > >>>>> 2)Normal machine        0~4G                     0                        (above 4G)
> > >>>>> 3)Special machine       (above 4G)~MAX
> > >>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
> > >>>
> > >>> arm64_memblock_init()
> > >>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> > >> We don't need different code for this place of reservation as you are
> > >> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> > >> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> > >> reserve_crashkernel() here too.
> > > 
> > > Yes. Maybe all the devices in this environment are 64-bit. One way I
> > > know of allowing 32-bit devices to access high memory without SMMU
> > > is: Set a fixed value for the upper 32 bits. In this case, the DMA
> > > zone should be [phys_start, phys_start + 4G).
> 
> We decided that this case doesn't really exists for arm64 platforms (no
> need for special ZONE_DMA).
> 
> > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
> > offsets in the max_zone_phys() calculation")
> > 
> >     Currently, the kernel assumes that if RAM starts above 32-bit (or
> >     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
> >     such constrained devices have a hardwired DMA offset. In practice, we
> >     haven't noticed any such hardware so let's assume that we can expand
> >     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
> >     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
> >     zone_bits.
> 
> I think the above log is slightly confusing. If the DRAM starts above
> 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
> above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
> ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
> case we get incorrect firmware tables, otherwise we could have extended
> ZONE_DMA to end of DRAM.
> 
> Zhen Lei, if we agreed on the crashkernel behaviour, could you please
> post a series that does the above parsing allocation? Ignore the
> optimisations, we can look at them afterwards.
> 
> Thanks.
> 
> -- 
> Catalin
>
Catalin Marinas May 5, 2022, 2:20 p.m. UTC | #16
On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote:
> On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> > So, to recap, IIUC you are fine with:
> > 
> > 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> > 				  above with a default in ZONE_DMA (like
> > 				  x86, 256M or swiotlb size)
> 
>         Ack to this one.
> 
> 
> > 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 
>         Not exactly. If there's only ZONE_DMA, crashkernel,high will
>         be reserved in ZONE_DMA, and crashkernel,low will be ignored.
>         Other than this, ack.

Yes, that's fine.

> > 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
>         Ack to this one.
> > 
> > 'crashkernel' overrides the high and low while the latter two can be
> > passed independently.
> 
>         crashkernel=,high can be passed independently, then a crashkernel=,low
>         is needed implicitly. If people don't want crashkernel=,low
>         explicitly, crashkernel=0,low need be specified.

I find this complicating the interface. I don't know the background to
the x86 implementation but we diverge already on arm64 since we talk
about ZONE_DMA rather than 4G limit (though for most platforms these
would be the same).

I guess we could restate the difference between crashkernel= and
crashkernel=,high as the hint to go for allocation above ZONE_DMA first.

>         An independent crashkernel=,low makes no sense. Crashkernel=,low
>         should be paird with crashkernel=,high.

You could argue that crashkernel=,low gives the current crashkernel=
behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it
may have some value on its own.

>         My personal opinion according to the existed senmantics on x86.
>         Otherwise, the guidance of crashkernel= |,high|,low reservation
>         will be complicated to write.

It's more that I find the current semantics unnecessarily confusing. But
even reading the x86_64 text it's not that clear. For example the
default low allocation for crashkernel= and crashkernel=,high is only
mentioned in the crashkernel=,low description.
Baoquan He May 6, 2022, 11:39 a.m. UTC | #17
On 05/05/22 at 03:20pm, Catalin Marinas wrote:
> On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote:
> > On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> > > So, to recap, IIUC you are fine with:
> > > 
> > > 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> > > 				  above with a default in ZONE_DMA (like
> > > 				  x86, 256M or swiotlb size)
> > 
> >         Ack to this one.
> > 
> > 
> > > 	crashkernel=Y,high	- allocate from above ZONE_DMA
> > 
> >         Not exactly. If there's only ZONE_DMA, crashkernel,high will
> >         be reserved in ZONE_DMA, and crashkernel,low will be ignored.
> >         Other than this, ack.
> 
> Yes, that's fine.
> 
> > > 	crashkernel=Y,low	- allocate within ZONE_DMA
> > 
> >         Ack to this one.
> > > 
> > > 'crashkernel' overrides the high and low while the latter two can be
> > > passed independently.
> > 
> >         crashkernel=,high can be passed independently, then a crashkernel=,low
> >         is needed implicitly. If people don't want crashkernel=,low
> >         explicitly, crashkernel=0,low need be specified.
> 
> I find this complicating the interface. I don't know the background to
> the x86 implementation but we diverge already on arm64 since we talk
> about ZONE_DMA rather than 4G limit (though for most platforms these
> would be the same).
> 
> I guess we could restate the difference between crashkernel= and
> crashkernel=,high as the hint to go for allocation above ZONE_DMA first.

Yes, rethinking about this, we can make a straightforward and simpler
crashkernel=,high|,low on arm64, namely asking for user to clearly
specify them.

During maintenance of crashkernel= parameter in our distros, we found
crashkernel=xM is used mostly since most of systems can be satisfied
with 256M or a little more for kdump. While on some big end servers,
1G or more crashkernel memory is needed. In this case, crashkernel=,high
is taken. We don't want to reserve so much low memory during system
running while just waiting in case rare crash happened. crashkernel=,high
is rarely used, so making it simple and not so flexible is not so bad.
We can improve it later with justification.

> 
> >         An independent crashkernel=,low makes no sense. Crashkernel=,low
> >         should be paird with crashkernel=,high.
> 
> You could argue that crashkernel=,low gives the current crashkernel=
> behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it
> may have some value on its own.

Yes, crashkernel=,low has the same behaviour as the current crashkernel=
if we decide not to add fallback mechanism to it. The purpose of
crahskernel=,low is to assist crashkernel=,high to get kdump kernel
boot up with satisfing DMA allocation. While allowing independent
crashkernel=,low will add it another mission, limiting crashkernel only
reserved in low memory. Up to now, we don't see the need for that.

> 
> >         My personal opinion according to the existed senmantics on x86.
> >         Otherwise, the guidance of crashkernel= |,high|,low reservation
> >         will be complicated to write.
> 
> It's more that I find the current semantics unnecessarily confusing. But
> even reading the x86_64 text it's not that clear. For example the
> default low allocation for crashkernel= and crashkernel=,high is only
> mentioned in the crashkernel=,low description.

Yeah, we can improve those document if insufficiency is found.

By the way, with my observation, crashkernel= with fallback meet
99% of our needs. If people really need more than 512M memory or more,
then please consider crashkernel=,high. Basically on servers, low memory
is limited, while high memory is very big.

So I agree with you that we can make it step by step, firstly adding
basic crashkernel=,high and ,low support. We can add those complicated
cases later.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@  bool crash_is_nosave(unsigned long pfn)
 
 	/* in reserved memory? */
 	addr = __pfn_to_phys(pfn);
-	if ((addr < crashk_res.start) || (crashk_res.end < addr))
-		return false;
+	if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+		if (!crashk_low_res.end)
+			return false;
+
+		if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr))
+			return false;
+	}
 
 	if (!kexec_crash_image)
 		return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@  static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f670bca160992b9..99d5539c13de3b1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -90,43 +90,138 @@  phys_addr_t __ro_after_init arm64_dma_phys_limit;
 phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
 #endif
 
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN			SZ_2M
+
+#define CRASH_ADDR_LOW_MAX		arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX		memblock.current_limit
+
+/*
+ * This is an empirical value in x86_64 and taken here directly. Please
+ * refer to the code comment in reserve_crashkernel_low() of x86_64 for more
+ * details.
+ */
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE	\
+	max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20)
+
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+	unsigned long long low_base;
+
+	/* passed with crashkernel=0,low ? */
+	if (!low_size)
+		return 0;
+
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	if (!low_base) {
+		pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
+		return -ENOMEM;
+	}
+
+	pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n",
+		low_base, low_base + low_size, low_size >> 20);
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end   = low_base + low_size - 1;
+	insert_resource(&iomem_resource, &crashk_low_res);
+
+	return 0;
+}
+
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
  *
  * This function reserves memory area given in "crashkernel=" kernel command
  * line parameter. The memory reserved is used by dump capture kernel when
  * primary kernel is crashing.
+ *
+ * NOTE: Reservation of crashkernel,low is special since its existence
+ * is not independent, need rely on the existence of crashkernel,high.
+ * Here, four cases of crashkernel low memory reservation are summarized:
+ * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
+ *    memory takes Y;
+ * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
+ *    take the default crashkernel low memory size;
+ * 3) crashkernel=X is specified, while fallback to get a memory region
+ *    in high memory, take the default crashkernel low memory size;
+ * 4) crashkernel='invalid value',low is specified, failed the whole
+ *    crashkernel reservation and bail out.
  */
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_low_size;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+	bool fixed_base, high = false;
+	char *cmdline = boot_command_line;
 	int ret;
 
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
 
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	/* crashkernel=X[@offset] */
+	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
-	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
+	if (ret || !crash_size) {
+		ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
+		if (ret || !crash_size)
+			return;
+
+		ret = parse_crashkernel_low(cmdline, 0, &crash_low_size, &crash_base);
+		if (ret == -ENOENT)
+			/* case #2 of crashkernel,low reservation */
+			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+		else if (ret)
+			/* case #4 of crashkernel,low reservation */
+			return;
+
+		high = true;
+		crash_max = CRASH_ADDR_HIGH_MAX;
+	}
 
+	fixed_base = !!crash_base;
 	crash_size = PAGE_ALIGN(crash_size);
 
-	/* User specifies base address explicitly. */
-	if (crash_base)
+	if (fixed_base)
 		crash_max = crash_base + crash_size;
 
-	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+retry:
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
+		/*
+		 * Attempt to fully allocate low memory failed, fall back
+		 * to high memory, the minimum required low memory will be
+		 * reserved later.
+		 */
+		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+			crash_max = CRASH_ADDR_HIGH_MAX;
+			goto retry;
+		}
+
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
 	}
 
+	/*
+	 * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the
+	 * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so
+	 * the 'crash_base' of high memory can not exceed it. To follow the
+	 * description of "crashkernel=X,high" option, add below 'high'
+	 * condition to make sure the crash low memory will be reserved.
+	 */
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
+		/* case #3 of crashkernel,low reservation */
+		if (!high)
+			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+
+		if (reserve_crashkernel_low(crash_low_size)) {
+			memblock_phys_free(crash_base, crash_size);
+			return;
+		}
+	}
+
 	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
 		crash_base, crash_base + crash_size, crash_size >> 20);
 
@@ -135,6 +230,9 @@  static void __init reserve_crashkernel(void)
 	 * map. Inform kmemleak so that it won't try to access it.
 	 */
 	kmemleak_ignore_phys(crash_base);
+	if (crashk_low_res.end)
+		kmemleak_ignore_phys(crashk_low_res.start);
+
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
 	insert_resource(&iomem_resource, &crashk_res);