Message ID | 20230306084124.300584-1-bhe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] arm64: kdump: simplify the reservation behaviour of crashkernel=,high | expand |
On 2023/3/6 16:41, Baoquan He wrote: > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for > suitable memory region top down. If the 'xM' of crashkernel high memory > is reserved from high memory successfully, it will try to reserve > crashkernel low memory later accoringly. Otherwise, it will try to search > low memory area for the 'xM' suitable region. Please see the details in > Documentation/admin-guide/kernel-parameters.txt. > > While we observed an unexpected case where a reserved region crosses the > high and low meomry boundary. E.g on a system with 4G as low memory end, > user added the kernel parameters like: 'crashkernel=512M,high', it could > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. > The crashkernel high region crossing low and high memory boudary will bring > issues: > > 1) For crashkernel=x,high, if getting crashkernel high region across > low and high memory boundary, then user will see two memory regions in > low memory, and one memory region in high memory. The two crashkernel > low memory regions are confusing as shown in above example. > > 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low" > and y <= 128M, when crashkernel high region crosses low and high memory > boundary and the part of crashkernel high reservation below boundary is > bigger than y, the expected crahskernel low reservation will be skipped. > But the expected crashkernel high reservation is shrank and could not > satisfy user space requirement. > > 3) The crossing boundary behaviour of crahskernel high reservation is > different than x86 arch. On x86_64, the low memory end is 4G fixedly, > and the memory near 4G is reserved by system, e.g for mapping firmware, > pci mapping, so the crashkernel reservation crossing boundary never happens. >>From distros point of view, this brings inconsistency and confusion. Users > need to dig into x86 and arm64 system details to find out why. > > For kernel itself, the impact of issue 3) could be slight. While issue > 1) and 2) cause actual impact because it brings obscure semantics and > behaviour to crashkernel=,high reservation. > > Here, for crashkernel=xM,high, search the high memory for the suitable > region only in high memory. If failed, try reserving the suitable > region only in low memory. Like this, the crashkernel high region will > only exist in high memory, and crashkernel low region only exists in low > memory. The reservation behaviour for crashkernel=,high is clearer and > simpler. > > Note: On arm64, the high and low memory boudary could be 1G if it's RPi4 > system, or 4G if other normal systems. Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> > > Signed-off-by: Baoquan He <bhe@redhat.com> > --- > Change history: > v3->v4: > - Change pr_info() to pr_warn() when fixed allocation failed. Leizhen > pointed out this and suggested changing. > v2->v3: > - Rephrase patch log to clarify the current crashkernel high > reservation could cross the high and low memory boundary, but not > 4G boundary only, because RPi4 of arm64 has high and low memory > boudary as 1G. The v3 patch log could mislead people that the RPi4 > also use 4G as high,low memory boundary. > v1->v2: > - Fold patch 2 of v1 into patch 1 for better reviewing. > - Update patch log to add more details. > > arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++---------- > 1 file changed, 33 insertions(+), 10 deletions(-) > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 58a0bb2c17f1..307763f97549 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -127,12 +127,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) > */ > static void __init reserve_crashkernel(void) > { > - unsigned long long crash_base, crash_size; > - unsigned long long crash_low_size = 0; > + unsigned long long crash_base, crash_size, search_base; > unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > + unsigned long long crash_low_size = 0; > char *cmdline = boot_command_line; > - int ret; > bool fixed_base = false; > + bool high = false; > + int ret; > > if (!IS_ENABLED(CONFIG_KEXEC_CORE)) > return; > @@ -155,7 +156,9 @@ static void __init reserve_crashkernel(void) > else if (ret) > return; > > + search_base = CRASH_ADDR_LOW_MAX; > crash_max = CRASH_ADDR_HIGH_MAX; > + high = true; > } else if (ret || !crash_size) { > /* The specified value is invalid */ > return; > @@ -166,31 +169,51 @@ static void __init reserve_crashkernel(void) > /* User specifies base address explicitly. */ > if (crash_base) { > fixed_base = true; > + search_base = crash_base; > crash_max = crash_base + crash_size; > } > > retry: > crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > - crash_base, crash_max); > + search_base, crash_max); > if (!crash_base) { > /* > - * If the first attempt was for low memory, fall back to > - * high memory, the minimum required low memory will be > - * reserved later. > + * For crashkernel=size[KMG]@offset[KMG], print out failure > + * message if can't reserve the specified region. > */ > - if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) { > + if (fixed_base) { > + pr_warn("crashkernel reservation failed - memory is in use.\n"); > + return; > + } > + > + /* > + * For crashkernel=size[KMG], if the first attempt was for > + * low memory, fall back to high memory, the minimum required > + * low memory will be reserved later. > + */ > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > crash_max = CRASH_ADDR_HIGH_MAX; > + search_base = CRASH_ADDR_LOW_MAX; > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > goto retry; > } > > + /* > + * For crashkernel=size[KMG],high, if the first attempt was > + * for high memory, fall back to low memory. > + */ > + if (high && crash_max == CRASH_ADDR_HIGH_MAX) { > + crash_max = CRASH_ADDR_LOW_MAX; > + search_base = 0; > + goto retry; > + } > pr_warn("cannot allocate crashkernel (size:0x%llx)\n", > crash_size); > return; > } > > - if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) && > - crash_low_size && reserve_crashkernel_low(crash_low_size)) { > + if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size && > + reserve_crashkernel_low(crash_low_size)) { > memblock_phys_free(crash_base, crash_size); > return; > } >
On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote: > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for > suitable memory region top down. If the 'xM' of crashkernel high memory > is reserved from high memory successfully, it will try to reserve > crashkernel low memory later accoringly. Otherwise, it will try to search > low memory area for the 'xM' suitable region. Please see the details in > Documentation/admin-guide/kernel-parameters.txt. > > While we observed an unexpected case where a reserved region crosses the > high and low meomry boundary. E.g on a system with 4G as low memory end, > user added the kernel parameters like: 'crashkernel=512M,high', it could > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. > The crashkernel high region crossing low and high memory boudary will bring > issues: > > 1) For crashkernel=x,high, if getting crashkernel high region across > low and high memory boundary, then user will see two memory regions in > low memory, and one memory region in high memory. The two crashkernel > low memory regions are confusing as shown in above example. > > 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low" > and y <= 128M, when crashkernel high region crosses low and high memory > boundary and the part of crashkernel high reservation below boundary is > bigger than y, the expected crahskernel low reservation will be skipped. > But the expected crashkernel high reservation is shrank and could not > satisfy user space requirement. > > 3) The crossing boundary behaviour of crahskernel high reservation is > different than x86 arch. On x86_64, the low memory end is 4G fixedly, > and the memory near 4G is reserved by system, e.g for mapping firmware, > pci mapping, so the crashkernel reservation crossing boundary never happens. > >From distros point of view, this brings inconsistency and confusion. Users > need to dig into x86 and arm64 system details to find out why. > > For kernel itself, the impact of issue 3) could be slight. While issue > 1) and 2) cause actual impact because it brings obscure semantics and > behaviour to crashkernel=,high reservation. > > Here, for crashkernel=xM,high, search the high memory for the suitable > region only in high memory. If failed, try reserving the suitable > region only in low memory. Like this, the crashkernel high region will > only exist in high memory, and crashkernel low region only exists in low > memory. The reservation behaviour for crashkernel=,high is clearer and > simpler. > > Note: On arm64, the high and low memory boudary could be 1G if it's RPi4 > system, or 4G if other normal systems. > > Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org>
On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote: > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for > suitable memory region top down. If the 'xM' of crashkernel high memory > is reserved from high memory successfully, it will try to reserve > crashkernel low memory later accoringly. Otherwise, it will try to search > low memory area for the 'xM' suitable region. Please see the details in > Documentation/admin-guide/kernel-parameters.txt. > > While we observed an unexpected case where a reserved region crosses the > high and low meomry boundary. E.g on a system with 4G as low memory end, > user added the kernel parameters like: 'crashkernel=512M,high', it could > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. > The crashkernel high region crossing low and high memory boudary will bring > issues: [...] > Note: On arm64, the high and low memory boudary could be 1G if it's RPi4 > system, or 4G if other normal systems. I'm mostly ok with the reworking but I'm not sure about the non-fixed low memory boundary. As I mentioned on v2, the user doesn't (need to) know about the ZONE_DMA limit on a specific platform, that's supposed to be fairly transparent. So on RPi4, specifying 'high' still allows allocation below 4G which some users may treat as 'low'. The kernel-parameters.txt doc also only talks about the 4G limit. > + /* > + * For crashkernel=size[KMG], if the first attempt was for > + * low memory, fall back to high memory, the minimum required > + * low memory will be reserved later. > + */ > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > crash_max = CRASH_ADDR_HIGH_MAX; > + search_base = CRASH_ADDR_LOW_MAX; > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > goto retry; > } So I'm more tempted to set the search_base to 4G here rather than CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all memory below 4G will fall back to low allocation. But RPi4 is the odd one out, so I think we can live with this.
Hi Catalin, On 03/15/23 at 02:52pm, Catalin Marinas wrote: > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote: > > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for > > suitable memory region top down. If the 'xM' of crashkernel high memory > > is reserved from high memory successfully, it will try to reserve > > crashkernel low memory later accoringly. Otherwise, it will try to search > > low memory area for the 'xM' suitable region. Please see the details in > > Documentation/admin-guide/kernel-parameters.txt. > > > > While we observed an unexpected case where a reserved region crosses the > > high and low meomry boundary. E.g on a system with 4G as low memory end, > > user added the kernel parameters like: 'crashkernel=512M,high', it could > > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. > > The crashkernel high region crossing low and high memory boudary will bring > > issues: > [...] > > Note: On arm64, the high and low memory boudary could be 1G if it's RPi4 > > system, or 4G if other normal systems. Thanks for looking into this. > I'm mostly ok with the reworking but I'm not sure about the non-fixed > low memory boundary. As I mentioned on v2, the user doesn't (need to) > know about the ZONE_DMA limit on a specific platform, that's supposed to > be fairly transparent. So on RPi4, specifying 'high' still allows > allocation below 4G which some users may treat as 'low'. The > kernel-parameters.txt doc also only talks about the 4G limit. With my understanding and the current arm64 code, the boundary of 'high' and 'low' is truly different on different platform. E.g on RPi4, its device does need memory below 1G to initialize, otherwise it will fail kernel bootup. I ever asked people to help test this on RPi4, with crashkernel reservation all above 1G, kdump kernel can't boot up. And now, the upper boundary of low memory has been decided by macro CRASH_ADDR_LOW_MAX. It says how big the low memory end is. That menas on RPi4, it's 1G, on other normal systems, it's 4G. #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit As for kernel-parameters.txt, the 'crashkernel=size[KMG],high' part need be rephrased to reflect the fact on arm64. So I made this patch based on the facts and undertanding. However, I like the idea that taking 4G as the fixed boudary of low and high memory on arm64. Please see the reply at bottom. > > > + /* > > + * For crashkernel=size[KMG], if the first attempt was for > > + * low memory, fall back to high memory, the minimum required > > + * low memory will be reserved later. > > + */ > > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > > crash_max = CRASH_ADDR_HIGH_MAX; > > + search_base = CRASH_ADDR_LOW_MAX; > > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > > goto retry; > > } > > So I'm more tempted to set the search_base to 4G here rather than > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all > memory below 4G will fall back to low allocation. But RPi4 is the odd > one out, so I think we can live with this. I totally agree with you that we should take 4G as the fixed boundary of low and high memory because kdump is aimed at workstation and server platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]] to specify a region if people have to use. [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly since we decide to take 4G as the low memory limit when doing 'high' reserving or falling back. This is just like what we have been doing in x86_64. Not sure if I follow you correctly. ===========================x86_64 low memory end for reference======= On x86_64, we fixedly have DMA zone at 0~16M, and DMA32 zone at 16M~4G, still we have boundary of low memory and high memory at 4G. However, it's not decided in the first place. Initially, we disregarded devices which only address 24bit range, namely 16M, set the CRASH_ADDR_LOW_MAX as 896M because that is compatible with the 32bit kernel which could be running on the x86_64 bare metal machine. Later, we realized 32bit x86 support for kdump is not needed any more because of the rare deployment in server. It was done in below commit. commit 9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c Author: Dave Young <dyoung@redhat.com> Date: Sun Apr 21 11:50:59 2019 +0800 x86/kdump: Have crashkernel=X reserve under 4G by default arch/x86/kernel/setup.c: ------------------------- #ifdef CONFIG_X86_32 # define CRASH_ADDR_LOW_MAX SZ_512M # define CRASH_ADDR_HIGH_MAX SZ_512M #else # define CRASH_ADDR_LOW_MAX SZ_4G # define CRASH_ADDR_HIGH_MAX SZ_64T #endif
On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote: > On 03/15/23 at 02:52pm, Catalin Marinas wrote: > > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote: > > > + /* > > > + * For crashkernel=size[KMG], if the first attempt was for > > > + * low memory, fall back to high memory, the minimum required > > > + * low memory will be reserved later. > > > + */ > > > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > > > crash_max = CRASH_ADDR_HIGH_MAX; > > > + search_base = CRASH_ADDR_LOW_MAX; > > > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > > > goto retry; > > > } > > > > So I'm more tempted to set the search_base to 4G here rather than > > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all > > memory below 4G will fall back to low allocation. But RPi4 is the odd > > one out, so I think we can live with this. > > I totally agree with you that we should take 4G as the fixed boundary of > low and high memory because kdump is aimed at workstation and server > platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]] > to specify a region if people have to use. > > [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end > https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly > since we decide to take 4G as the low memory limit when doing 'high' > reserving or falling back. This is just like what we have been doing in > x86_64. Not sure if I follow you correctly. On RPi4, we do need the 'low' allocation to be below 1GB, otherwise ZONE_DMA will be empty. But we can leave the 'high' reservation above 4GB (if available). The downside is that we won't get anything between 1GB and 4GB unless explicitly specified with @offset. I'm not entirely sure what you want to achieve: avoiding the 'high' reservation going across an arbitrary boundary (1GB or 4GB) that the user may not be aware of or just avoiding the 'high' reservation going across a 4G boundary? On RPi4, if the 'high' reservation above 4GB fails, should it fall back to below 1GB reservation or to somewhere between 1GB and 4GB, making sure it doesn't cross any of these two boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter would look like two 'low' allocations below 4GB.
On 03/16/23 at 05:35pm, Catalin Marinas wrote: > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote: > > On 03/15/23 at 02:52pm, Catalin Marinas wrote: > > > On Mon, Mar 06, 2023 at 04:41:24PM +0800, Baoquan He wrote: > > > > + /* > > > > + * For crashkernel=size[KMG], if the first attempt was for > > > > + * low memory, fall back to high memory, the minimum required > > > > + * low memory will be reserved later. > > > > + */ > > > > + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > > > > crash_max = CRASH_ADDR_HIGH_MAX; > > > > + search_base = CRASH_ADDR_LOW_MAX; > > > > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > > > > goto retry; > > > > } > > > > > > So I'm more tempted to set the search_base to 4G here rather than > > > CRASH_ADDR_LOW_MAX. The crashkernel=x,high option on a RPi4 with all > > > memory below 4G will fall back to low allocation. But RPi4 is the odd > > > one out, so I think we can live with this. > > > > I totally agree with you that we should take 4G as the fixed boundary of > > low and high memory because kdump is aimed at workstation and server > > platform. We can leave RPi4 to use crashkernel=size[KMG][@offset[KMG]] > > to specify a region if people have to use. > > > > [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end > > https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u > > > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly > > since we decide to take 4G as the low memory limit when doing 'high' > > reserving or falling back. This is just like what we have been doing in > > x86_64. Not sure if I follow you correctly. > > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise > ZONE_DMA will be empty. But we can leave the 'high' reservation above > 4GB (if available). The downside is that we won't get anything between > 1GB and 4GB unless explicitly specified with @offset. Ah, I see. You want to have high reservation like below when RPi4. diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 307763f97549..9e558fadf483 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) +#define CRASH_HIGH_SEARCH_BASE SZ_4G #define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void) else if (ret) return; - search_base = CRASH_ADDR_LOW_MAX; + search_base = CRASH_HIGH_SEARCH_BASE; crash_max = CRASH_ADDR_HIGH_MAX; high = true; } else if (ret || !crash_size) { @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void) */ if (!high && crash_max == CRASH_ADDR_LOW_MAX) { crash_max = CRASH_ADDR_HIGH_MAX; - search_base = CRASH_ADDR_LOW_MAX; + search_base = CRASH_HIGH_SEARCH_BASE; crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; goto retry; } > > I'm not entirely sure what you want to achieve: avoiding the 'high' > reservation going across an arbitrary boundary (1GB or 4GB) that the > user may not be aware of or just avoiding the 'high' reservation going > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB > fails, should it fall back to below 1GB reservation or to somewhere > between 1GB and 4GB, making sure it doesn't cross any of these two > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter > would look like two 'low' allocations below 4GB. Currently, with the existing code, the high and low memory is like below on arm64: RPi4: 0~1G: low memory | 1G~top: high memory Other normal system: 0~4G: low memory | 4G~top: high memory Now you want RPi4 to have low and high crahskernel reservation like this: 0~1G: low memory | 4G~top: high memory It's also fine. However, as far as I know, RPi4 usually only has 8G memory. And arm64 only has top down memblock allocation. Its high memory above 4G will be fragmentary, the crahskernel high reservation will fail if the value is big. In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, we leave it to crashkernel=size@offset syntax. Two reasons for this: 1) crashkernel is needed on enterprise platform, such as workstation or server. While RPi4 is obviously not the target. I contacted several RPi4 players in Redhat and my friends, none of them ever played kdump testing. If they really have to, crashkernel=size@offset is enough for them to set. 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the problem of base page mapping for the whole linear mapping if crsahkernel= is set in kernel parameter shown in [1] at bottom. Arm64 enterprise OS deployment is now very important in our Fedora/Centos stream/RHEL. I am wondering how wide and important kdump feature is needed and deployed on RPi4 platform and what is the user case. If CRASH_ADDR_LOW_MAX is set to 4G fixedly, and document that RPi4 need crashkernel=size@offset specially if kdump deployment is really needed. problems we discussed here and the spotted base page mapping issue can be solved easily and elegantly. That woule be deeply appreciated. diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 307763f97549..3b03958b668e 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -93,8 +93,9 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; /* Current arm64 boot protocol requires 2MB alignment */ #define CRASH_ALIGN SZ_2M -#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit +#define CRASH_ADDR_LOW_MAX SZ_4G #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) [1] [PATCH 0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u
On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote: > On 03/16/23 at 05:35pm, Catalin Marinas wrote: > > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote: > > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly > > > since we decide to take 4G as the low memory limit when doing 'high' > > > reserving or falling back. This is just like what we have been doing in > > > x86_64. Not sure if I follow you correctly. > > > > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise > > ZONE_DMA will be empty. But we can leave the 'high' reservation above > > 4GB (if available). The downside is that we won't get anything between > > 1GB and 4GB unless explicitly specified with @offset. > > Ah, I see. You want to have high reservation like below when RPi4. > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 307763f97549..9e558fadf483 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > > #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) > +#define CRASH_HIGH_SEARCH_BASE SZ_4G > > #define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) > > @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void) > else if (ret) > return; > > - search_base = CRASH_ADDR_LOW_MAX; > + search_base = CRASH_HIGH_SEARCH_BASE; > crash_max = CRASH_ADDR_HIGH_MAX; > high = true; > } else if (ret || !crash_size) { > @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void) > */ > if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > crash_max = CRASH_ADDR_HIGH_MAX; > - search_base = CRASH_ADDR_LOW_MAX; > + search_base = CRASH_HIGH_SEARCH_BASE; > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > goto retry; > } This would probably do _if_ these are the semantics that we want (well, more discussion below). > > I'm not entirely sure what you want to achieve: avoiding the 'high' > > reservation going across an arbitrary boundary (1GB or 4GB) that the > > user may not be aware of or just avoiding the 'high' reservation going > > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB > > fails, should it fall back to below 1GB reservation or to somewhere > > between 1GB and 4GB, making sure it doesn't cross any of these two > > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter > > would look like two 'low' allocations below 4GB. > > Currently, with the existing code, the high and low memory is like below > on arm64: > RPi4: > 0~1G: low memory | 1G~top: high memory > > Other normal system: > 0~4G: low memory | 4G~top: high memory Yes, that's the current behaviour. > Now you want RPi4 to have low and high crahskernel reservation like > this: > 0~1G: low memory | 4G~top: high memory I don't necessarily want this, I just want to clarify what we actually need to fix. Your initial reasoning for this patch was that the user gets confused by the high allocation going across the 4G boundary. But without knowing of the 1GB vs 4GB boundary on RPi4, one can still get confused by the high allocation across the 4GB boundary on RPi4. > It's also fine. However, as far as I know, RPi4 usually only has 8G > memory. That's the maximum. Lots of them just come with 4GB or less. > And arm64 only has top down memblock allocation. Its high memory > above 4G will be fragmentary, the crahskernel high reservation will fail > if the value is big. > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, > we leave it to crashkernel=size@offset syntax. Two reasons for this: > 1) crashkernel is needed on enterprise platform, such as workstation or > server. While RPi4 is obviously not the target. I contacted several RPi4 > players in Redhat and my friends, none of them ever played kdump > testing. If they really have to, crashkernel=size@offset is enough for > them to set. I'd like crashkernel=size (without @offset) on RPi4 to still do the right thing: a low allocation at least as we had until recently (or high+low where high here is maybe above 1GB). IOW, no regression for this crashkernel=size case. We can then change the explicit crashkernel=x,high to mean only above 4GB irrespective of the platform and crashkernel=x,low to be below arm64_dma_phys_limit. > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the > problem of base page mapping for the whole linear mapping if crsahkernel= > is set in kernel parameter shown in [1] at bottom. That's a different problem ;). I should re-read that thread, forgot most of the details but I recall one of the counter arguments was that there isn't a strong case to unmap the crashkernel reservation. Now, if we place crashdump kernel image goes in the 'high' reservation, can we not leave the 'low' reservation mapped? We don't really care about it as it wouldn't have any meaningful code/data to be preserved. If the 'high' one goes above 4G always, we don't depend on the arm64_dma_phys_limit. > Arm64 enterprise OS deployment is now very important in our > Fedora/Centos stream/RHEL. I am wondering how wide and important kdump > feature is needed and deployed on RPi4 platform and what is the user case. I doubt it's important beyond testing but I'd very much like not to cause a regression and require the @offset argument for RPi4.
On 03/17/23 at 06:05pm, Catalin Marinas wrote: > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote: > > On 03/16/23 at 05:35pm, Catalin Marinas wrote: > > > On Thu, Mar 16, 2023 at 05:47:52PM +0800, Baoquan He wrote: > > > > Now I am wondering if we should change CRASH_ADDR_LOW_MAX to 4G directly > > > > since we decide to take 4G as the low memory limit when doing 'high' > > > > reserving or falling back. This is just like what we have been doing in > > > > x86_64. Not sure if I follow you correctly. > > > > > > On RPi4, we do need the 'low' allocation to be below 1GB, otherwise > > > ZONE_DMA will be empty. But we can leave the 'high' reservation above > > > 4GB (if available). The downside is that we won't get anything between > > > 1GB and 4GB unless explicitly specified with @offset. > > > > Ah, I see. You want to have high reservation like below when RPi4. > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 307763f97549..9e558fadf483 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -95,6 +95,7 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > > > > #define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > > #define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) > > +#define CRASH_HIGH_SEARCH_BASE SZ_4G > > > > #define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) > > > > @@ -156,7 +157,7 @@ static void __init reserve_crashkernel(void) > > else if (ret) > > return; > > > > - search_base = CRASH_ADDR_LOW_MAX; > > + search_base = CRASH_HIGH_SEARCH_BASE; > > crash_max = CRASH_ADDR_HIGH_MAX; > > high = true; > > } else if (ret || !crash_size) { > > @@ -193,7 +194,7 @@ static void __init reserve_crashkernel(void) > > */ > > if (!high && crash_max == CRASH_ADDR_LOW_MAX) { > > crash_max = CRASH_ADDR_HIGH_MAX; > > - search_base = CRASH_ADDR_LOW_MAX; > > + search_base = CRASH_HIGH_SEARCH_BASE; > > crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; > > goto retry; > > } > > This would probably do _if_ these are the semantics that we want (well, > more discussion below). > > > > I'm not entirely sure what you want to achieve: avoiding the 'high' > > > reservation going across an arbitrary boundary (1GB or 4GB) that the > > > user may not be aware of or just avoiding the 'high' reservation going > > > across a 4G boundary? On RPi4, if the 'high' reservation above 4GB > > > fails, should it fall back to below 1GB reservation or to somewhere > > > between 1GB and 4GB, making sure it doesn't cross any of these two > > > boundaries? For someone unfamiliar with the ZONE_DMA on RPi4, the latter > > > would look like two 'low' allocations below 4GB. > > > > Currently, with the existing code, the high and low memory is like below > > on arm64: > > RPi4: > > 0~1G: low memory | 1G~top: high memory > > > > Other normal system: > > 0~4G: low memory | 4G~top: high memory > > Yes, that's the current behaviour. > > > Now you want RPi4 to have low and high crahskernel reservation like > > this: > > 0~1G: low memory | 4G~top: high memory > > I don't necessarily want this, I just want to clarify what we actually > need to fix. Your initial reasoning for this patch was that the user > gets confused by the high allocation going across the 4G boundary. But > without knowing of the 1GB vs 4GB boundary on RPi4, one can still get > confused by the high allocation across the 4GB boundary on RPi4. Ah, I see your point. Yeah, the initial reasoning is our QE reported several times that arm64 system could have two regions of low memory allocation when crahskernel=,high. While later I spotted another two issues. With the initial reasoning, we does need consider the hign and low boundary at 1G on RPi4 which could cause confusion. Given RPi4 is not taken into our distros's account, I am shooting myself in the foot. But yes, we should consider it if kdump feature is really cared about on RPi4. Please see later reply. > > > It's also fine. However, as far as I know, RPi4 usually only has 8G > > memory. > > That's the maximum. Lots of them just come with 4GB or less. > > > And arm64 only has top down memblock allocation. Its high memory > > above 4G will be fragmentary, the crahskernel high reservation will fail > > if the value is big. > > > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, > > we leave it to crashkernel=size@offset syntax. Two reasons for this: > > 1) crashkernel is needed on enterprise platform, such as workstation or > > server. While RPi4 is obviously not the target. I contacted several RPi4 > > players in Redhat and my friends, none of them ever played kdump > > testing. If they really have to, crashkernel=size@offset is enough for > > them to set. > > I'd like crashkernel=size (without @offset) on RPi4 to still do the > right thing: a low allocation at least as we had until recently (or > high+low where high here is maybe above 1GB). IOW, no regression for > this crashkernel=size case. We can then change the explicit > crashkernel=x,high to mean only above 4GB irrespective of the platform > and crashkernel=x,low to be below arm64_dma_phys_limit. Since crashkernel=,high and crashkernel=size fallback was added in arm64 recently, with my understanding, you are suggesting: on arm64: RPi4: crashkernel=size 0~1G: low memory (no regression introduced) crashkernel=size,high 0~1G: low memory | 4G~top: high memory Other normal system: crashkernel=size|crashkernel=size,high 0~4G: low memory | 4G~top: high memory Please correct me if I misunderstood, I can make change according to your suggestion. > > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the > > problem of base page mapping for the whole linear mapping if crsahkernel= > > is set in kernel parameter shown in [1] at bottom. > > That's a different problem ;). I should re-read that thread, forgot most > of the details but I recall one of the counter arguments was that there > isn't a strong case to unmap the crashkernel reservation. Now, if we > place crashdump kernel image goes in the 'high' reservation, can we not > leave the 'low' reservation mapped? We don't really care about it as it > wouldn't have any meaningful code/data to be preserved. If the 'high' > one goes above 4G always, we don't depend on the arm64_dma_phys_limit. Yes, this looks ideal. While it only works when crashkernel=,high case and it succeeds to reserve a memory region for the specified size of crashkernel high memory. At below, we have 4 cases of crashkernel= syntax: crashkernel=size 1)first attempt: low memory under arm64_dma_phys_limit 2)fallback: finding memory above 4G crashkernel=size,high 3)first attempt: finding memory above 4G 4)fallback: low memory under arm64_dma_phys_limit case 3) works with your suggestion. However, 1), 2), 4) all need to defer to bootmem_init(). With these cases and different handling, reserve_crashkernel() could be too complicated. I am wondering if we can cancel the protection of crashkernel memory region on arm64 for now. In earlier discussion, people questioned if the protection is necessary on arm64. After comparison, I would rather take away the protection method of crashkernel region since they try to protect in a chance in one million , while the base page mapping for the whole linear mapping is mitigating arm64 high end server always. > > > Arm64 enterprise OS deployment is now very important in our > > Fedora/Centos stream/RHEL. I am wondering how wide and important kdump > > feature is needed and deployed on RPi4 platform and what is the user case. > > I doubt it's important beyond testing but I'd very much like not to > cause a regression and require the @offset argument for RPi4. OK, understood. Pity we don't have information to judge if it's a RPi4 system or not, with a Kconfig option, some cpu register or hardware information.
On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote: > On 03/17/23 at 06:05pm, Catalin Marinas wrote: > > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote: > > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G > > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, > > > we leave it to crashkernel=size@offset syntax. Two reasons for this: > > > 1) crashkernel is needed on enterprise platform, such as workstation or > > > server. While RPi4 is obviously not the target. I contacted several RPi4 > > > players in Redhat and my friends, none of them ever played kdump > > > testing. If they really have to, crashkernel=size@offset is enough for > > > them to set. > > > > I'd like crashkernel=size (without @offset) on RPi4 to still do the > > right thing: a low allocation at least as we had until recently (or > > high+low where high here is maybe above 1GB). IOW, no regression for > > this crashkernel=size case. We can then change the explicit > > crashkernel=x,high to mean only above 4GB irrespective of the platform > > and crashkernel=x,low to be below arm64_dma_phys_limit. > > Since crashkernel=,high and crashkernel=size fallback was added in arm64 > recently, with my understanding, you are suggesting: > > on arm64: > RPi4: > crashkernel=size > 0~1G: low memory (no regression introduced) And, if not enough low memory, fall back to memory above 1GB (for RPi4; it would be above 4GB for any other system). > crashkernel=size,high > 0~1G: low memory | 4G~top: high memory Yes. > Other normal system: > crashkernel=size|crashkernel=size,high > 0~4G: low memory | 4G~top: high memory Yes. IOW, specifying 'high' only forces the high allocation above 4GB instead of arm64_dma_phys_limit, irrespective of the platform. If no 'high' specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for the rest). > > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the > > > problem of base page mapping for the whole linear mapping if crsahkernel= > > > is set in kernel parameter shown in [1] at bottom. > > > > That's a different problem ;). I should re-read that thread, forgot most > > of the details but I recall one of the counter arguments was that there > > isn't a strong case to unmap the crashkernel reservation. Now, if we > > place crashdump kernel image goes in the 'high' reservation, can we not > > leave the 'low' reservation mapped? We don't really care about it as it > > wouldn't have any meaningful code/data to be preserved. If the 'high' > > one goes above 4G always, we don't depend on the arm64_dma_phys_limit. > > Yes, this looks ideal. While it only works when crashkernel=,high case and > it succeeds to reserve a memory region for the specified size of crashkernel > high memory. At below, we have 4 cases of crashkernel= syntax: > > crashkernel=size > 1)first attempt: low memory under arm64_dma_phys_limit > 2)fallback: finding memory above 4G (2) should be 'finding memory above arm64_dma_phys_limit' to keep the current behaviour for RPi4. > crashkernel=size,high > 3)first attempt: finding memory above 4G > 4)fallback: low memory under arm64_dma_phys_limit Yes. > case 3) works with your suggestion. However, 1), 2), 4) all need to > defer to bootmem_init(). With these cases and different handling, > reserve_crashkernel() could be too complicated. Ah, because of the fallback below arm64_dma_phys_limit as in (4), we still can't move the full crashkernel reservation early. Well, we could do it in two steps: (a) early attempt at crashkernel reservation above 4G if 'high' was specified and we avoid mapping it if successful and (b) do the late crashkernel reservation below arm64_dma_phys_limit and skip unmapping as being too late. This way most server-like platforms would get a reservation above 4G, unmapped. > I am wondering if we can cancel the protection of crashkernel memory > region on arm64 for now. In earlier discussion, people questioned if the > protection is necessary on arm64. After comparison, I would rather take > away the protection method of crashkernel region since they try to > protect in a chance in one million , while the base page mapping for the > whole linear mapping is mitigating arm64 high end server always. This works for me. We can add the protection later for addresses above 4GB only as mentioned above.
On 2023/3/24 1:25, Catalin Marinas wrote: > On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote: >> On 03/17/23 at 06:05pm, Catalin Marinas wrote: >>> On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote: >>>> In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G >>>> fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, >>>> we leave it to crashkernel=size@offset syntax. Two reasons for this: >>>> 1) crashkernel is needed on enterprise platform, such as workstation or >>>> server. While RPi4 is obviously not the target. I contacted several RPi4 >>>> players in Redhat and my friends, none of them ever played kdump >>>> testing. If they really have to, crashkernel=size@offset is enough for >>>> them to set. >>> >>> I'd like crashkernel=size (without @offset) on RPi4 to still do the >>> right thing: a low allocation at least as we had until recently (or >>> high+low where high here is maybe above 1GB). IOW, no regression for >>> this crashkernel=size case. We can then change the explicit >>> crashkernel=x,high to mean only above 4GB irrespective of the platform >>> and crashkernel=x,low to be below arm64_dma_phys_limit. >> >> Since crashkernel=,high and crashkernel=size fallback was added in arm64 >> recently, with my understanding, you are suggesting: >> >> on arm64: >> RPi4: >> crashkernel=size >> 0~1G: low memory (no regression introduced) > > And, if not enough low memory, fall back to memory above 1GB (for RPi4; > it would be above 4GB for any other system). > >> crashkernel=size,high >> 0~1G: low memory | 4G~top: high memory > > Yes. > >> Other normal system: >> crashkernel=size|crashkernel=size,high >> 0~4G: low memory | 4G~top: high memory > > Yes. > > IOW, specifying 'high' only forces the high allocation above 4GB instead > of arm64_dma_phys_limit, irrespective of the platform. If no 'high' > specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for > the rest). > >>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the >>>> problem of base page mapping for the whole linear mapping if crsahkernel= >>>> is set in kernel parameter shown in [1] at bottom. >>> >>> That's a different problem ;). I should re-read that thread, forgot most >>> of the details but I recall one of the counter arguments was that there >>> isn't a strong case to unmap the crashkernel reservation. Now, if we >>> place crashdump kernel image goes in the 'high' reservation, can we not >>> leave the 'low' reservation mapped? We don't really care about it as it >>> wouldn't have any meaningful code/data to be preserved. If the 'high' >>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit. >> >> Yes, this looks ideal. While it only works when crashkernel=,high case and >> it succeeds to reserve a memory region for the specified size of crashkernel >> high memory. At below, we have 4 cases of crashkernel= syntax: >> >> crashkernel=size >> 1)first attempt: low memory under arm64_dma_phys_limit >> 2)fallback: finding memory above 4G > > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the > current behaviour for RPi4. > >> crashkernel=size,high >> 3)first attempt: finding memory above 4G >> 4)fallback: low memory under arm64_dma_phys_limit > > Yes. > >> case 3) works with your suggestion. However, 1), 2), 4) all need to >> defer to bootmem_init(). With these cases and different handling, >> reserve_crashkernel() could be too complicated. > > Ah, because of the fallback below arm64_dma_phys_limit as in (4), we > still can't move the full crashkernel reservation early. Well, we could > do it in two steps: (a) early attempt at crashkernel reservation above > 4G if 'high' was specified and we avoid mapping it if successful and (b) > do the late crashkernel reservation below arm64_dma_phys_limit and skip > unmapping as being too late. This way most server-like platforms would > get a reservation above 4G, unmapped. > >> I am wondering if we can cancel the protection of crashkernel memory >> region on arm64 for now. In earlier discussion, people questioned if the >> protection is necessary on arm64. After comparison, I would rather take >> away the protection method of crashkernel region since they try to >> protect in a chance in one million , while the base page mapping for the >> whole linear mapping is mitigating arm64 high end server always. > > This works for me. We can add the protection later for addresses above > 4GB only as mentioned above. Recently, I've also been rethinking the performance issues when kdump is enabled. I have a new idea. For crashkernel=X, we can temporarily search for free memory from the low address to the high address. As below: save_bottom_up = memblock_bottom_up(); if (!high) memblock_set_bottom_up(true); crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); memblock_set_bottom_up(save_bottom_up); The final code change should be small, and I'll try it today. >
On 03/23/23 at 05:25pm, Catalin Marinas wrote: > On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote: > > On 03/17/23 at 06:05pm, Catalin Marinas wrote: > > > On Fri, Mar 17, 2023 at 11:09:13PM +0800, Baoquan He wrote: > > > > In fact, what I want to achieve is we set CRASH_ADDR_LOW_MAX to 4G > > > > fixedly on arm64, just like what we do on x86_64. As for RPi4 platform, > > > > we leave it to crashkernel=size@offset syntax. Two reasons for this: > > > > 1) crashkernel is needed on enterprise platform, such as workstation or > > > > server. While RPi4 is obviously not the target. I contacted several RPi4 > > > > players in Redhat and my friends, none of them ever played kdump > > > > testing. If they really have to, crashkernel=size@offset is enough for > > > > them to set. > > > > > > I'd like crashkernel=size (without @offset) on RPi4 to still do the > > > right thing: a low allocation at least as we had until recently (or > > > high+low where high here is maybe above 1GB). IOW, no regression for > > > this crashkernel=size case. We can then change the explicit > > > crashkernel=x,high to mean only above 4GB irrespective of the platform > > > and crashkernel=x,low to be below arm64_dma_phys_limit. > > > > Since crashkernel=,high and crashkernel=size fallback was added in arm64 > > recently, with my understanding, you are suggesting: > > > > on arm64: > > RPi4: > > crashkernel=size > > 0~1G: low memory (no regression introduced) > > And, if not enough low memory, fall back to memory above 1GB (for RPi4; > it would be above 4GB for any other system). > > > crashkernel=size,high > > 0~1G: low memory | 4G~top: high memory > > Yes. > > > Other normal system: > > crashkernel=size|crashkernel=size,high > > 0~4G: low memory | 4G~top: high memory > > Yes. > > IOW, specifying 'high' only forces the high allocation above 4GB instead > of arm64_dma_phys_limit, irrespective of the platform. If no 'high' > specified search_base remains CRASH_ADDR_LOW_MAX (1GB on RPi4, 4GB for > the rest). > > > > > 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the > > > > problem of base page mapping for the whole linear mapping if crsahkernel= > > > > is set in kernel parameter shown in [1] at bottom. > > > > > > That's a different problem ;). I should re-read that thread, forgot most > > > of the details but I recall one of the counter arguments was that there > > > isn't a strong case to unmap the crashkernel reservation. Now, if we > > > place crashdump kernel image goes in the 'high' reservation, can we not > > > leave the 'low' reservation mapped? We don't really care about it as it > > > wouldn't have any meaningful code/data to be preserved. If the 'high' > > > one goes above 4G always, we don't depend on the arm64_dma_phys_limit. > > > > Yes, this looks ideal. While it only works when crashkernel=,high case and > > it succeeds to reserve a memory region for the specified size of crashkernel > > high memory. At below, we have 4 cases of crashkernel= syntax: > > > > crashkernel=size > > 1)first attempt: low memory under arm64_dma_phys_limit > > 2)fallback: finding memory above 4G > > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the > current behaviour for RPi4. Then for RPi4, with case 2), it will find memory above arm64_dma_phys_limit, namely 1G. Then it will get two memory regions, one could be in [1G, 4G], another is below 4G. I am fine with this, as long as it won't cause confusion that people may think two low memory regions you mentioned earlier. Please help confirm if I understand your suggestioin correctly. I will start making patch with this clarified. Thanks. > > > crashkernel=size,high > > 3)first attempt: finding memory above 4G > > 4)fallback: low memory under arm64_dma_phys_limit > > Yes. > > > case 3) works with your suggestion. However, 1), 2), 4) all need to > > defer to bootmem_init(). With these cases and different handling, > > reserve_crashkernel() could be too complicated. > > Ah, because of the fallback below arm64_dma_phys_limit as in (4), we > still can't move the full crashkernel reservation early. Well, we could > do it in two steps: (a) early attempt at crashkernel reservation above > 4G if 'high' was specified and we avoid mapping it if successful and (b) > do the late crashkernel reservation below arm64_dma_phys_limit and skip > unmapping as being too late. This way most server-like platforms would > get a reservation above 4G, unmapped. Yeah, this covers case 3), while other cases are still in pit. > > > I am wondering if we can cancel the protection of crashkernel memory > > region on arm64 for now. In earlier discussion, people questioned if the > > protection is necessary on arm64. After comparison, I would rather take > > away the protection method of crashkernel region since they try to > > protect in a chance in one million , while the base page mapping for the > > whole linear mapping is mitigating arm64 high end server always. > > This works for me. We can add the protection later for addresses above > 4GB only as mentioned above. Thanks, I have posted a patchset to cancel the protection on crashkernel memory region as per your confirmation here. This can give distros a chance to back port them to fix the performance issue caused by the base page mapping. I personally expect we can hold the crashkernel region unprotected till we have a ideal solution since the code will be elegant with comfortable simplicity. Let's wait and see the code change if people interested want to keep the protection methods.
Hi Leizhen, On 03/24/23 at 10:47am, Leizhen (ThunderTown) wrote: ...... > >>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the > >>>> problem of base page mapping for the whole linear mapping if crsahkernel= > >>>> is set in kernel parameter shown in [1] at bottom. > >>> > >>> That's a different problem ;). I should re-read that thread, forgot most > >>> of the details but I recall one of the counter arguments was that there > >>> isn't a strong case to unmap the crashkernel reservation. Now, if we > >>> place crashdump kernel image goes in the 'high' reservation, can we not > >>> leave the 'low' reservation mapped? We don't really care about it as it > >>> wouldn't have any meaningful code/data to be preserved. If the 'high' > >>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit. > >> > >> Yes, this looks ideal. While it only works when crashkernel=,high case and > >> it succeeds to reserve a memory region for the specified size of crashkernel > >> high memory. At below, we have 4 cases of crashkernel= syntax: > >> > >> crashkernel=size > >> 1)first attempt: low memory under arm64_dma_phys_limit > >> 2)fallback: finding memory above 4G > > > > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the > > current behaviour for RPi4. > > > >> crashkernel=size,high > >> 3)first attempt: finding memory above 4G > >> 4)fallback: low memory under arm64_dma_phys_limit > > > > Yes. > > > >> case 3) works with your suggestion. However, 1), 2), 4) all need to > >> defer to bootmem_init(). With these cases and different handling, > >> reserve_crashkernel() could be too complicated. > > > > Ah, because of the fallback below arm64_dma_phys_limit as in (4), we > > still can't move the full crashkernel reservation early. Well, we could > > do it in two steps: (a) early attempt at crashkernel reservation above > > 4G if 'high' was specified and we avoid mapping it if successful and (b) > > do the late crashkernel reservation below arm64_dma_phys_limit and skip > > unmapping as being too late. This way most server-like platforms would > > get a reservation above 4G, unmapped. > > > >> I am wondering if we can cancel the protection of crashkernel memory > >> region on arm64 for now. In earlier discussion, people questioned if the > >> protection is necessary on arm64. After comparison, I would rather take > >> away the protection method of crashkernel region since they try to > >> protect in a chance in one million , while the base page mapping for the > >> whole linear mapping is mitigating arm64 high end server always. > > > > This works for me. We can add the protection later for addresses above > > 4GB only as mentioned above. > > Recently, I've also been rethinking the performance issues when kdump is > enabled. I have a new idea. For crashkernel=X, we can temporarily search > for free memory from the low address to the high address. As below: > > save_bottom_up = memblock_bottom_up(); > if (!high) > memblock_set_bottom_up(true); > crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); > memblock_set_bottom_up(save_bottom_up); > > The final code change should be small, and I'll try it today. I have sent a patchset to remove the crashkernel region protection code as per Catalin's confirmation. I personally like the code conciseness w/o protection because kinds of crahskernel reservation has been complex, the situation on arm64 will makes it worse if we try to keep the protection and fix the performance issue. While I am glad to see any attempt to achieve the two goals if it's satisfactory.
On Fri, Mar 24, 2023 at 10:08:17PM +0800, Baoquan He wrote: > On 03/23/23 at 05:25pm, Catalin Marinas wrote: > > On Mon, Mar 20, 2023 at 09:12:08PM +0800, Baoquan He wrote: > > > crashkernel=size > > > 1)first attempt: low memory under arm64_dma_phys_limit > > > 2)fallback: finding memory above 4G > > > > (2) should be 'finding memory above arm64_dma_phys_limit' to keep the > > current behaviour for RPi4. > > Then for RPi4, with case 2), it will find memory above > arm64_dma_phys_limit, namely 1G. Then it will get two memory regions, > one could be in [1G, 4G], another is below 4G. I am fine with this, as > long as it won't cause confusion that people may think two low memory > regions you mentioned earlier. Please help confirm if I understand your > suggestioin correctly. I will start making patch with this clarified. Yes, you understood correctly. While it may still be slightly confusing, if the user is not specific about high and low on the cmdline, we just allow the kernel to find the best places. I assume distros will just use ,high and get a consistent behaviour on all platforms.
On 2023/3/24 22:53, Baoquan He wrote: > Hi Leizhen, > > On 03/24/23 at 10:47am, Leizhen (ThunderTown) wrote: > ...... >>>>>> 2) with the fixed CRASH_ADDR_LOW_MAX as 4G, we can easily fix the >>>>>> problem of base page mapping for the whole linear mapping if crsahkernel= >>>>>> is set in kernel parameter shown in [1] at bottom. >>>>> >>>>> That's a different problem ;). I should re-read that thread, forgot most >>>>> of the details but I recall one of the counter arguments was that there >>>>> isn't a strong case to unmap the crashkernel reservation. Now, if we >>>>> place crashdump kernel image goes in the 'high' reservation, can we not >>>>> leave the 'low' reservation mapped? We don't really care about it as it >>>>> wouldn't have any meaningful code/data to be preserved. If the 'high' >>>>> one goes above 4G always, we don't depend on the arm64_dma_phys_limit. >>>> >>>> Yes, this looks ideal. While it only works when crashkernel=,high case and >>>> it succeeds to reserve a memory region for the specified size of crashkernel >>>> high memory. At below, we have 4 cases of crashkernel= syntax: >>>> >>>> crashkernel=size >>>> 1)first attempt: low memory under arm64_dma_phys_limit >>>> 2)fallback: finding memory above 4G >>> >>> (2) should be 'finding memory above arm64_dma_phys_limit' to keep the >>> current behaviour for RPi4. >>> >>>> crashkernel=size,high >>>> 3)first attempt: finding memory above 4G >>>> 4)fallback: low memory under arm64_dma_phys_limit >>> >>> Yes. >>> >>>> case 3) works with your suggestion. However, 1), 2), 4) all need to >>>> defer to bootmem_init(). With these cases and different handling, >>>> reserve_crashkernel() could be too complicated. >>> >>> Ah, because of the fallback below arm64_dma_phys_limit as in (4), we >>> still can't move the full crashkernel reservation early. Well, we could >>> do it in two steps: (a) early attempt at crashkernel reservation above >>> 4G if 'high' was specified and we avoid mapping it if successful and (b) >>> do the late crashkernel reservation below arm64_dma_phys_limit and skip >>> unmapping as being too late. This way most server-like platforms would >>> get a reservation above 4G, unmapped. >>> >>>> I am wondering if we can cancel the protection of crashkernel memory >>>> region on arm64 for now. In earlier discussion, people questioned if the >>>> protection is necessary on arm64. After comparison, I would rather take >>>> away the protection method of crashkernel region since they try to >>>> protect in a chance in one million , while the base page mapping for the >>>> whole linear mapping is mitigating arm64 high end server always. >>> >>> This works for me. We can add the protection later for addresses above >>> 4GB only as mentioned above. >> >> Recently, I've also been rethinking the performance issues when kdump is >> enabled. I have a new idea. For crashkernel=X, we can temporarily search >> for free memory from the low address to the high address. As below: >> >> save_bottom_up = memblock_bottom_up(); >> if (!high) >> memblock_set_bottom_up(true); >> crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); >> memblock_set_bottom_up(save_bottom_up); >> >> The final code change should be small, and I'll try it today. > > I have sent a patchset to remove the crashkernel region protection code > as per Catalin's confirmation. I personally like the code conciseness w/o > protection because kinds of crahskernel reservation has been complex, > the situation on arm64 will makes it worse if we try to keep the > protection and fix the performance issue. While I am glad to see any > attempt to achieve the two goals if it's satisfactory. I saw the patchset. No protection is also a good idea, the code is simplified a lot. > > . >
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 58a0bb2c17f1..307763f97549 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -127,12 +127,13 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) */ static void __init reserve_crashkernel(void) { - unsigned long long crash_base, crash_size; - unsigned long long crash_low_size = 0; + unsigned long long crash_base, crash_size, search_base; unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + unsigned long long crash_low_size = 0; char *cmdline = boot_command_line; - int ret; bool fixed_base = false; + bool high = false; + int ret; if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; @@ -155,7 +156,9 @@ static void __init reserve_crashkernel(void) else if (ret) return; + search_base = CRASH_ADDR_LOW_MAX; crash_max = CRASH_ADDR_HIGH_MAX; + high = true; } else if (ret || !crash_size) { /* The specified value is invalid */ return; @@ -166,31 +169,51 @@ static void __init reserve_crashkernel(void) /* User specifies base address explicitly. */ if (crash_base) { fixed_base = true; + search_base = crash_base; crash_max = crash_base + crash_size; } retry: crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, - crash_base, crash_max); + search_base, crash_max); if (!crash_base) { /* - * If the first attempt was for low memory, fall back to - * high memory, the minimum required low memory will be - * reserved later. + * For crashkernel=size[KMG]@offset[KMG], print out failure + * message if can't reserve the specified region. */ - if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) { + if (fixed_base) { + pr_warn("crashkernel reservation failed - memory is in use.\n"); + return; + } + + /* + * For crashkernel=size[KMG], if the first attempt was for + * low memory, fall back to high memory, the minimum required + * low memory will be reserved later. + */ + if (!high && crash_max == CRASH_ADDR_LOW_MAX) { crash_max = CRASH_ADDR_HIGH_MAX; + search_base = CRASH_ADDR_LOW_MAX; crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; goto retry; } + /* + * For crashkernel=size[KMG],high, if the first attempt was + * for high memory, fall back to low memory. + */ + if (high && crash_max == CRASH_ADDR_HIGH_MAX) { + crash_max = CRASH_ADDR_LOW_MAX; + search_base = 0; + goto retry; + } pr_warn("cannot allocate crashkernel (size:0x%llx)\n", crash_size); return; } - if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) && - crash_low_size && reserve_crashkernel_low(crash_low_size)) { + if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size && + reserve_crashkernel_low(crash_low_size)) { memblock_phys_free(crash_base, crash_size); return; }
On arm64, reservation for 'crashkernel=xM,high' is taken by searching for suitable memory region top down. If the 'xM' of crashkernel high memory is reserved from high memory successfully, it will try to reserve crashkernel low memory later accoringly. Otherwise, it will try to search low memory area for the 'xM' suitable region. Please see the details in Documentation/admin-guide/kernel-parameters.txt. While we observed an unexpected case where a reserved region crosses the high and low meomry boundary. E.g on a system with 4G as low memory end, user added the kernel parameters like: 'crashkernel=512M,high', it could finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. The crashkernel high region crossing low and high memory boudary will bring issues: 1) For crashkernel=x,high, if getting crashkernel high region across low and high memory boundary, then user will see two memory regions in low memory, and one memory region in high memory. The two crashkernel low memory regions are confusing as shown in above example. 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low" and y <= 128M, when crashkernel high region crosses low and high memory boundary and the part of crashkernel high reservation below boundary is bigger than y, the expected crahskernel low reservation will be skipped. But the expected crashkernel high reservation is shrank and could not satisfy user space requirement. 3) The crossing boundary behaviour of crahskernel high reservation is different than x86 arch. On x86_64, the low memory end is 4G fixedly, and the memory near 4G is reserved by system, e.g for mapping firmware, pci mapping, so the crashkernel reservation crossing boundary never happens. From distros point of view, this brings inconsistency and confusion. Users need to dig into x86 and arm64 system details to find out why. For kernel itself, the impact of issue 3) could be slight. While issue 1) and 2) cause actual impact because it brings obscure semantics and behaviour to crashkernel=,high reservation. Here, for crashkernel=xM,high, search the high memory for the suitable region only in high memory. If failed, try reserving the suitable region only in low memory. Like this, the crashkernel high region will only exist in high memory, and crashkernel low region only exists in low memory. The reservation behaviour for crashkernel=,high is clearer and simpler. Note: On arm64, the high and low memory boudary could be 1G if it's RPi4 system, or 4G if other normal systems. Signed-off-by: Baoquan He <bhe@redhat.com> --- Change history: v3->v4: - Change pr_info() to pr_warn() when fixed allocation failed. Leizhen pointed out this and suggested changing. v2->v3: - Rephrase patch log to clarify the current crashkernel high reservation could cross the high and low memory boundary, but not 4G boundary only, because RPi4 of arm64 has high and low memory boudary as 1G. The v3 patch log could mislead people that the RPi4 also use 4G as high,low memory boundary. v1->v2: - Fold patch 2 of v1 into patch 1 for better reviewing. - Update patch log to add more details. arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++---------- 1 file changed, 33 insertions(+), 10 deletions(-)