[v2] arm64: kdump: simplify the reservation behaviour of crashkernel=,high

Message ID 20230203075723.114538-1-bhe@redhat.com (mailing list archive)
State New, archived

Commit Message

Baoquan He Feb. 3, 2023, 7:57 a.m. UTC
On arm64, the reservation for 'crashkernel=xM,high' is made by searching
for a suitable memory region top down. If the 'xM' of crashkernel high
memory is reserved from high memory successfully, it will try to reserve
crashkernel low memory later accordingly. Otherwise, it will try to search
the low memory area for a suitable 'xM' region. Please see the details in
Documentation/admin-guide/kernel-parameters.txt.

However, we observed an unexpected case where a reserved region crosses
the high and low memory boundary. E.g. on a system with 4G as the low
memory end, if the user adds the kernel parameter 'crashkernel=512M,high',
the running kernel could end up with the regions [4G-126M, 4G+386M] and
[1G, 1G+128M]. A crashkernel high region crossing the 4G boundary brings
the following issues:

1) For crashkernel=x,high, if the crashkernel high region crosses the
4G boundary, the user will see two memory regions under 4G and one
memory region above 4G. The two crashkernel low memory regions are
confusing.

2) If people explicitly specify "crashkernel=x,high crashkernel=y,low"
and y <= 128M, when the crashkernel high region crosses the 4G boundary
and the part of the crashkernel high reservation below 4G is bigger than
y, the expected crashkernel low reservation will be skipped. But the
expected crashkernel high reservation is shrunk and may not satisfy the
user space requirement.

3) The boundary-crossing behaviour of the crashkernel high reservation
differs from the x86 arch. On x86_64, the memory near 4G is reserved by
the system, e.g. for mapping firmware and PCI. A crashkernel reservation
crossing the 4G boundary never happens there. From a distro's point of
view, this brings inconsistency and confusion. Users need to dig into x86
and arm64 details to find out why.
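
The arithmetic behind issues 1) and 2) can be sketched in a few lines of
user-space C. This is only an illustration: the helper names are
hypothetical, 'boundary' stands in for CRASH_ADDR_LOW_MAX, and the two
predicates are modelled on the checks that appear before and after this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)
#define SZ_1G (1ULL << 30)

/* Bytes of [base, base + size) that lie below 'boundary'. */
static uint64_t bytes_below(uint64_t base, uint64_t size, uint64_t boundary)
{
	uint64_t end = base + size;

	assert(end >= base);	/* no wrap-around */
	if (base >= boundary)
		return 0;
	return (end < boundary ? end : boundary) - base;
}

/* Modelled on the pre-patch check for adding a separate low region. */
static bool old_wants_low(uint64_t base, uint64_t low_size, uint64_t boundary)
{
	return low_size && base > boundary - low_size;
}

/* Modelled on the check with this patch: only for regions fully in high memory. */
static bool new_wants_low(uint64_t base, uint64_t low_size, uint64_t boundary)
{
	return low_size && base >= boundary;
}
```

With base = 4G-126M and size = 512M, bytes_below() reports 126M under the
4G boundary. With y = 64M the old check is false, so the low reservation
is skipped. With the default 128M it is true, which gives the extra
[1G, 1G+128M] region.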

For the kernel itself, the impact of issue 3) could be slight. Issues
1) and 2), however, have actual impact because they bring obscure
semantics and behaviour to the crashkernel=,high reservation.

Here, for crashkernel=xM,high, search for the suitable region only in
high memory first. If that fails, try reserving the suitable region
only in low memory. This way, the crashkernel high region will only
exist in high memory, and the crashkernel low region will only exist in
low memory. The reservation behaviour for crashkernel=,high becomes
clearer and simpler.
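
The search order described above can be modelled with a toy user-space
allocator. This is a sketch under stated assumptions, not the kernel
code: toy_alloc() is a hypothetical stand-in for the top-down memblock
search, it ignores alignment, it returns 0 on failure, and LOW_MAX and
HIGH_MAX are illustrative values for the two limits.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M    (1ULL << 20)
#define SZ_1G    (1ULL << 30)
#define LOW_MAX  (4 * SZ_1G)	/* assumed boundary (1G on RPi4) */
#define HIGH_MAX (1ULL << 48)	/* hypothetical upper search limit */

struct range { uint64_t start, end; };	/* free memory, [start, end) */

/* Highest base in [lo, hi) where 'size' bytes fit in a free range, or 0. */
static uint64_t toy_alloc(const struct range *mem, int n, uint64_t size,
			  uint64_t lo, uint64_t hi)
{
	uint64_t best = 0;

	assert(size > 0);
	for (int i = 0; i < n; i++) {
		uint64_t s = mem[i].start > lo ? mem[i].start : lo;
		uint64_t e = mem[i].end < hi ? mem[i].end : hi;

		if (e > s && e - s >= size && e - size > best)
			best = e - size;
	}
	return best;
}

/* crashkernel=size,high: high memory only first, then low memory only. */
static uint64_t reserve_high(const struct range *mem, int n, uint64_t size)
{
	uint64_t base = toy_alloc(mem, n, size, LOW_MAX, HIGH_MAX);

	if (!base)
		base = toy_alloc(mem, n, size, 0, LOW_MAX);
	return base;
}
```

For free memory [1G, 4G+386M) and size = 512M, a single top-down search
over the whole range lands at 4G-126M and crosses the boundary, while
reserve_high() finds no room above 4G and falls back to 4G-512M, entirely
in low memory.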

Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
 - Fold patch 2 of v1 into patch 1 for better reviewing.
 - Update patch log to add more details.

 arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

Comments

Catalin Marinas Feb. 3, 2023, 5:50 p.m. UTC | #1
On Fri, Feb 03, 2023 at 03:57:23PM +0800, Baoquan He wrote:
> On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> suitable memory region top down. If the 'xM' of crashkernel high memory
> is reserved from high memory successfully, it will try to reserve
> crashkernel low memory later accordingly. Otherwise, it will try to search
> low memory area for the 'xM' suitable region. Please see the details in
> Documentation/admin-guide/kernel-parameters.txt.
> 
> While we observed an unexpected case where a reserved region crosses the
> high and low memory boundary. E.g on a system with 4G as low memory end,
> user added the kernel parameters like: 'crashkernel=512M,high', it could
> finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> The crossing 4G boundary of crashkernel high region will bring issues:
> 
> 1) For crashkernel=x,high, if getting crashkernel high region across
> 4G boundary, then user will see two memory regions under 4G, and one
> memory region above 4G. The two crashkernel low memory regions are
> confusing.

Looking at your patch, I just realised that the 4G boundary between
'low' and 'high' reservations is not always true. On RPi4, that would be
1GB, the limit of ZONE_DMA. Are there user-space tools that rely on this
32-bit boundary? If they do, they'd get confused on RPi4, not sure they
have the notion of the actual ZONE_DMA that the kernel has enabled. If
we do want ,high to mean always 4G or higher, we'd need to change the
logic a bit so that the search_base starts from 4G rather than
CRASH_ADDR_LOW_MAX. We could leave the latter when ,high was not
specified.
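
One way to read this suggestion, as a sketch only and not part of the
posted patch: the lower bound of the high-memory search depends on
whether ',high' was given. The helper name is hypothetical and 'low_max'
stands in for CRASH_ADDR_LOW_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Lower bound used when searching high memory for the crashkernel region. */
static uint64_t high_search_base(bool high_requested, uint64_t low_max)
{
	assert(low_max > 0);
	/* ",high" given: never start below 4G, even if ZONE_DMA ends earlier. */
	if (high_requested && low_max < 4 * SZ_1G)
		return 4 * SZ_1G;
	/* Plain crashkernel=size falling back to high memory: keep the boundary. */
	return low_max;
}
```

On an RPi4-like system with low_max = 1G this returns 4G when ',high' was
specified and 1G otherwise. On systems where low_max is already 4G both
cases return 4G.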

> 2) If people explicitly specify "crashkernel=x,high crashkernel=y,low"
> and y <= 128M, when crashkernel high region crosses 4G boundary and the
> part below 4G of crashkernel high reservation is bigger than y, the
> expected crashkernel low reservation will be skipped. But the expected
> crashkernel high reservation is shrunk and could not satisfy user space
> requirement.

I guess if the user passes both high and low, we should honour that and
ignore any y <= 128M checks.

Baoquan He Feb. 7, 2023, 3:59 a.m. UTC | #2
On 02/03/23 at 05:50pm, Catalin Marinas wrote:
> On Fri, Feb 03, 2023 at 03:57:23PM +0800, Baoquan He wrote:
> > On arm64, reservation for 'crashkernel=xM,high' is taken by searching for
> > suitable memory region top down. If the 'xM' of crashkernel high memory
> > is reserved from high memory successfully, it will try to reserve
> > crashkernel low memory later accordingly. Otherwise, it will try to search
> > low memory area for the 'xM' suitable region. Please see the details in
> > Documentation/admin-guide/kernel-parameters.txt.
> > 
> > While we observed an unexpected case where a reserved region crosses the
> > high and low memory boundary. E.g on a system with 4G as low memory end,
> > user added the kernel parameters like: 'crashkernel=512M,high', it could
> > finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel.
> > The crossing 4G boundary of crashkernel high region will bring issues:
> > 
> > 1) For crashkernel=x,high, if getting crashkernel high region across
> > 4G boundary, then user will see two memory regions under 4G, and one
> > memory region above 4G. The two crashkernel low memory regions are
> > confusing.
> 
> Looking at your patch, I just realised that the 4G boundary between
> 'low' and 'high' reservations is not always true. On RPi4, that would be
> 1GB, the limit of ZONE_DMA. Are there user-space tools that rely on this
> 32-bit boundary? If they do, they'd get confused on RPi4, not sure they
> have the notion of the actual ZONE_DMA that the kernel has enabled. If
> we do want ,high to mean always 4G or higher, we'd need to change the
> logic a bit so that the search_base starts from 4G rather than
> CRASH_ADDR_LOW_MAX. We could leave the latter when ,high was not
> specified.

Oh, there could be a misunderstanding here. In the current arm64 code,
the boundary between high memory and low memory is CRASH_ADDR_LOW_MAX.
That means it's 1G on RPi4, while on all other systems it's 4G. I used 4G
as the boundary in the patch log because the systems I know of that need
kdump support are all production systems at work, e.g. bare-metal servers
or guest instances in the cloud. Sorry for that. With the current
crashkernel handling code in arm64, we should cover RPi4 too so that the
description stays consistent with the code implementation.

I took 4G mainly as an example people can easily understand. I should
use a generic term. So RPi4 still uses 1G as the boundary between high
and low memory.

Talking about RPi4, what do you think about my patchset below? I replied
to you in another mail asking for help checking it and deciding which
solution we should take to address the base page mapping for the whole
system when crashkernel is set. That mail may have been missed.

https://lore.kernel.org/all/Y9zaJim2oGgXMiOS@MiWiFi-R3L-srv/T/#u
==
arm64, kdump: enforce to take 4G as the crashkernel low memory end
https://lore.kernel.org/all/20220828005545.94389-1-bhe@redhat.com/T/#u


> 
> > 2) If people explicitly specify "crashkernel=x,high crashkernel=y,low"
> > and y <= 128M, when crashkernel high region crosses 4G boundary and the
> > part below 4G of crashkernel high reservation is bigger than y, the
> > expected crashkernel low reservation will be skipped. But the expected
> > crashkernel high reservation is shrunk and could not satisfy user space
> > requirement.
> 
> I guess if the user passes both high and low, we should honour that and
> ignore any y <= 128M checks.

Yes, agreed. In this v2 patch and the earlier v1, the 'y <= 128M' check
has been removed.

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 58a0bb2c17f1..b8cb780df0cb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -127,12 +127,13 @@  static int __init reserve_crashkernel_low(unsigned long long low_size)
  */
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size;
-	unsigned long long crash_low_size = 0;
+	unsigned long long crash_base, crash_size, search_base = 0;
 	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+	unsigned long long crash_low_size = 0;
 	char *cmdline = boot_command_line;
-	int ret;
 	bool fixed_base = false;
+	bool high = false;
+	int ret;
 
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
@@ -155,7 +156,9 @@  static void __init reserve_crashkernel(void)
 		else if (ret)
 			return;
 
+		search_base = CRASH_ADDR_LOW_MAX;
 		crash_max = CRASH_ADDR_HIGH_MAX;
+		high = true;
 	} else if (ret || !crash_size) {
 		/* The specified value is invalid */
 		return;
@@ -166,31 +169,51 @@  static void __init reserve_crashkernel(void)
 	/* User specifies base address explicitly. */
 	if (crash_base) {
 		fixed_base = true;
+		search_base = crash_base;
 		crash_max = crash_base + crash_size;
 	}
 
 retry:
 	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
+					       search_base, crash_max);
 	if (!crash_base) {
 		/*
-		 * If the first attempt was for low memory, fall back to
-		 * high memory, the minimum required low memory will be
-		 * reserved later.
+		 * For crashkernel=size[KMG]@offset[KMG], print out failure
+		 * message if can't reserve the specified region.
 		 */
-		if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+		if (fixed_base) {
+			pr_info("crashkernel reservation failed - memory is in use.\n");
+			return;
+		}
+
+		/*
+		 * For crashkernel=size[KMG], if the first attempt was for
+		 * low memory, fall back to high memory, the minimum required
+		 * low memory will be reserved later.
+		 */
+		if (!high && crash_max == CRASH_ADDR_LOW_MAX) {
 			crash_max = CRASH_ADDR_HIGH_MAX;
+			search_base = CRASH_ADDR_LOW_MAX;
 			crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
 			goto retry;
 		}
 
+		/*
+		 * For crashkernel=size[KMG],high, if the first attempt was
+		 * for high memory, fall back to low memory.
+		 */
+		if (high && crash_max == CRASH_ADDR_HIGH_MAX) {
+			crash_max = CRASH_ADDR_LOW_MAX;
+			search_base = 0;
+			goto retry;
+		}
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 			crash_size);
 		return;
 	}
 
-	if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
-	     crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) && crash_low_size &&
+	     reserve_crashkernel_low(crash_low_size)) {
 		memblock_phys_free(crash_base, crash_size);
 		return;
 	}