mbox series

[v12,0/9] support reserving crashkernel above 4G on arm64 kdump

Message ID 20200907134745.25732-1-chenzhou10@huawei.com (mailing list archive)
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Message

chenzhou Sept. 7, 2020, 1:47 p.m. UTC
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.
3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
if the memory reserved for crash dump kernel falled in ZONE_DMA32,
the devices in crash dump kernel need to use ZONE_DMA will alloc
fail.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone, and fall back to
high allocation if it fails.
If requized size X is too large and leads to very little low memory
in DMA zone after low allocation, the system may not work normally.
So add a threshold and go for high allocation directly if the required
size is too large. The value of threshold is set as the half of
the low memory.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.
For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following nine patches:
0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
0006-arm64-kdump-reimplement-crashkernel-X.patch
0007-kdump-add-threshold-for-the-required-memory.patch
0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
0009-kdump-update-Documentation-about-crashkernel.patch

0001-0003 are some x86 cleanups which prepares for making
functionsreserve_crashkernel[_low]() generic.

0004 makes functions reserve_crashkernel[_low]() generic.
0005-0006 reimplements crashkernel=X.
0007 adds threshold for the required memory.
0008 adds memory for devices by DT property linux,usable-memory-range.
0009 updates the doc.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150

Chen Zhou (9):
  x86: kdump: move CRASH_ALIGN to 2M
  x86: kdump: make the lower bound of crash kernel reservation
    consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
    reserve_crashkernel[_low]()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macroes for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  kdump: add threshold for the required memory
  arm64: kdump: add memory for devices by DT property
    linux,usable-memory-range
  kdump: update Documentation about crashkernel

 Documentation/admin-guide/kdump/kdump.rst     |  25 ++-
 .../admin-guide/kernel-parameters.txt         |  13 +-
 arch/arm64/include/asm/kexec.h                |  15 ++
 arch/arm64/include/asm/processor.h            |   1 +
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          | 105 ++++------
 arch/arm64/mm/mmu.c                           |   4 +
 arch/x86/include/asm/kexec.h                  |  28 +++
 arch/x86/kernel/setup.c                       | 165 +--------------
 include/linux/crash_core.h                    |   4 +
 include/linux/kexec.h                         |   2 -
 kernel/crash_core.c                           | 192 ++++++++++++++++++
 kernel/kexec_core.c                           |  17 --
 13 files changed, 328 insertions(+), 256 deletions(-)

Comments

John Donnelly Sept. 12, 2020, 11:44 a.m. UTC | #1
On 9/7/20 8:47 AM, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone, and fall back to
> high allocation if it fails.
> If requized size X is too large and leads to very little low memory
> in DMA zone after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
> 
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> This patchset contains the following nine patches:
> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-kdump-add-threshold-for-the-required-memory.patch
> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0009-kdump-update-Documentation-about-crashkernel.patch
> 
> 0001-0003 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
> 
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements crashkernel=X.
> 0007 adds threshold for the required memory.
> 0008 adds memory for devices by DT property linux,usable-memory-range.
> 0009 updates the doc.
> 
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
> 
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> 
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> [1]: https://urldefense.com/v3/__http://lists.infradead.org/pipermail/kexec/2020-June/020737.html__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BsfMsIet$
> [2]: https://urldefense.com/v3/__https://github.com/robherring/dt-schema/pull/19__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6Bv1JxB2D$
> [v1]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/2/1174__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BgTzrgKq$
> [v2]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/86__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6Btz3iM8F$
> [v3]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/306__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BuqcVDab$
> [v4]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/15/273__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6Bgdlc1Y7$
> [v5]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/5/6/1360__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BsuuZ6C_$
> [v6]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/8/30/142__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6Bo4IxHqi$
> [v7]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/12/23/411__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BjlqN_6I$
> [v8]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/5/21/213__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BlBSztwY$
> [v9]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/6/28/73__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BoNFCNt9$
> [v10]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/7/2/1443__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BvfD2Ihf$
> [v11]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/8/1/150__;!!GqivPVa7Brio!IzjRTihkWj0uY8lqf60OD7rbqIAhyGD20C4EZpBaPsNfWxuPgeU1Av-fzig6BohKxmce$
> 
> Chen Zhou (9):
>    x86: kdump: move CRASH_ALIGN to 2M
>    x86: kdump: make the lower bound of crash kernel reservation
>      consistent
>    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>      reserve_crashkernel[_low]()
>    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>    arm64: kdump: introduce some macroes for crash kernel reservation
>    arm64: kdump: reimplement crashkernel=X
>    kdump: add threshold for the required memory
>    arm64: kdump: add memory for devices by DT property
>      linux,usable-memory-range
>    kdump: update Documentation about crashkernel
> 
>   Documentation/admin-guide/kdump/kdump.rst     |  25 ++-
>   .../admin-guide/kernel-parameters.txt         |  13 +-
>   arch/arm64/include/asm/kexec.h                |  15 ++
>   arch/arm64/include/asm/processor.h            |   1 +
>   arch/arm64/kernel/setup.c                     |  13 +-
>   arch/arm64/mm/init.c                          | 105 ++++------
>   arch/arm64/mm/mmu.c                           |   4 +
>   arch/x86/include/asm/kexec.h                  |  28 +++
>   arch/x86/kernel/setup.c                       | 165 +--------------
>   include/linux/crash_core.h                    |   4 +
>   include/linux/kexec.h                         |   2 -
>   kernel/crash_core.c                           | 192 ++++++++++++++++++
>   kernel/kexec_core.c                           |  17 --
>   13 files changed, 328 insertions(+), 256 deletions(-)
> 


I did a brief unit-test on 5.9-rc4.

Please add:


Tested-by:  John Donnelly <John.p.donnelly@oracle.com>


This activity is over a year old. It needs accepted.
chenzhou Sept. 15, 2020, 7:16 a.m. UTC | #2
On 2020/9/7 21:47, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone, and fall back to
> high allocation if it fails.
> If requized size X is too large and leads to very little low memory
> in DMA zone after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following nine patches:
> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-kdump-add-threshold-for-the-required-memory.patch
> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0009-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
>
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements crashkernel=X.
> 0007 adds threshold for the required memory.
> 0008 adds memory for devices by DT property linux,usable-memory-range.
> 0009 updates the doc.
Hi Catalin and Dave,

Any other suggestions about this patchset? Let me know if you have any questions.

Thanks,
Chen Zhou
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19 
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
>
> Chen Zhou (9):
>   x86: kdump: move CRASH_ALIGN to 2M
>   x86: kdump: make the lower bound of crash kernel reservation
>     consistent
>   x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>     reserve_crashkernel[_low]()
>   x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>   arm64: kdump: introduce some macroes for crash kernel reservation
>   arm64: kdump: reimplement crashkernel=X
>   kdump: add threshold for the required memory
>   arm64: kdump: add memory for devices by DT property
>     linux,usable-memory-range
>   kdump: update Documentation about crashkernel
>
>  Documentation/admin-guide/kdump/kdump.rst     |  25 ++-
>  .../admin-guide/kernel-parameters.txt         |  13 +-
>  arch/arm64/include/asm/kexec.h                |  15 ++
>  arch/arm64/include/asm/processor.h            |   1 +
>  arch/arm64/kernel/setup.c                     |  13 +-
>  arch/arm64/mm/init.c                          | 105 ++++------
>  arch/arm64/mm/mmu.c                           |   4 +
>  arch/x86/include/asm/kexec.h                  |  28 +++
>  arch/x86/kernel/setup.c                       | 165 +--------------
>  include/linux/crash_core.h                    |   4 +
>  include/linux/kexec.h                         |   2 -
>  kernel/crash_core.c                           | 192 ++++++++++++++++++
>  kernel/kexec_core.c                           |  17 --
>  13 files changed, 328 insertions(+), 256 deletions(-)
>
John Donnelly Sept. 23, 2020, 5:47 p.m. UTC | #3
> On Sep 15, 2020, at 2:16 AM, chenzhou <chenzhou10@huawei.com> wrote:
> 
> 
> 
> On 2020/9/7 21:47, Chen Zhou wrote:
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is no enough low memory.
>> 2. If reserving crashkernel above 4G, in this case, crash dump
>> kernel will boot failure because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>> fail.
>> 
>> To solve these issues, change the behavior of crashkernel=X.
>> crashkernel=X tries low allocation in DMA zone, and fall back to
>> high allocation if it fails.
>> If requized size X is too large and leads to very little low memory
>> in DMA zone after low allocation, the system may not work normally.
>> So add a threshold and go for high allocation directly if the required
>> size is too large. The value of threshold is set as the half of
>> the low memory.
>> 
>> We can also use "crashkernel=X,high" to select a high region above
>> DMA zone, which also tries to allocate at least 256M low memory in
>> DMA zone automatically.
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>> For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
>> 
>> When reserving crashkernel in high memory, some low memory is reserved
>> for crash dump kernel devices. So there may be two regions reserved for
>> crash dump kernel.
>> In order to distinct from the high region and make no effect to the use
>> of existing kexec-tools, rename the low region as "Crash kernel (low)",
>> and pass the low region by reusing DT property
>> "linux,usable-memory-range". We made the low memory region as the last
>> range of "linux,usable-memory-range" to keep compatibility with existing
>> user-space and older kdump kernels.
>> 
>> Besides, we need to modify kexec-tools:
>> arm64: support more than one crash kernel regions(see [1])
>> 
>> Another update is document about DT property 'linux,usable-memory-range':
>> schemas: update 'linux,usable-memory-range' node schema(see [2])
>> 
>> This patchset contains the following nine patches:
>> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
>> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
>> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
>> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
>> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
>> 0006-arm64-kdump-reimplement-crashkernel-X.patch
>> 0007-kdump-add-threshold-for-the-required-memory.patch
>> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
>> 0009-kdump-update-Documentation-about-crashkernel.patch
>> 
>> 0001-0003 are some x86 cleanups which prepares for making
>> functionsreserve_crashkernel[_low]() generic.
>> 
>> 0004 makes functions reserve_crashkernel[_low]() generic.
>> 0005-0006 reimplements crashkernel=X.
>> 0007 adds threshold for the required memory.
>> 0008 adds memory for devices by DT property linux,usable-memory-range.
>> 0009 updates the doc.
> Hi Catalin and Dave,

  Hi,

   This patch set  has been going on since May, 2019.  When will this be accepted and integrated into a rc build ?

  



> 
> Any other suggestions about this patchset? Let me know if you have any questions.
> 
> Thanks,
> Chen Zhou
>> 
>> Changes since [v11]
>> - Rebased on top of 5.9-rc4.
>> - Make the function reserve_crashkernel() of x86 generic.
>> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
>> and arm64 use the generic version to reimplement crashkernel=X.
>> 
>> Changes since [v10]
>> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>> 
>> Changes since [v9]
>> - Patch 1 add Acked-by from Dave.
>> - Update patch 5 according to Dave's comments.
>> - Update chosen schema.
>> 
>> Changes since [v8]
>> - Reuse DT property "linux,usable-memory-range".
>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
>> memory region.
>> - Fix kdump broken with ZONE_DMA reintroduced.
>> - Update chosen schema.
>> 
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt.
>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
>> suggested by Arnd.
>> - Add Tested-by from Jhon and pk.
>> 
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>> 
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>> 
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>> 
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>> 
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>> 
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>> 
>> [1]: https://urldefense.com/v3/__http://lists.infradead.org/pipermail/kexec/2020-June/020737.html__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_zMCI6U-$ 
>> [2]: https://urldefense.com/v3/__https://github.com/robherring/dt-schema/pull/19__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_5c9NEUf$  
>> [v1]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/2/1174__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_1bFn-eN$ 
>> [v2]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/86__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_wVqWygD$ 
>> [v3]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/306__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_8fQ7uBl$ 
>> [v4]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/15/273__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_ztbOBKM$ 
>> [v5]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/5/6/1360__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_9TAk7Oj$ 
>> [v6]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/8/30/142__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_9IFx5Hx$ 
>> [v7]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/12/23/411__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_0x8im8q$ 
>> [v8]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/5/21/213__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_yVVP42e$ 
>> [v9]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/6/28/73__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_y2-BLN1$ 
>> [v10]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/7/2/1443__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_0qJHLGR$ 
>> [v11]: https://urldefense.com/v3/__https://lkml.org/lkml/2020/8/1/150__;!!GqivPVa7Brio!JI57eED82U9Uq1k8V_Kus7azGGPSDqfaSZPHM0WkR6OxQ0trzzeR2zyIkUM8_3QitPUY$ 
>> 
>> Chen Zhou (9):
>>  x86: kdump: move CRASH_ALIGN to 2M
>>  x86: kdump: make the lower bound of crash kernel reservation
>>    consistent
>>  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>>    reserve_crashkernel[_low]()
>>  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>>  arm64: kdump: introduce some macroes for crash kernel reservation
>>  arm64: kdump: reimplement crashkernel=X
>>  kdump: add threshold for the required memory
>>  arm64: kdump: add memory for devices by DT property
>>    linux,usable-memory-range
>>  kdump: update Documentation about crashkernel
>> 
>> Documentation/admin-guide/kdump/kdump.rst     |  25 ++-
>> .../admin-guide/kernel-parameters.txt         |  13 +-
>> arch/arm64/include/asm/kexec.h                |  15 ++
>> arch/arm64/include/asm/processor.h            |   1 +
>> arch/arm64/kernel/setup.c                     |  13 +-
>> arch/arm64/mm/init.c                          | 105 ++++------
>> arch/arm64/mm/mmu.c                           |   4 +
>> arch/x86/include/asm/kexec.h                  |  28 +++
>> arch/x86/kernel/setup.c                       | 165 +--------------
>> include/linux/crash_core.h                    |   4 +
>> include/linux/kexec.h                         |   2 -
>> kernel/crash_core.c                           | 192 ++++++++++++++++++
>> kernel/kexec_core.c                           |  17 --
>> 13 files changed, 328 insertions(+), 256 deletions(-)
Catalin Marinas Oct. 5, 2020, 5:09 p.m. UTC | #4
On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> On 9/7/20 8:47 AM, Chen Zhou wrote:
> > Chen Zhou (9):
> >    x86: kdump: move CRASH_ALIGN to 2M
> >    x86: kdump: make the lower bound of crash kernel reservation
> >      consistent
> >    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> >      reserve_crashkernel[_low]()
> >    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> >    arm64: kdump: introduce some macroes for crash kernel reservation
> >    arm64: kdump: reimplement crashkernel=X
> >    kdump: add threshold for the required memory
> >    arm64: kdump: add memory for devices by DT property
> >      linux,usable-memory-range
> >    kdump: update Documentation about crashkernel
[...]
> I did a brief unit-test on 5.9-rc4.
> 
> Please add:
> 
> Tested-by:  John Donnelly <John.p.donnelly@oracle.com>

Thanks for testing.

> This activity is over a year old. It needs accepted.

It's getting there, hopefully in 5.11. There are some minor tweaks to
address.
Bhupesh Sharma Oct. 5, 2020, 5:42 p.m. UTC | #5
Hi Catalin, Chen,

On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> > On 9/7/20 8:47 AM, Chen Zhou wrote:
> > > Chen Zhou (9):
> > >    x86: kdump: move CRASH_ALIGN to 2M
> > >    x86: kdump: make the lower bound of crash kernel reservation
> > >      consistent
> > >    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > >      reserve_crashkernel[_low]()
> > >    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > >    arm64: kdump: introduce some macroes for crash kernel reservation
> > >    arm64: kdump: reimplement crashkernel=X
> > >    kdump: add threshold for the required memory
> > >    arm64: kdump: add memory for devices by DT property
> > >      linux,usable-memory-range
> > >    kdump: update Documentation about crashkernel
> [...]
> > I did a brief unit-test on 5.9-rc4.
> >
> > Please add:
> >
> > Tested-by:  John Donnelly <John.p.donnelly@oracle.com>
>
> Thanks for testing.
>
> > This activity is over a year old. It needs accepted.
>
> It's getting there, hopefully in 5.11. There are some minor tweaks to
> address.

I think my earlier email with the test results on this series bounced
off the mailing list server (for some weird reason), but I still see
several issues with this patchset. I will add specific issues in the
review comments for each patch again, but overall, with a crashkernel
size of say 786M, I see the following issue:

# cat /proc/cmdline
BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..>
rd.lvm.lv=<..snip..> crashkernel=786M

I see two regions of size 786M and 256M reserved in low and high
regions respectively, So we reserve a total of 1042M of memory, which
is an incorrect behaviour:

# dmesg | grep -i crash
[    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel
(System low RAM: 768MB)
[    0.000000] Reserving 786MB of memory at 654158MB for crashkernel
(System RAM: 130816MB)
[    0.000000] Kernel command line:
BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+
root=/dev/mapper/rhel_ampere--hr330a--03-root ro
rd.lvm.lv=rhel_ampere-hr330a-03/root
rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M

# cat /proc/iomem | grep -i crash
  b0000000-bfffffff : Crash kernel (low)
  bfcbe00000-bffcffffff : Crash kernel

IMO, we should test this feature more before including this in 5.11

Thanks,
Bhupesh
chenzhou Oct. 6, 2020, 1:48 a.m. UTC | #6
Hi Bhupesh,


On 2020/10/6 1:42, Bhupesh Sharma wrote:
> Hi Catalin, Chen,
>
> On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
>>> On 9/7/20 8:47 AM, Chen Zhou wrote:
>>>> Chen Zhou (9):
>>>>    x86: kdump: move CRASH_ALIGN to 2M
>>>>    x86: kdump: make the lower bound of crash kernel reservation
>>>>      consistent
>>>>    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>>>>      reserve_crashkernel[_low]()
>>>>    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>>>>    arm64: kdump: introduce some macroes for crash kernel reservation
>>>>    arm64: kdump: reimplement crashkernel=X
>>>>    kdump: add threshold for the required memory
>>>>    arm64: kdump: add memory for devices by DT property
>>>>      linux,usable-memory-range
>>>>    kdump: update Documentation about crashkernel
>> [...]
>>> I did a brief unit-test on 5.9-rc4.
>>>
>>> Please add:
>>>
>>> Tested-by:  John Donnelly <John.p.donnelly@oracle.com>
>> Thanks for testing.
>>
>>> This activity is over a year old. It needs accepted.
>> It's getting there, hopefully in 5.11. There are some minor tweaks to
>> address.
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
>
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..>
> rd.lvm.lv=<..snip..> crashkernel=786M
>
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
>
> # dmesg | grep -i crash
> [    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel
> (System low RAM: 768MB)
> [    0.000000] Reserving 786MB of memory at 654158MB for crashkernel
> (System RAM: 130816MB)
> [    0.000000] Kernel command line:
> BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+
> root=/dev/mapper/rhel_ampere--hr330a--03-root ro
> rd.lvm.lv=rhel_ampere-hr330a-03/root
> rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>
> # cat /proc/iomem | grep -i crash
>   b0000000-bfffffff : Crash kernel (low)
>   bfcbe00000-bffcffffff : Crash kernel
>
> IMO, we should test this feature more before including this in 5.11
Thanks for you test. This behavior is what we what. What is the correct behavior you think?

Besides, this feature is been tested by John and PK, and i test for various parameters.
We may miss something, any comments are welcome.

Thanks,
Chen Zhou
>
> Thanks,
> Bhupesh
>
> .
>
Catalin Marinas Oct. 6, 2020, 6 p.m. UTC | #7
On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
> 
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
> 
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
> 
> # dmesg | grep -i crash
> [    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> [    0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> [    0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
> 
> # cat /proc/iomem | grep -i crash
>   b0000000-bfffffff : Crash kernel (low)
>   bfcbe00000-bffcffffff : Crash kernel

As Chen said, that's the intended behaviour and how x86 works. The
requested 768M goes in the high range if there's not enough low memory
and an additional buffer for swiotlb is allocated, hence the low 256M.

We could (as an additional patch), subtract the 256M from the high
allocation so that you'd get a low 256M and a high 512M, not sure it's
worth it. Note that with a "crashkernel=768M,high" option, you still get
the additional low 256M, otherwise the crashkernel won't be able to
boot as there's no memory in ZONE_DMA. In the explicit ",high" request
case, I'm not sure subtracted the 256M is more intuitive.

In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
to cover the entire 32-bit address space (i.e. identical to the current
ZONE_DMA32).

> IMO, we should test this feature more before including this in 5.11

Definitely. That's one of the reasons we haven't queued it yet. So any
help with testing here is appreciated.

Thanks.
Bhupesh Sharma Oct. 7, 2020, 7:07 a.m. UTC | #8
Hi Catalin,

On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > I think my earlier email with the test results on this series bounced
> > off the mailing list server (for some weird reason), but I still see
> > several issues with this patchset. I will add specific issues in the
> > review comments for each patch again, but overall, with a crashkernel
> > size of say 786M, I see the following issue:
> >
> > # cat /proc/cmdline
> > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
> >
> > I see two regions of size 786M and 256M reserved in low and high
> > regions respectively, So we reserve a total of 1042M of memory, which
> > is an incorrect behaviour:
> >
> > # dmesg | grep -i crash
> > [    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> > [    0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> > [    0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
> >
> > # cat /proc/iomem | grep -i crash
> >   b0000000-bfffffff : Crash kernel (low)
> >   bfcbe00000-bffcffffff : Crash kernel
>
> As Chen said, that's the intended behaviour and how x86 works. The
> requested 768M goes in the high range if there's not enough low memory
> and an additional buffer for swiotlb is allocated, hence the low 256M.

I understand, but why 256M (as low) for arm64? x86_64 setups usually
have more system memory available as compared to several commercially
available arm64 setups. So is the intent, just to keep the behavior
similar between arm64 and x86_64?

Should we have a CONFIG option / bootarg to help one select the max
'low_size'? Currently the ' low_size' value is calculated as:

    /*
         * two parts from kernel/dma/swiotlb.c:
         * -swiotlb size: user-specified with swiotlb= or default.
         *
         * -swiotlb overflow buffer: now hardcoded to 32k. We round it
         * to 8M for other buffers that may need to stay low too. Also
         * make sure we allocate enough extra low memory so that we
         * don't run out of DMA buffers for 32-bit devices.
         */
        low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);

Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
bootargs, the low_size, still ends up being 256M in such cases,
whereas this 256M can be used for some other purposes - so should we
be limiting this to 64M and failing the crash kernel allocation
request (gracefully) otherwise?

> We could (as an additional patch), subtract the 256M from the high
> allocation so that you'd get a low 256M and a high 512M, not sure it's
> worth it. Note that with a "crashkernel=768M,high" option, you still get
> the additional low 256M, otherwise the crashkernel won't be able to
> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
> case, I'm not sure subtracted the 256M is more intuitive.

> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
> to cover the entire 32-bit address space (i.e. identical to the current
> ZONE_DMA32).
>
> > IMO, we should test this feature more before including this in 5.11
>
> Definitely. That's one of the reasons we haven't queued it yet. So any
> help with testing here is appreciated.

Sure, I am running more checks on this series. I will be soon back
with more updates.

Regards,
Bhupesh
Catalin Marinas Oct. 7, 2020, 4:33 p.m. UTC | #9
On Wed, Oct 07, 2020 at 12:37:49PM +0530, Bhupesh Sharma wrote:
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > > I think my earlier email with the test results on this series bounced
> > > off the mailing list server (for some weird reason), but I still see
> > > several issues with this patchset. I will add specific issues in the
> > > review comments for each patch again, but overall, with a crashkernel
> > > size of say 786M, I see the following issue:
> > >
> > > # cat /proc/cmdline
> > > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
> > >
> > > I see two regions of size 786M and 256M reserved in low and high
> > > regions respectively, So we reserve a total of 1042M of memory, which
> > > is an incorrect behaviour:
> > >
> > > # dmesg | grep -i crash
> > > [    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> > > [    0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> > > [    0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
> > >
> > > # cat /proc/iomem | grep -i crash
> > >   b0000000-bfffffff : Crash kernel (low)
> > >   bfcbe00000-bffcffffff : Crash kernel
> >
> > As Chen said, that's the intended behaviour and how x86 works. The
> > requested 768M goes in the high range if there's not enough low memory
> > and an additional buffer for swiotlb is allocated, hence the low 256M.
> 
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?

Similar in the sense of the fallback to high memory and some low memory
allocation but the amounts can vary per architecture.

> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
> 
>     /*
>          * two parts from kernel/dma/swiotlb.c:
>          * -swiotlb size: user-specified with swiotlb= or default.
>          *
>          * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>          * to 8M for other buffers that may need to stay low too. Also
>          * make sure we allocate enough extra low memory so that we
>          * don't run out of DMA buffers for 32-bit devices.
>          */
>         low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> 
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?

I think it makes sense to set a low_size = 0 if
swiotlb_size_or_default() is 0. The assumption would be that if the main
kernel doesn't need an swiotlb, the crashdump one wouldn't need it
either. But this probably needs the ZONE_DMA for non-RPi4 platforms
addressed as well (expanded to the whole ZONE_DMA32).
chenzhou Oct. 19, 2020, 2:43 a.m. UTC | #10
Hi Bhupesh,


On 2020/10/7 15:07, Bhupesh Sharma wrote:
> Hi Catalin,
>
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
>>> I think my earlier email with the test results on this series bounced
>>> off the mailing list server (for some weird reason), but I still see
>>> several issues with this patchset. I will add specific issues in the
>>> review comments for each patch again, but overall, with a crashkernel
>>> size of say 786M, I see the following issue:
>>>
>>> # cat /proc/cmdline
>>> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
>>>
>>> I see two regions of size 786M and 256M reserved in low and high
>>> regions respectively, So we reserve a total of 1042M of memory, which
>>> is an incorrect behaviour:
>>>
>>> # dmesg | grep -i crash
>>> [    0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
>>> [    0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
>>> [    0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>>>
>>> # cat /proc/iomem | grep -i crash
>>>   b0000000-bfffffff : Crash kernel (low)
>>>   bfcbe00000-bffcffffff : Crash kernel
>> As Chen said, that's the intended behaviour and how x86 works. The
>> requested 768M goes in the high range if there's not enough low memory
>> and an additional buffer for swiotlb is allocated, hence the low 256M.
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?
>
> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
>
>     /*
>          * two parts from kernel/dma/swiotlb.c:
>          * -swiotlb size: user-specified with swiotlb= or default.
>          *
>          * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>          * to 8M for other buffers that may need to stay low too. Also
>          * make sure we allocate enough extra low memory so that we
>          * don't run out of DMA buffers for 32-bit devices.
>          */
>         low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?
>
>> We could (as an additional patch), subtract the 256M from the high
>> allocation so that you'd get a low 256M and a high 512M, not sure it's
>> worth it. Note that with a "crashkernel=768M,high" option, you still get
>> the additional low 256M, otherwise the crashkernel won't be able to
>> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
>> case, I'm not sure subtracted the 256M is more intuitive.
>> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
>> to cover the entire 32-bit address space (i.e. identical to the current
>> ZONE_DMA32).
>>
>>> IMO, we should test this feature more before including this in 5.11
>> Definitely. That's one of the reasons we haven't queued it yet. So any
>> help with testing here is appreciated.
> Sure, I am running more checks on this series. I will be soon back
> with more updates.

Sorry to bother you. I am looking forward to your review comments.


Thanks,
Chen Zhou
>
> Regards,
> Bhupesh
>
> .
>