[0/2] arm64, kdump: enforce to take 4G as the crashkernel low memory end

Message ID 20220828005545.94389-1-bhe@redhat.com (mailing list archive)

Message

Baoquan He Aug. 28, 2022, 12:55 a.m. UTC
Problem:
========
arm64 supports block and section mappings when building page tables.
However, the whole linear mapping is currently forced to base page
granularity if CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled and the
crashkernel kernel parameter is set. This makes the linear mapping take
longer during bootup and causes severe performance degradation at run
time.
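
To give a rough sense of the cost, the sketch below counts the leaf
entries needed to map a region at base page versus block granularity.
It assumes a 4K granule, where a PMD-level block covers 2M. The helper
name leaf_entries() is made up for this illustration and is not kernel
code.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)	/* base page, assuming a 4K granule */
#define SZ_2M (UINT64_C(1) << 21)	/* PMD-level block with a 4K granule */
#define SZ_4G (UINT64_C(1) << 32)

/*
 * Hypothetical helper: number of leaf entries needed to map 'size' bytes
 * when each entry covers 'granule' bytes, rounding up.
 */
static uint64_t leaf_entries(uint64_t size, uint64_t granule)
{
	assert(granule != 0);
	return size / granule + (size % granule != 0);
}
```

Under these assumptions, leaf_entries(SZ_4G, SZ_4K) is 1048576 while
leaf_entries(SZ_4G, SZ_2M) is 2048, a factor of 512 per 4G of linear
mapping.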

Root cause:
===========
On arm64, the crashkernel reservation needs to know the upper limit of
the low memory zone, because it must reserve memory in that zone so that
devices' DMA addressing in the kdump kernel can be satisfied. However,
the limit varies between arm64 systems, and it can only be decided late,
once bootmem_init() has been called.

We also need to map the crashkernel region with base page granularity
when doing the linear mapping, because kdump needs to protect the
crashkernel region via set_memory_valid(,0) after the kdump kernel is
loaded. However, arm64 does not handle splitting an already built block
or section mapping well because of some CPU restrictions [1]. And
unfortunately, the linear mapping is done before bootmem_init().

To resolve the above conflict, arm64 currently compromises by forcing
base page mapping for the entire linear mapping if crashkernel is set
and CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled. Hence performance
is sacrificed.

Solution:
=========
To fix the problem, we should always take 4G as the crashkernel low
memory end when CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled. With
this, the crashkernel reservation no longer has to be deferred until
bootmem_init() has set arm64_dma_phys_limit. As soon as memblock init
is done, we can determine the upper limit of the low memory zone:

1) Both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, or
   memblock_start_of_DRAM() > 4G:
   limit = PHYS_ADDR_MAX+1  (corner cases)
2) CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled:
   limit = 4G  (generic case)
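
The two cases above can be modelled as a small userspace sketch. This
is not the code from the patches: the function name, the boolean
parameters standing in for the Kconfig options, and the dram_start
argument standing in for memblock_start_of_DRAM() are all assumptions
made for illustration. It returns the highest usable address
(inclusive) rather than the limit itself, because PHYS_ADDR_MAX+1 would
wrap to 0 in this 64-bit model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;		/* stand-in for the kernel type */

#define PHYS_ADDR_MAX	UINT64_MAX	/* assumes a 64-bit phys_addr_t */
#define SZ_4G		(UINT64_C(1) << 32)

/*
 * Hypothetical model of the rule above. Returns the highest address
 * usable for the crashkernel low memory (inclusive), so the "limit" in
 * the text corresponds to this value + 1.
 */
static phys_addr_t crash_low_max_addr(bool zone_dma, bool zone_dma32,
				      phys_addr_t dram_start)
{
	/* Case 1: no DMA zones configured, or DRAM starts above 4G. */
	if ((!zone_dma && !zone_dma32) || dram_start > SZ_4G)
		return PHYS_ADDR_MAX;

	/* Case 2: the generic case, low memory ends at 4G. */
	return SZ_4G - 1;
}
```

The point of the sketch is that every input is available once memblock
init is done, so nothing has to wait for bootmem_init().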

Justification:
==============
In fact, the kdump kernel doesn't need to cover the addressing bits of
all peripherals. Only the device used as the dump target needs to be
taken care of, and only its addressing bits need to be satisfied.
Currently there are two kinds of dumping: to a local storage disk, or
through a network card to a remote storage server. This means only the
storage disk or network card used as the dump target needs to be
considered when checking whether addressing bits are satisfied. To save
memory, we usually generate a kdump specific initramfs which includes
only the kernel modules needed for the dump target devices. All other
kernel modules are excluded, and their corresponding devices won't be
initialized during kdump kernel bootup.

So far, only Raspberry Pi 4 has some peripherals which can only address
a 30-bit memory range, as reported in [2]. Devices on all other arm64
systems can address a 32-bit memory range.

So by always taking 4G as the crashkernel low memory end, the only risk
is an RPi4 whose storage disk or network card can't address a 32-bit
memory range, because such a device could be set as the dump target.
Even if RPi4 truly has storage devices or network cards which can only
address a 30-bit memory range, it should be a corner case. We can
document it, since crashkernel is mostly used as a server feature.
Besides, RPi4 can still use crashkernel=xM@yM to specify a location for
32-bit addressing if it really has that kind of storage device or
network card and kdump is expected to work.
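
As a rough illustration of the risk described above, the sketch below
checks whether a device with a given number of address bits can reach
every byte of a reserved region. The function name is hypothetical, and
the model assumes DMA addresses equal physical addresses, so it ignores
bus offsets such as the RPi4 aliasing discussed in [2].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can a device that drives 'bits' address bits reach
 * every byte of [base, base + size)? Assumes DMA address == physical
 * address, which is a simplification.
 */
static bool region_dma_addressable(uint64_t base, uint64_t size,
				   unsigned int bits)
{
	uint64_t mask, last;

	assert(bits >= 1 && bits <= 64);
	if (size == 0 || base > UINT64_MAX - (size - 1))
		return false;	/* empty region or address overflow */

	mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
	last = base + size - 1;
	return last <= mask;
}
```

In this model, a 256M region based at 3G passes for a 32-bit device but
fails for a 30-bit one, while a 256M region based at 512M (illustrative
values only) lies entirely below 1G and passes for both.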

[1]
https://lore.kernel.org/all/YrIIJkhKWSuAqkCx@arm.com/T/#u

[2]
[PATCH v6 0/4] Raspberry Pi 4 DMA addressing support
https://lore.kernel.org/linux-arm-kernel/20190911182546.17094-1-nsaenzjulienne@suse.de/T/


======
Question to Nicolas:

Hi Nicolas,

In the cover letter of patchset [2], you said RPi4 has peripherals which
can only address a 30-bit range. In the sentence quoted below, do you
mean that "the PCIe, V3D, GENET" can't address a 32-bit range, or that
they have a wider view of the address space, the same as the 40-bit DMA
channels? I am confused about that.

And can the storage device or network card on RPi4 address a 30-bit
range or a 32-bit range? Do we have a document for that, or do you
happen to know?

"""
The new Raspberry Pi 4 has up to 4GB of memory but most peripherals can
only address the first GB: their DMA address range is
0xc0000000-0xfc000000 which is aliased to the first GB of physical
memory 0x00000000-0x3c000000. Note that only some peripherals have these
limitations: the PCIe, V3D, GENET, and 40-bit DMA channels have a wider
view of the address space by virtue of being hooked up trough a second
interconnect.
"""


Baoquan He (2):
  arm64, kdump: enforce to take 4G as the crashkernel low memory end
  arm64: remove unneed defer_reserve_crashkernel() and crash_mem_map

 arch/arm64/include/asm/memory.h |  5 ----
 arch/arm64/mm/init.c            | 24 ++++++++-------
 arch/arm64/mm/mmu.c             | 53 ++++++++++++++-------------------
 3 files changed, 36 insertions(+), 46 deletions(-)


base-commit: 10d4879f9ef01cc6190fafe4257d06f375bab92c

Comments

Baoquan He Aug. 28, 2022, 1:57 a.m. UTC | #1
Forgot to add Nicolas when sending the patch, adding him now.
