[v6] ARM: boot: Obtain start of physical memory from DTB

Message ID	20200429082120.16259-1-geert+renesas@glider.be (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=Qajn=6N=lists.infradead.org=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 391782083B From: Geert Uytterhoeven <geert+renesas@glider.be> To: Dmitry Osipenko <digetx@gmail.com>, Russell King <linux@armlinux.org.uk>, Nicolas Pitre <nico@fluxnic.net>, Arnd Bergmann <arnd@arndb.de>, Eric Miao <eric.miao@nvidia.com>, =?utf-8?q?Uwe_Kleine-K=C3=B6nig?= <u.kleine-koenig@pengutronix.de> Subject: [PATCH v6] ARM: boot: Obtain start of physical memory from DTB Date: Wed, 29 Apr 2020 10:21:20 +0200 Message-Id: <20200429082120.16259-1-geert+renesas@glider.be> summary: Content analysis details: (-0.4 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at https://www.dnswl.org/, low trust [2a02:1800:110:4:0:0:f00:19 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record 0.0 SPF_NONE SPF: sender does not publish an SPF Record 0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail domains are different Precedence: list Cc: Geert Uytterhoeven <geert+renesas@glider.be>, Masahiro Yamada <masahiroy@kernel.org>, linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org, Chris Brandt <chris.brandt@renesas.com>, Ard Biesheuvel <ardb@kernel.org>, linux-arm-kernel@lists.infradead.org, Marek Szyprowski <m.szyprowski@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
Series	[v6] ARM: boot: Obtain start of physical memory from DTB \| expand [v6] ARM: boot: Obtain start of physical memory from DTB

Geert Uytterhoeven April 29, 2020, 8:21 a.m. UTC

Currently, the start address of physical memory is obtained by masking
the program counter with a fixed mask of 0xf8000000.  This mask value
was chosen as a balance between the requirements of different platforms.
However, this does require that the start address of physical memory is
a multiple of 128 MiB, precluding booting Linux on platforms where this
requirement is not fulfilled.

Fix this limitation by obtaining the start address from the DTB instead,
if available (either explicitly passed, or appended to the kernel).
Fall back to the traditional method when needed.

This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
i.e. not at a multiple of 128 MiB.

Suggested-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
---
This depends on "[PATCH v5] ARM: decompressor: simplify libfdt builds"
(https://lore.kernel.org/r/20200422130853.1907809-1-masahiroy@kernel.org)

v6:
  - Rebase on top of "[PATCH v5] ARM: decompressor: simplify libfdt
    builds",
  - Include <linux/libfdt.h> instead of <libfdt.h>,

v5:
  - Add Tested-by, Reviewed-by,
  - Round up start of memory to satisfy 16 MiB alignment rule,

v4:
  - Fix stack location after commit 184bf653a7a452c1 ("ARM:
    decompressor: factor out routine to obtain the inflated image
    size"),

v3:
  - Add Reviewed-by,
  - Fix ATAGs with appended DTB,
  - Add Tested-by,

v2:
  - Use "cmp r0, #-1", instead of "cmn r0, #1",
  - Add missing stack setup,
  - Support appended DTB.
---
 arch/arm/boot/compressed/Makefile            |  5 +-
 arch/arm/boot/compressed/fdt_get_mem_start.c | 56 ++++++++++++++++++++
 arch/arm/boot/compressed/head.S              | 54 ++++++++++++++++++-
 3 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/boot/compressed/fdt_get_mem_start.c

Lukasz Stelmach May 19, 2020, 8:53 a.m. UTC | #1

It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> Currently, the start address of physical memory is obtained by masking
> the program counter with a fixed mask of 0xf8000000.  This mask value
> was chosen as a balance between the requirements of different platforms.
> However, this does require that the start address of physical memory is
> a multiple of 128 MiB, precluding booting Linux on platforms where this
> requirement is not fulfilled.
>
> Fix this limitation by obtaining the start address from the DTB instead,
> if available (either explicitly passed, or appended to the kernel).
> Fall back to the traditional method when needed.
>
> This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> i.e. not at a multiple of 128 MiB.
>
> Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Dmitry Osipenko <digetx@gmail.com>
> ---

[...]

Apparently reading physical memory layout from DTB breaks crashdump
kernels. A crashdump kernel is loaded into a region of memory, that is
reserved in the original (i.e. to be crashed) kernel. The reserved
region is large enough for the crashdump kernel to run completely inside
it and don't modify anything outside it, just read and dump the remains
of the crashed kernel. Using the information from DTB makes the
decompressor place the kernel outside of the dedicated region.

The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
it is decompressed to 0x00008000 (see r4 reported before jumping from
within __enter_kernel). If I were to suggest something, there need to be
one more bit of information passed in the DTB telling the decompressor
to use the old masking technique to determain kernel address. It would
be set in the DTB loaded along with the crashdump kernel.

Despite the fact the kernel is able to start and get quite far it simply
panics (for a reason unknown to me at the moment).

Kind regards,
ŁS

--8<---------------cut here---------------start------------->8---
[   42.358349] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 0: buf=0xf1871bcb bufsz=0x52c870 mem=0x18eb8000 memsz=0x52d000
[   42.374615] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 1: buf=0x012365f6 bufsz=0x5abf mem=0x193f6000 memsz=0x6000
root@target:~# sync && echo c > /proc/sysrq-trigger
[   62.206252] sysrq: Trigger a crash
[   62.209711] Kernel panic - not syncing: sysrq triggered crash
[   62.215548] CPU: 0 PID: 1236 Comm: bash Kdump: loaded Tainted: G        W         5.7.0-rc6-00011-gad3fbe6a883e #174
[   62.226225] Hardware name: BCM2711
[   62.229676] Backtrace:
[   62.232178] [<c010bfa4>] (dump_backtrace) from [<c010c334>] (show_stack+0x20/0x24)
[   62.239863]  r7:00000008 r6:c0b4a48d r5:00000000 r4:c0eb7b18
[   62.245617] [<c010c314>] (show_stack) from [<c03e475c>] (dump_stack+0x20/0x28)
[   62.252954] [<c03e473c>] (dump_stack) from [<c011e368>] (panic+0xf4/0x320)
[   62.259941] [<c011e274>] (panic) from [<c044bb60>] (sysrq_handle_crash+0x1c/0x20)
[   62.267536]  r3:c044bb44 r2:c57e1c21 r1:60000093 r0:c0b4a48d
[   62.273278]  r7:00000008
[   62.275853] [<c044bb44>] (sysrq_handle_crash) from [<c044c198>] (__handle_sysrq+0xa0/0x150)
[   62.284334] [<c044c0f8>] (__handle_sysrq) from [<c044c620>] (write_sysrq_trigger+0x68/0x78)
[   62.292814]  r10:00000002 r9:e9123f50 r8:00000002 r7:012f2408 r6:e9112cc0 r5:c044c5b8
[   62.300757]  r4:00000002
[   62.303335] [<c044c5b8>] (write_sysrq_trigger) from [<c02a7ad4>] (proc_reg_write+0x98/0xa8)
[   62.311808]  r5:c044c5b8 r4:eb655700
[   62.315443] [<c02a7a3c>] (proc_reg_write) from [<c023b080>] (__vfs_write+0x48/0xf4)
[   62.323216]  r9:012f2408 r8:c02a7a3c r7:00000002 r6:e9112cc0 r5:e9123f50 r4:c0e04248
[   62.331077] [<c023b038>] (__vfs_write) from [<c023c900>] (vfs_write+0xa8/0xcc)
[   62.338407]  r8:e9123f50 r7:012f2408 r6:00000002 r5:00000000 r4:e9112cc0
[   62.345211] [<c023c858>] (vfs_write) from [<c023cae0>] (ksys_write+0x78/0xc4)
[   62.352454]  r9:012f2408 r8:e9123f5c r7:c0e04248 r6:e9123f50 r5:012f2408 r4:e9112cc0
[   62.360316] [<c023ca68>] (ksys_write) from [<c023cb44>] (sys_write+0x18/0x1c)
[   62.367559]  r10:00000004 r9:e9122000 r8:c0100264 r7:00000004 r6:b6edcd90 r5:012f2408
[   62.375504]  r4:00000002
[   62.378080] [<c023cb2c>] (sys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[   62.385759] Exception stack(0xe9123fa8 to 0xe9123ff0)
[   62.390889] 3fa0:                   00000002 012f2408 00000001 012f2408 00000002 00000000
[   62.399190] 3fc0: 00000002 012f2408 b6edcd90 00000004 012f2408 00000002 00000000 00118fd8
[   62.407488] 3fe0: 0000006c be82b7e8 b6df7010 b6e546e4
[   62.412647] Loading crashdump kernel...
[   62.416628] Bye!
Uncompressing Linux... done, booting the kernel.
r2:0x193F6000
r4:0x00008000
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.7.0-rc6-00011-gad3fbe6a883e (l.stelmach@AMDC1062) (gcc version 8.3.0 (Debian 8.3.0-2), GNU ld (GNU Binutils for Debian) 2.31.1) #174 Tue May 19
09:37:10 CEST 2020
[    0.000000] CPU: ARMv7 Processor [410fd083] revision 3 (ARMv7), cr=10c5383d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi 4 Model B
[    0.000000] earlycon: uart8250 at MMIO32 0xfe215040 (options '')
[    0.000000] printk: bootconsole [uart8250] enabled
[    0.000000] Memory policy: Data cache writeback
[    0.000000] Reserved memory: created CMA memory pool at 0x04000000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] 8<--- cut here ---
[    0.000000] Unable to handle kernel paging request at virtual address d93f6000
[    0.000000] pgd = (ptrval)
[    0.000000] [d93f6000] *pgd=00000000
[    0.000000] Internal error: Oops: 5 [#1] ARM
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc6-00011-gad3fbe6a883e #174
[    0.000000] Hardware name: BCM2711
[    0.000000] PC is at fdt32_ld+0xc/0x18
[    0.000000] LR is at fdt_check_header+0x14/0x15c
[    0.000000] pc : [<c03e4b10>]    lr : [<c03e4c24>]    psr: a00000d3
[    0.000000] sp : c0e01ed8  ip : c0e01ee8  fp : c0e01ee4
[    0.000000] r10: c3ffff40  r9 : c0e2e4f4  r8 : 00000000
[    0.000000] r7 : c0f5b35c  r6 : d93f6000  r5 : c0e085d0  r4 : c0d25510
[    0.000000] r3 : d93f6000  r2 : c0f5b35c  r1 : 00000000  r0 : d93f6000
[    0.000000] Flags: NzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
[    0.000000] Control: 10c5383d  Table: 00004059  DAC: 00000051
[    0.000000] Process swapper (pid: 0, stack limit = 0x(ptrval))
[    0.000000] Stack: (0xc0e01ed8 to 0xc0e02000)
[    0.000000] 1ec0:                                                       c0e01f04 c0e01ee8
[    0.000000] 1ee0: c03e4c24 c03e4b10 c0d25510 c0e085d0 d93f6000 c0f5b35c c0e01f2c c0e01f08
[    0.000000] 1f00: c05f7d94 c03e4c1c c0d25510 c0e085d0 07ffffff 00000000 c0f4c580 c0e2e4f4
[    0.000000] 1f20: c0e01f4c c0e01f30 c0d26a54 c05f7ccc 00000000 c0e01f40 c0123458 c0d2ff08
[    0.000000] 1f40: c0e01fa4 c0e01f50 c0d03c18 c0d26a28 ffffffff 10c5383d c0d0d244 c0e04248
[    0.000000] 1f60: c0b11587 c09000e7 c0e01f94 c0e01f78 c01580e4 00000000 c0e01f9c c0d00330
[    0.000000] 1f80: c0e04248 c0e04240 ffffffff 193f6000 c0eb77fc 10c53c7d c0e01ff4 c0e01fa8
[    0.000000] 1fa0: c0d00b5c c0d035a4 00000000 00000000 00000000 00000000 00000000 c0d31a30
[    0.000000] 1fc0: 00000000 00000000 00000000 c0d00330 00000051 10c03c7d ffffffff 193f6000
[    0.000000] 1fe0: 410fd083 10c53c7d 00000000 c0e01ff8 00000000 c0d00b08 00000000 00000000
[    0.000000] Backtrace:
[    0.000000] [<c03e4b04>] (fdt32_ld) from [<c03e4c24>] (fdt_check_header+0x14/0x15c)
[    0.000000] [<c03e4c10>] (fdt_check_header) from [<c05f7d94>] (__unflatten_device_tree+0xd4/0x284)
[    0.000000]  r7:c0f5b35c r6:d93f6000 r5:c0e085d0 r4:c0d25510
[    0.000000] [<c05f7cc0>] (__unflatten_device_tree) from [<c0d26a54>] (unflatten_device_tree+0x38/0x54)
[    0.000000]  r9:c0e2e4f4 r8:c0f4c580 r7:00000000 r6:07ffffff r5:c0e085d0 r4:c0d25510
[    0.000000] [<c0d26a1c>] (unflatten_device_tree) from [<c0d03c18>] (setup_arch+0x680/0xabc)
[    0.000000]  r4:c0d2ff08
[    0.000000] [<c0d03598>] (setup_arch) from [<c0d00b5c>] (start_kernel+0x60/0x500)
[    0.000000]  r10:10c53c7d r9:c0eb77fc r8:193f6000 r7:ffffffff r6:c0e04240 r5:c0e04248
[    0.000000]  r4:c0d00330
[    0.000000] [<c0d00afc>] (start_kernel) from [<00000000>] (0x0)
[    0.000000]  r10:10c53c7d r9:410fd083 r8:193f6000 r7:ffffffff r6:10c03c7d r5:00000051
[    0.000000]  r4:c0d00330
[    0.000000] Code: c03e49d0 e1a0c00d e92dd800 e24cb004 (e5900000)
[    0.000000] random: get_random_bytes called from init_oops_id+0x30/0x4c with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
--8<---------------cut here---------------end--------------->8---

Russell King (Oracle) May 19, 2020, 9:30 a.m. UTC | #2

On Tue, May 19, 2020 at 10:53:52AM +0200, Lukasz Stelmach wrote:
> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > Currently, the start address of physical memory is obtained by masking
> > the program counter with a fixed mask of 0xf8000000.  This mask value
> > was chosen as a balance between the requirements of different platforms.
> > However, this does require that the start address of physical memory is
> > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > requirement is not fulfilled.
> >
> > Fix this limitation by obtaining the start address from the DTB instead,
> > if available (either explicitly passed, or appended to the kernel).
> > Fall back to the traditional method when needed.
> >
> > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > i.e. not at a multiple of 128 MiB.
> >
> > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> 
> [...]
> 
> Apparently reading physical memory layout from DTB breaks crashdump
> kernels. A crashdump kernel is loaded into a region of memory, that is
> reserved in the original (i.e. to be crashed) kernel. The reserved
> region is large enough for the crashdump kernel to run completely inside
> it and don't modify anything outside it, just read and dump the remains
> of the crashed kernel. Using the information from DTB makes the
> decompressor place the kernel outside of the dedicated region.
> 
> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> it is decompressed to 0x00008000 (see r4 reported before jumping from
> within __enter_kernel).

Right, and it's important that the kernel decompresses to 0x18008000
so it doesn't overwrite memory that was being used by the crashing
kernel, and thus can create a true coredump image of the failed
kernel.  Meanwhile, the DTB still needs to describe the full memory
layout so that we know where memory is located in order to coredump
it properly.

So, this is a flaw with this approach, and will need the commit to be
dropped yet again - this patch is fundamentally incompatible with the
way kexec's crashdump works.

Looking back at the history, we've been trying this approach since
February with four patches submitted to the patch system, and problems
eventually found with each of them.  I think this shows that the way
the decompressor works out where to decompress the kernel to today is
relied upon all over the place, and changing it is always going to
cause problems.  So, I don't think we /can/ change it without causing
a regression for someone.

Geert Uytterhoeven May 19, 2020, 9:44 a.m. UTC | #3

Hi Łukasz

Thanks for your report!

On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > Currently, the start address of physical memory is obtained by masking
> > the program counter with a fixed mask of 0xf8000000.  This mask value
> > was chosen as a balance between the requirements of different platforms.
> > However, this does require that the start address of physical memory is
> > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > requirement is not fulfilled.
> >
> > Fix this limitation by obtaining the start address from the DTB instead,
> > if available (either explicitly passed, or appended to the kernel).
> > Fall back to the traditional method when needed.
> >
> > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > i.e. not at a multiple of 128 MiB.
> >
> > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
>
> [...]
>
> Apparently reading physical memory layout from DTB breaks crashdump
> kernels. A crashdump kernel is loaded into a region of memory, that is
> reserved in the original (i.e. to be crashed) kernel. The reserved
> region is large enough for the crashdump kernel to run completely inside
> it and don't modify anything outside it, just read and dump the remains
> of the crashed kernel. Using the information from DTB makes the
> decompressor place the kernel outside of the dedicated region.
>
> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> it is decompressed to 0x00008000 (see r4 reported before jumping from
> within __enter_kernel). If I were to suggest something, there need to be
> one more bit of information passed in the DTB telling the decompressor
> to use the old masking technique to determain kernel address. It would
> be set in the DTB loaded along with the crashdump kernel.

Shouldn't the DTB passed to the crashkernel describe which region of
memory is to be used instead? Describing "to use the old masking
technique" sounds a bit hackish to me.
I guess it cannot just restrict the /memory node to the reserved region,
as the crashkernel needs to be able to dump the remains of the crashed
kernel, which lie outside this region.
However, something under /chosen should work.

> Despite the fact the kernel is able to start and get quite far it simply
> panics (for a reason unknown to me at the moment).
>
> Kind regards,
> ŁS
>
> --8<---------------cut here---------------start------------->8---
> [   42.358349] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 0: buf=0xf1871bcb bufsz=0x52c870 mem=0x18eb8000 memsz=0x52d000
> [   42.374615] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 1: buf=0x012365f6 bufsz=0x5abf mem=0x193f6000 memsz=0x6000
> root@target:~# sync && echo c > /proc/sysrq-trigger
> [   62.206252] sysrq: Trigger a crash
> [   62.209711] Kernel panic - not syncing: sysrq triggered crash
> [   62.215548] CPU: 0 PID: 1236 Comm: bash Kdump: loaded Tainted: G        W         5.7.0-rc6-00011-gad3fbe6a883e #174
> [   62.226225] Hardware name: BCM2711
> [   62.229676] Backtrace:
> [   62.232178] [<c010bfa4>] (dump_backtrace) from [<c010c334>] (show_stack+0x20/0x24)
> [   62.239863]  r7:00000008 r6:c0b4a48d r5:00000000 r4:c0eb7b18
> [   62.245617] [<c010c314>] (show_stack) from [<c03e475c>] (dump_stack+0x20/0x28)
> [   62.252954] [<c03e473c>] (dump_stack) from [<c011e368>] (panic+0xf4/0x320)
> [   62.259941] [<c011e274>] (panic) from [<c044bb60>] (sysrq_handle_crash+0x1c/0x20)
> [   62.267536]  r3:c044bb44 r2:c57e1c21 r1:60000093 r0:c0b4a48d
> [   62.273278]  r7:00000008
> [   62.275853] [<c044bb44>] (sysrq_handle_crash) from [<c044c198>] (__handle_sysrq+0xa0/0x150)
> [   62.284334] [<c044c0f8>] (__handle_sysrq) from [<c044c620>] (write_sysrq_trigger+0x68/0x78)
> [   62.292814]  r10:00000002 r9:e9123f50 r8:00000002 r7:012f2408 r6:e9112cc0 r5:c044c5b8
> [   62.300757]  r4:00000002
> [   62.303335] [<c044c5b8>] (write_sysrq_trigger) from [<c02a7ad4>] (proc_reg_write+0x98/0xa8)
> [   62.311808]  r5:c044c5b8 r4:eb655700
> [   62.315443] [<c02a7a3c>] (proc_reg_write) from [<c023b080>] (__vfs_write+0x48/0xf4)
> [   62.323216]  r9:012f2408 r8:c02a7a3c r7:00000002 r6:e9112cc0 r5:e9123f50 r4:c0e04248
> [   62.331077] [<c023b038>] (__vfs_write) from [<c023c900>] (vfs_write+0xa8/0xcc)
> [   62.338407]  r8:e9123f50 r7:012f2408 r6:00000002 r5:00000000 r4:e9112cc0
> [   62.345211] [<c023c858>] (vfs_write) from [<c023cae0>] (ksys_write+0x78/0xc4)
> [   62.352454]  r9:012f2408 r8:e9123f5c r7:c0e04248 r6:e9123f50 r5:012f2408 r4:e9112cc0
> [   62.360316] [<c023ca68>] (ksys_write) from [<c023cb44>] (sys_write+0x18/0x1c)
> [   62.367559]  r10:00000004 r9:e9122000 r8:c0100264 r7:00000004 r6:b6edcd90 r5:012f2408
> [   62.375504]  r4:00000002
> [   62.378080] [<c023cb2c>] (sys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
> [   62.385759] Exception stack(0xe9123fa8 to 0xe9123ff0)
> [   62.390889] 3fa0:                   00000002 012f2408 00000001 012f2408 00000002 00000000
> [   62.399190] 3fc0: 00000002 012f2408 b6edcd90 00000004 012f2408 00000002 00000000 00118fd8
> [   62.407488] 3fe0: 0000006c be82b7e8 b6df7010 b6e546e4
> [   62.412647] Loading crashdump kernel...
> [   62.416628] Bye!
> Uncompressing Linux... done, booting the kernel.
> r2:0x193F6000
> r4:0x00008000
> [    0.000000] Booting Linux on physical CPU 0x0
> [    0.000000] Linux version 5.7.0-rc6-00011-gad3fbe6a883e (l.stelmach@AMDC1062) (gcc version 8.3.0 (Debian 8.3.0-2), GNU ld (GNU Binutils for Debian) 2.31.1) #174 Tue May 19
> 09:37:10 CEST 2020

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Russell King (Oracle) May 19, 2020, 9:46 a.m. UTC | #4

On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> Hi Łukasz
> 
> Thanks for your report!
> 
> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > Currently, the start address of physical memory is obtained by masking
> > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > was chosen as a balance between the requirements of different platforms.
> > > However, this does require that the start address of physical memory is
> > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > requirement is not fulfilled.
> > >
> > > Fix this limitation by obtaining the start address from the DTB instead,
> > > if available (either explicitly passed, or appended to the kernel).
> > > Fall back to the traditional method when needed.
> > >
> > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > i.e. not at a multiple of 128 MiB.
> > >
> > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > ---
> >
> > [...]
> >
> > Apparently reading physical memory layout from DTB breaks crashdump
> > kernels. A crashdump kernel is loaded into a region of memory, that is
> > reserved in the original (i.e. to be crashed) kernel. The reserved
> > region is large enough for the crashdump kernel to run completely inside
> > it and don't modify anything outside it, just read and dump the remains
> > of the crashed kernel. Using the information from DTB makes the
> > decompressor place the kernel outside of the dedicated region.
> >
> > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > within __enter_kernel). If I were to suggest something, there need to be
> > one more bit of information passed in the DTB telling the decompressor
> > to use the old masking technique to determain kernel address. It would
> > be set in the DTB loaded along with the crashdump kernel.
> 
> Shouldn't the DTB passed to the crashkernel describe which region of
> memory is to be used instead?

Definitely not.  The crashkernel needs to know where the RAM in the
machine is, so that it can create a coredump of the crashed kernel.

> Describing "to use the old masking technique" sounds a bit hackish to me.
> I guess it cannot just restrict the /memory node to the reserved region,
> as the crashkernel needs to be able to dump the remains of the crashed
> kernel, which lie outside this region.

Correct.

> However, something under /chosen should work.

Yet another sticky plaster...

Geert Uytterhoeven May 19, 2020, 11:21 a.m. UTC | #5

Hi Russell,

CC devicetree

On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > Currently, the start address of physical memory is obtained by masking
> > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > > was chosen as a balance between the requirements of different platforms.
> > > > However, this does require that the start address of physical memory is
> > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > requirement is not fulfilled.
> > > >
> > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > if available (either explicitly passed, or appended to the kernel).
> > > > Fall back to the traditional method when needed.
> > > >
> > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > > i.e. not at a multiple of 128 MiB.
> > > >
> > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > > ---
> > >
> > > [...]
> > >
> > > Apparently reading physical memory layout from DTB breaks crashdump
> > > kernels. A crashdump kernel is loaded into a region of memory, that is
> > > reserved in the original (i.e. to be crashed) kernel. The reserved
> > > region is large enough for the crashdump kernel to run completely inside
> > > it and don't modify anything outside it, just read and dump the remains
> > > of the crashed kernel. Using the information from DTB makes the
> > > decompressor place the kernel outside of the dedicated region.
> > >
> > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > > within __enter_kernel). If I were to suggest something, there need to be
> > > one more bit of information passed in the DTB telling the decompressor
> > > to use the old masking technique to determain kernel address. It would
> > > be set in the DTB loaded along with the crashdump kernel.
> >
> > Shouldn't the DTB passed to the crashkernel describe which region of
> > memory is to be used instead?
>
> Definitely not.  The crashkernel needs to know where the RAM in the
> machine is, so that it can create a coredump of the crashed kernel.

So the DTB should describe both ;-)

> > Describing "to use the old masking technique" sounds a bit hackish to me.
> > I guess it cannot just restrict the /memory node to the reserved region,
> > as the crashkernel needs to be able to dump the remains of the crashed
> > kernel, which lie outside this region.
>
> Correct.
>
> > However, something under /chosen should work.
>
> Yet another sticky plaster...

IMHO the old masking technique is the hacky solution covered by
plasters.

DT describes the hardware.  In general, where to put the kernel is a
software policy, and thus doesn't belong in DT, except perhaps under
/chosen.  But that would open another can of worms, as people usually
have no business in specifying where the kernel should be located.
In the crashkernel case, there is a clear separation between memory to
be used by the crashkernel, and memory to be solely inspected by the
crashkernel.

Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says:

    "The client program may access memory not covered by any memory
     reservations (see section 5.3)"

(Section 5.3 "Memory Reservation Block" only talks about structures in
the FDT, not about DTS)

Hence according to the above, the crashkernel is rightfully allowed to
do whatever it wants with all memory under the /memory node.
However, there is also
Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
This suggests the crashkernel should be passed a DTB that contains a
/reserved-memory node, describing which memory cannot be used freely.
Then the decompressor needs to take this into account when deciding
where the put the kernel.

Yes, the above requires changing code. But at least it provides a
path forward, getting rid of the fragile old masking technique.

Thanks for your comments!


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Arnd Bergmann May 19, 2020, 11:28 a.m. UTC | #6

On Tue, May 19, 2020 at 1:21 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:

> >
> > > However, something under /chosen should work.
> >
> > Yet another sticky plaster...
>
> IMHO the old masking technique is the hacky solution covered by
> plasters.
>
> DT describes the hardware.  In general, where to put the kernel is a
> software policy, and thus doesn't belong in DT, except perhaps under
> /chosen.  But that would open another can of worms, as people usually
> have no business in specifying where the kernel should be located.
> In the crashkernel case, there is a clear separation between memory to
> be used by the crashkernel, and memory to be solely inspected by the
> crashkernel.
>
> Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says:
>
>     "The client program may access memory not covered by any memory
>      reservations (see section 5.3)"
>
> (Section 5.3 "Memory Reservation Block" only talks about structures in
> the FDT, not about DTS)
>
> Hence according to the above, the crashkernel is rightfully allowed to
> do whatever it wants with all memory under the /memory node.
> However, there is also
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
> This suggests the crashkernel should be passed a DTB that contains a
> /reserved-memory node, describing which memory cannot be used freely.
> Then the decompressor needs to take this into account when deciding
> where the put the kernel.
>
> Yes, the above requires changing code. But at least it provides a
> path forward, getting rid of the fragile old masking technique.

There is an existing "linux,usable-memory-range" property documented
in Documentation/devicetree/bindings/chosen.txt, which as I understand
is exactly what you are looking for, except that it is currently only
documented for arm64.

Would extending this to arm work?

      Arnd

Russell King (Oracle) May 19, 2020, 11:43 a.m. UTC | #7

On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
> Hi Russell,
> 
> CC devicetree
> 
> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > > Currently, the start address of physical memory is obtained by masking
> > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > > > was chosen as a balance between the requirements of different platforms.
> > > > > However, this does require that the start address of physical memory is
> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > > requirement is not fulfilled.
> > > > >
> > > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > > if available (either explicitly passed, or appended to the kernel).
> > > > > Fall back to the traditional method when needed.
> > > > >
> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > > > i.e. not at a multiple of 128 MiB.
> > > > >
> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > > > ---
> > > >
> > > > [...]
> > > >
> > > > Apparently reading physical memory layout from DTB breaks crashdump
> > > > kernels. A crashdump kernel is loaded into a region of memory, that is
> > > > reserved in the original (i.e. to be crashed) kernel. The reserved
> > > > region is large enough for the crashdump kernel to run completely inside
> > > > it and don't modify anything outside it, just read and dump the remains
> > > > of the crashed kernel. Using the information from DTB makes the
> > > > decompressor place the kernel outside of the dedicated region.
> > > >
> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > > > within __enter_kernel). If I were to suggest something, there need to be
> > > > one more bit of information passed in the DTB telling the decompressor
> > > > to use the old masking technique to determain kernel address. It would
> > > > be set in the DTB loaded along with the crashdump kernel.
> > >
> > > Shouldn't the DTB passed to the crashkernel describe which region of
> > > memory is to be used instead?
> >
> > Definitely not.  The crashkernel needs to know where the RAM in the
> > machine is, so that it can create a coredump of the crashed kernel.
> 
> So the DTB should describe both ;-)
> 
> > > Describing "to use the old masking technique" sounds a bit hackish to me.
> > > I guess it cannot just restrict the /memory node to the reserved region,
> > > as the crashkernel needs to be able to dump the remains of the crashed
> > > kernel, which lie outside this region.
> >
> > Correct.
> >
> > > However, something under /chosen should work.
> >
> > Yet another sticky plaster...
> 
> IMHO the old masking technique is the hacky solution covered by
> plasters.

One line of code is not "covered by plasters".  There are no plasters.
It's a solution that works for 99.99% of people, unlike your approach
that has had a stream of issues over the last four months, and has
required many reworks of the code to fix each one.  That in itself
speaks volumes about the suitability of the approach.

> DT describes the hardware.

Right, so DT is correct.

> In general, where to put the kernel is a
> software policy, and thus doesn't belong in DT, except perhaps under
> /chosen.  But that would open another can of worms, as people usually
> have no business in specifying where the kernel should be located.
> In the crashkernel case, there is a clear separation between memory to
> be used by the crashkernel, and memory to be solely inspected by the
> crashkernel.
> 
> Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says:
> 
>     "The client program may access memory not covered by any memory
>      reservations (see section 5.3)"
> 
> (Section 5.3 "Memory Reservation Block" only talks about structures in
> the FDT, not about DTS)
> 
> Hence according to the above, the crashkernel is rightfully allowed to
> do whatever it wants with all memory under the /memory node.
> However, there is also
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
> This suggests the crashkernel should be passed a DTB that contains a
> /reserved-memory node, describing which memory cannot be used freely.
> Then the decompressor needs to take this into account when deciding
> where the put the kernel.

So you want to throw yet more complexity at this solution to try and
make it work...

> Yes, the above requires changing code. But at least it provides a
> path forward, getting rid of the fragile old masking technique.

It's hardly fragile when it's worked fine for the last 20+ years,
whereas your solution can't work without some regression being reported
within a couple of weeks of it being applied.  Again, that speaks
volumes about which one is better than the other.

Continually patching this approach to workaround one issue after another
shows that it is _this_ solution that is the fragile implementation.

A fragile implementation is by definition one that keeps breaking.
That's yours.  The masking approach hasn't "broken" for anyone, and
hasn't been the cause of one single regression anywhere.  Yes, there
are some platforms that it doesn't work for (because they choose to
reserve the first chunk of RAM for something) but that is not a
regression.

So, I'm not going to apply the next revision of this patch for at least
one whole kernel cycle - that means scheduling it for 5.10-rc at the
earliest.

Lukasz Stelmach May 19, 2020, 11:46 a.m. UTC | #8

It was <2020-05-19 wto 13:21>, when Geert Uytterhoeven wrote:
> Hi Russell,
>
> CC devicetree
>
> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
>>>>> Currently, the start address of physical memory is obtained by masking
>>>>> the program counter with a fixed mask of 0xf8000000.  This mask value
>>>>> was chosen as a balance between the requirements of different platforms.
>>>>> However, this does require that the start address of physical memory is
>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this
>>>>> requirement is not fulfilled.
>>>>>
>>>>> Fix this limitation by obtaining the start address from the DTB instead,
>>>>> if available (either explicitly passed, or appended to the kernel).
>>>>> Fall back to the traditional method when needed.
>>>>>
>>>>> This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
>>>>> on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
>>>>> i.e. not at a multiple of 128 MiB.
>>>>>
>>>>> Suggested-by: Nicolas Pitre <nico@fluxnic.net>
>>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>>>> Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
>>>>> Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> Tested-by: Dmitry Osipenko <digetx@gmail.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>> Apparently reading physical memory layout from DTB breaks crashdump
>>>> kernels. A crashdump kernel is loaded into a region of memory, that
>>>> is reserved in the original (i.e. to be crashed) kernel. The
>>>> reserved region is large enough for the crashdump kernel to run
>>>> completely inside it and don't modify anything outside it, just
>>>> read and dump the remains of the crashed kernel. Using the
>>>> information from DTB makes the decompressor place the kernel
>>>> outside of the dedicated region.
>>>>
>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000
>>>> and 0x193f6000 (physical). The kernel is expected to run at
>>>> 0x18008000, but it is decompressed to 0x00008000 (see r4 reported
>>>> before jumping from within __enter_kernel). If I were to suggest
>>>> something, there need to be one more bit of information passed in
>>>> the DTB telling the decompressor to use the old masking technique
>>>> to determain kernel address. It would be set in the DTB loaded
>>>> along with the crashdump kernel.
>>>
>>> Shouldn't the DTB passed to the crashkernel describe which region of
>>> memory is to be used instead?
>>
>> Definitely not.  The crashkernel needs to know where the RAM in the
>> machine is, so that it can create a coredump of the crashed kernel.
>
> So the DTB should describe both ;-)

So we can do without the mem= cmdline option which is required
now. Sounds reasonable to me.

Geert Uytterhoeven May 19, 2020, 12:11 p.m. UTC | #9

Hi Arnd,

On Tue, May 19, 2020 at 1:28 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, May 19, 2020 at 1:21 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
>
> > >
> > > > However, something under /chosen should work.
> > >
> > > Yet another sticky plaster...
> >
> > IMHO the old masking technique is the hacky solution covered by
> > plasters.
> >
> > DT describes the hardware.  In general, where to put the kernel is a
> > software policy, and thus doesn't belong in DT, except perhaps under
> > /chosen.  But that would open another can of worms, as people usually
> > have no business in specifying where the kernel should be located.
> > In the crashkernel case, there is a clear separation between memory to
> > be used by the crashkernel, and memory to be solely inspected by the
> > crashkernel.
> >
> > Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says:
> >
> >     "The client program may access memory not covered by any memory
> >      reservations (see section 5.3)"
> >
> > (Section 5.3 "Memory Reservation Block" only talks about structures in
> > the FDT, not about DTS)
> >
> > Hence according to the above, the crashkernel is rightfully allowed to
> > do whatever it wants with all memory under the /memory node.
> > However, there is also
> > Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
> > This suggests the crashkernel should be passed a DTB that contains a
> > /reserved-memory node, describing which memory cannot be used freely.
> > Then the decompressor needs to take this into account when deciding
> > where the put the kernel.
> >
> > Yes, the above requires changing code. But at least it provides a
> > path forward, getting rid of the fragile old masking technique.
>
> There is an existing "linux,usable-memory-range" property documented
> in Documentation/devicetree/bindings/chosen.txt, which as I understand
> is exactly what you are looking for, except that it is currently only
> documented for arm64.

Thank you, that looks appropriate!

It seems this is not really used by the early startup code.
Is that because the early startup code always runs in-place, and the
kernel image is not even copied?

> Would extending this to arm work?

Let's see.... Th arm early boot code seems to be more complex than the
arm64 code ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Lukasz Stelmach May 19, 2020, 12:20 p.m. UTC | #10

It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote:
> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
>> <linux@armlinux.org.uk> wrote:
>> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
>> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
>> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
>> > > > > Currently, the start address of physical memory is obtained by masking
>> > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
>> > > > > was chosen as a balance between the requirements of different platforms.
>> > > > > However, this does require that the start address of physical memory is
>> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
>> > > > > requirement is not fulfilled.
>> > > > >
>> > > > > Fix this limitation by obtaining the start address from the DTB instead,
>> > > > > if available (either explicitly passed, or appended to the kernel).
>> > > > > Fall back to the traditional method when needed.
>> > > > >
>> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
>> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
>> > > > > i.e. not at a multiple of 128 MiB.
>> > > > >
>> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
>> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
>> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
>> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
>> > > > > ---
>> > > >
>> > > > [...]
>> > > >
>> > > > Apparently reading physical memory layout from DTB breaks crashdump
>> > > > kernels. A crashdump kernel is loaded into a region of memory, that is
>> > > > reserved in the original (i.e. to be crashed) kernel. The reserved
>> > > > region is large enough for the crashdump kernel to run completely inside
>> > > > it and don't modify anything outside it, just read and dump the remains
>> > > > of the crashed kernel. Using the information from DTB makes the
>> > > > decompressor place the kernel outside of the dedicated region.
>> > > >
>> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
>> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
>> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
>> > > > within __enter_kernel). If I were to suggest something, there need to be
>> > > > one more bit of information passed in the DTB telling the decompressor
>> > > > to use the old masking technique to determain kernel address. It would
>> > > > be set in the DTB loaded along with the crashdump kernel.
>> > >
>> > > Shouldn't the DTB passed to the crashkernel describe which region of
>> > > memory is to be used instead?
>> >
>> > Definitely not.  The crashkernel needs to know where the RAM in the
>> > machine is, so that it can create a coredump of the crashed kernel.
>> 
>> So the DTB should describe both ;-)
>> 
>> > > Describing "to use the old masking technique" sounds a bit hackish to me.
>> > > I guess it cannot just restrict the /memory node to the reserved region,
>> > > as the crashkernel needs to be able to dump the remains of the crashed
>> > > kernel, which lie outside this region.
>> >
>> > Correct.
>> >
>> > > However, something under /chosen should work.
>> >
>> > Yet another sticky plaster...
>> 
>> IMHO the old masking technique is the hacky solution covered by
>> plasters.
>
> One line of code is not "covered by plasters".  There are no plasters.
> It's a solution that works for 99.99% of people, unlike your approach
> that has had a stream of issues over the last four months, and has
> required many reworks of the code to fix each one.  That in itself
> speaks volumes about the suitability of the approach.

As I have been working with kexec code (patches soon) I would like to
defend the DT approach a bit. It allows to avoid zImage relocation when
a decompressed kernel is larger than ~128MiB. In such case zImage isn't
small either and moving it around takes some time.

Russell King (Oracle) May 19, 2020, 12:27 p.m. UTC | #11

On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote:
> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote:
> > On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
> >> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> >> <linux@armlinux.org.uk> wrote:
> >> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> >> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> >> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> >> > > > > Currently, the start address of physical memory is obtained by masking
> >> > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> >> > > > > was chosen as a balance between the requirements of different platforms.
> >> > > > > However, this does require that the start address of physical memory is
> >> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> >> > > > > requirement is not fulfilled.
> >> > > > >
> >> > > > > Fix this limitation by obtaining the start address from the DTB instead,
> >> > > > > if available (either explicitly passed, or appended to the kernel).
> >> > > > > Fall back to the traditional method when needed.
> >> > > > >
> >> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> >> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> >> > > > > i.e. not at a multiple of 128 MiB.
> >> > > > >
> >> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> >> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> >> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> >> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> >> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> >> > > > > ---
> >> > > >
> >> > > > [...]
> >> > > >
> >> > > > Apparently reading physical memory layout from DTB breaks crashdump
> >> > > > kernels. A crashdump kernel is loaded into a region of memory, that is
> >> > > > reserved in the original (i.e. to be crashed) kernel. The reserved
> >> > > > region is large enough for the crashdump kernel to run completely inside
> >> > > > it and don't modify anything outside it, just read and dump the remains
> >> > > > of the crashed kernel. Using the information from DTB makes the
> >> > > > decompressor place the kernel outside of the dedicated region.
> >> > > >
> >> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> >> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> >> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> >> > > > within __enter_kernel). If I were to suggest something, there need to be
> >> > > > one more bit of information passed in the DTB telling the decompressor
> >> > > > to use the old masking technique to determain kernel address. It would
> >> > > > be set in the DTB loaded along with the crashdump kernel.
> >> > >
> >> > > Shouldn't the DTB passed to the crashkernel describe which region of
> >> > > memory is to be used instead?
> >> >
> >> > Definitely not.  The crashkernel needs to know where the RAM in the
> >> > machine is, so that it can create a coredump of the crashed kernel.
> >> 
> >> So the DTB should describe both ;-)
> >> 
> >> > > Describing "to use the old masking technique" sounds a bit hackish to me.
> >> > > I guess it cannot just restrict the /memory node to the reserved region,
> >> > > as the crashkernel needs to be able to dump the remains of the crashed
> >> > > kernel, which lie outside this region.
> >> >
> >> > Correct.
> >> >
> >> > > However, something under /chosen should work.
> >> >
> >> > Yet another sticky plaster...
> >> 
> >> IMHO the old masking technique is the hacky solution covered by
> >> plasters.
> >
> > One line of code is not "covered by plasters".  There are no plasters.
> > It's a solution that works for 99.99% of people, unlike your approach
> > that has had a stream of issues over the last four months, and has
> > required many reworks of the code to fix each one.  That in itself
> > speaks volumes about the suitability of the approach.
> 
> As I have been working with kexec code (patches soon) I would like to
> defend the DT approach a bit. It allows to avoid zImage relocation when
> a decompressed kernel is larger than ~128MiB. In such case zImage isn't
> small either and moving it around takes some time.

... which is something that has been supported for a very long time,
before the days of DT.

Lukasz Stelmach May 19, 2020, 12:49 p.m. UTC | #12

It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote:
> On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote:
>> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote:
>>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
>>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
>>>> <linux@armlinux.org.uk> wrote:
>>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
>>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
>>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
>>>>>>>> Currently, the start address of physical memory is obtained by masking
>>>>>>>> the program counter with a fixed mask of 0xf8000000.  This mask value
>>>>>>>> was chosen as a balance between the requirements of different platforms.
>>>>>>>> However, this does require that the start address of physical memory is
>>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this
>>>>>>>> requirement is not fulfilled.
>>>>>>>>
>>>>>>>> Fix this limitation by obtaining the start address from the DTB instead,
>>>>>>>> if available (either explicitly passed, or appended to the kernel).
>>>>>>>> Fall back to the traditional method when needed.
[...]
>>>>>>> Apparently reading physical memory layout from DTB breaks crashdump
>>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is
>>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved
>>>>>>> region is large enough for the crashdump kernel to run completely inside
>>>>>>> it and don't modify anything outside it, just read and dump the remains
>>>>>>> of the crashed kernel. Using the information from DTB makes the
>>>>>>> decompressor place the kernel outside of the dedicated region.
>>>>>>>
>>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
>>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
>>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from
>>>>>>> within __enter_kernel). If I were to suggest something, there need to be
>>>>>>> one more bit of information passed in the DTB telling the decompressor
>>>>>>> to use the old masking technique to determain kernel address. It would
>>>>>>> be set in the DTB loaded along with the crashdump kernel.
[...]
>>>>>> Describing "to use the old masking technique" sounds a bit hackish to me.
>>>>>> I guess it cannot just restrict the /memory node to the reserved region,
>>>>>> as the crashkernel needs to be able to dump the remains of the crashed
>>>>>> kernel, which lie outside this region.
>>>>>
>>>>> Correct.
>>>>>
>>>>>> However, something under /chosen should work.
>>>>>
>>>>> Yet another sticky plaster...
>>>> 
>>>> IMHO the old masking technique is the hacky solution covered by
>>>> plasters.
>>>
>>> One line of code is not "covered by plasters".  There are no plasters.
>>> It's a solution that works for 99.99% of people, unlike your approach
>>> that has had a stream of issues over the last four months, and has
>>> required many reworks of the code to fix each one.  That in itself
>>> speaks volumes about the suitability of the approach.
>> 
>> As I have been working with kexec code (patches soon) I would like to
>> defend the DT approach a bit. It allows to avoid zImage relocation when
>> a decompressed kernel is larger than ~128MiB. In such case zImage isn't
>> small either and moving it around takes some time.
>
> ... which is something that has been supported for a very long time,
> before the days of DT.

How? If a decompressed kernel requires >128M and a bootloader would like
to put a zImage high enough to *avoid* copying it once again, then the
decompressor can't see any memory below the 128M window it starts in and
can't decompress the kernel there. If we do not care about copying
zImage, then, indeed, everything works fine as it is today. You are
most probably right 99% doesn't require 128M kernel, but the case is
IMHO obvious enough, that it should be adressed somehow.

Kind regards,

Russell King (Oracle) May 19, 2020, 1:12 p.m. UTC | #13

On Tue, May 19, 2020 at 02:49:57PM +0200, Lukasz Stelmach wrote:
> It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote:
> > On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote:
> >> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote:
> >>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
> >>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> >>>> <linux@armlinux.org.uk> wrote:
> >>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> >>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> >>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> >>>>>>>> Currently, the start address of physical memory is obtained by masking
> >>>>>>>> the program counter with a fixed mask of 0xf8000000.  This mask value
> >>>>>>>> was chosen as a balance between the requirements of different platforms.
> >>>>>>>> However, this does require that the start address of physical memory is
> >>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this
> >>>>>>>> requirement is not fulfilled.
> >>>>>>>>
> >>>>>>>> Fix this limitation by obtaining the start address from the DTB instead,
> >>>>>>>> if available (either explicitly passed, or appended to the kernel).
> >>>>>>>> Fall back to the traditional method when needed.
> [...]
> >>>>>>> Apparently reading physical memory layout from DTB breaks crashdump
> >>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is
> >>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved
> >>>>>>> region is large enough for the crashdump kernel to run completely inside
> >>>>>>> it and don't modify anything outside it, just read and dump the remains
> >>>>>>> of the crashed kernel. Using the information from DTB makes the
> >>>>>>> decompressor place the kernel outside of the dedicated region.
> >>>>>>>
> >>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> >>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> >>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from
> >>>>>>> within __enter_kernel). If I were to suggest something, there need to be
> >>>>>>> one more bit of information passed in the DTB telling the decompressor
> >>>>>>> to use the old masking technique to determain kernel address. It would
> >>>>>>> be set in the DTB loaded along with the crashdump kernel.
> [...]
> >>>>>> Describing "to use the old masking technique" sounds a bit hackish to me.
> >>>>>> I guess it cannot just restrict the /memory node to the reserved region,
> >>>>>> as the crashkernel needs to be able to dump the remains of the crashed
> >>>>>> kernel, which lie outside this region.
> >>>>>
> >>>>> Correct.
> >>>>>
> >>>>>> However, something under /chosen should work.
> >>>>>
> >>>>> Yet another sticky plaster...
> >>>> 
> >>>> IMHO the old masking technique is the hacky solution covered by
> >>>> plasters.
> >>>
> >>> One line of code is not "covered by plasters".  There are no plasters.
> >>> It's a solution that works for 99.99% of people, unlike your approach
> >>> that has had a stream of issues over the last four months, and has
> >>> required many reworks of the code to fix each one.  That in itself
> >>> speaks volumes about the suitability of the approach.
> >> 
> >> As I have been working with kexec code (patches soon) I would like to
> >> defend the DT approach a bit. It allows to avoid zImage relocation when
> >> a decompressed kernel is larger than ~128MiB. In such case zImage isn't
> >> small either and moving it around takes some time.
> >
> > ... which is something that has been supported for a very long time,
> > before the days of DT.
> 
> How? If a decompressed kernel requires >128M and a bootloader would like
> to put a zImage high enough to *avoid* copying it once again, then the
> decompressor can't see any memory below the 128M window it starts in and
> can't decompress the kernel there.

Do you have such a large kernel?  It would be rather inefficient as
branch instructions could not be used; every function call would have
to be indirect.  The maximum is +/- 32MB for a branch.

> If we do not care about copying
> zImage, then, indeed, everything works fine as it is today. You are
> most probably right 99% doesn't require 128M kernel, but the case is
> IMHO obvious enough, that it should be adressed somehow.

If I have a kernel in excess of 4GB... "it should be addressed somehow"!

Ard Biesheuvel May 19, 2020, 1:56 p.m. UTC | #14

On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Russell,
>
> CC devicetree
>
> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > > Currently, the start address of physical memory is obtained by masking
> > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > > > was chosen as a balance between the requirements of different platforms.
> > > > > However, this does require that the start address of physical memory is
> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > > requirement is not fulfilled.
> > > > >
> > > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > > if available (either explicitly passed, or appended to the kernel).
> > > > > Fall back to the traditional method when needed.
> > > > >
> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > > > i.e. not at a multiple of 128 MiB.
> > > > >
> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > > > ---
> > > >
> > > > [...]
> > > >
> > > > Apparently reading physical memory layout from DTB breaks crashdump
> > > > kernels. A crashdump kernel is loaded into a region of memory, that is
> > > > reserved in the original (i.e. to be crashed) kernel. The reserved
> > > > region is large enough for the crashdump kernel to run completely inside
> > > > it and don't modify anything outside it, just read and dump the remains
> > > > of the crashed kernel. Using the information from DTB makes the
> > > > decompressor place the kernel outside of the dedicated region.
> > > >
> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > > > within __enter_kernel). If I were to suggest something, there need to be
> > > > one more bit of information passed in the DTB telling the decompressor
> > > > to use the old masking technique to determain kernel address. It would
> > > > be set in the DTB loaded along with the crashdump kernel.
> > >
> > > Shouldn't the DTB passed to the crashkernel describe which region of
> > > memory is to be used instead?
> >
> > Definitely not.  The crashkernel needs to know where the RAM in the
> > machine is, so that it can create a coredump of the crashed kernel.
>
> So the DTB should describe both ;-)
>
> > > Describing "to use the old masking technique" sounds a bit hackish to me.
> > > I guess it cannot just restrict the /memory node to the reserved region,
> > > as the crashkernel needs to be able to dump the remains of the crashed
> > > kernel, which lie outside this region.
> >
> > Correct.
> >
> > > However, something under /chosen should work.
> >
> > Yet another sticky plaster...
>
> IMHO the old masking technique is the hacky solution covered by
> plasters.
>

I think debating which solution is the hacky one will not get us anywhere.

The simple reality is that the existing solution works fine for
existing platforms, and so any changes in the logic will have to be
opt-in in one way or the other.

Since U-boot supports EFI boot these days, one potential option is to
rely on that. I have some changes implementing this that go on top of
this patch, but they don't actually rely on it - it was just to
prevent lexical conflicts.

The only remaining options imo are a kernel command line option, or a
DT property that tells the decompressor to look at the memory nodes.
But using the DT memory nodes on all platforms like this patch does is
obviously just too risky.

On another note, I do think the usable-memory-region property should
be implemented for ARM as well - relying on this rounding to ensure
that the decompressor does the right thing is too fragile.

Lukasz Stelmach May 19, 2020, 2:02 p.m. UTC | #15

It was <2020-05-19 wto 14:12>, when Russell King - ARM Linux admin wrote:
> On Tue, May 19, 2020 at 02:49:57PM +0200, Lukasz Stelmach wrote:
>> It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote:
>>> On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote:
>>>> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote:
>>>>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
>>>>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
>>>>>> <linux@armlinux.org.uk> wrote:
>>>>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
>>>>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
>>>>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
>>>>>>>>>> Currently, the start address of physical memory is obtained by masking
>>>>>>>>>> the program counter with a fixed mask of 0xf8000000.  This mask value
>>>>>>>>>> was chosen as a balance between the requirements of different platforms.
>>>>>>>>>> However, this does require that the start address of physical memory is
>>>>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this
>>>>>>>>>> requirement is not fulfilled.
>>>>>>>>>>
>>>>>>>>>> Fix this limitation by obtaining the start address from the DTB instead,
>>>>>>>>>> if available (either explicitly passed, or appended to the kernel).
>>>>>>>>>> Fall back to the traditional method when needed.
>> [...]
>>>>>>>>> Apparently reading physical memory layout from DTB breaks crashdump
>>>>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is
>>>>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved
>>>>>>>>> region is large enough for the crashdump kernel to run completely inside
>>>>>>>>> it and don't modify anything outside it, just read and dump the remains
>>>>>>>>> of the crashed kernel. Using the information from DTB makes the
>>>>>>>>> decompressor place the kernel outside of the dedicated region.
>>>>>>>>>
>>>>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
>>>>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
>>>>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from
>>>>>>>>> within __enter_kernel). If I were to suggest something, there need to be
>>>>>>>>> one more bit of information passed in the DTB telling the decompressor
>>>>>>>>> to use the old masking technique to determain kernel address. It would
>>>>>>>>> be set in the DTB loaded along with the crashdump kernel.
>> [...]
>>>>>>>> Describing "to use the old masking technique" sounds a bit hackish to me.
>>>>>>>> I guess it cannot just restrict the /memory node to the reserved region,
>>>>>>>> as the crashkernel needs to be able to dump the remains of the crashed
>>>>>>>> kernel, which lie outside this region.
>>>>>>>
>>>>>>> Correct.
>>>>>>>
>>>>>>>> However, something under /chosen should work.
>>>>>>>
>>>>>>> Yet another sticky plaster...
>>>>>> 
>>>>>> IMHO the old masking technique is the hacky solution covered by
>>>>>> plasters.
>>>>>
>>>>> One line of code is not "covered by plasters".  There are no plasters.
>>>>> It's a solution that works for 99.99% of people, unlike your approach
>>>>> that has had a stream of issues over the last four months, and has
>>>>> required many reworks of the code to fix each one.  That in itself
>>>>> speaks volumes about the suitability of the approach.
>>>> 
>>>> As I have been working with kexec code (patches soon) I would like to
>>>> defend the DT approach a bit. It allows to avoid zImage relocation when
>>>> a decompressed kernel is larger than ~128MiB. In such case zImage isn't
>>>> small either and moving it around takes some time.
>>>
>>> ... which is something that has been supported for a very long time,
>>> before the days of DT.
>> 
>> How? If a decompressed kernel requires >128M and a bootloader would like
>> to put a zImage high enough to *avoid* copying it once again, then the
>> decompressor can't see any memory below the 128M window it starts in and
>> can't decompress the kernel there.
>
> Do you have such a large kernel?  It would be rather inefficient as
> branch instructions could not be used; every function call would have
> to be indirect.  The maximum is +/- 32MB for a branch.

This number includes data, particularly initramfs which may be linked
into the kernel. Assuming kernel <32MB (15MB like min ATM) we are left
with slightly more than 100MB. Which isn't that much if you would like
it to be root file system for your device.

Of course initramfs could be loaded separately by a bootloader and yet
having both kernel and initramfs in one file has some advantages.

I am not here to argue. It's your call.

>> If we do not care about copying
>> zImage, then, indeed, everything works fine as it is today. You are
>> most probably right 99% doesn't require 128M kernel, but the case is
>> IMHO obvious enough, that it should be adressed somehow.
>
> If I have a kernel in excess of 4GB... "it should be addressed somehow"!

I believe software and hardware limitations should not be compared this
way.

Russell King (Oracle) May 19, 2020, 2:28 p.m. UTC | #16

On Tue, May 19, 2020 at 03:56:59PM +0200, Ard Biesheuvel wrote:
> On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Russell,
> >
> > CC devicetree
> >
> > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > > > Currently, the start address of physical memory is obtained by masking
> > > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > > > > was chosen as a balance between the requirements of different platforms.
> > > > > > However, this does require that the start address of physical memory is
> > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > > > requirement is not fulfilled.
> > > > > >
> > > > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > > > if available (either explicitly passed, or appended to the kernel).
> > > > > > Fall back to the traditional method when needed.
> > > > > >
> > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > > > > i.e. not at a multiple of 128 MiB.
> > > > > >
> > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > > > > ---
> > > > >
> > > > > [...]
> > > > >
> > > > > Apparently reading physical memory layout from DTB breaks crashdump
> > > > > kernels. A crashdump kernel is loaded into a region of memory, that is
> > > > > reserved in the original (i.e. to be crashed) kernel. The reserved
> > > > > region is large enough for the crashdump kernel to run completely inside
> > > > > it and don't modify anything outside it, just read and dump the remains
> > > > > of the crashed kernel. Using the information from DTB makes the
> > > > > decompressor place the kernel outside of the dedicated region.
> > > > >
> > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > > > > within __enter_kernel). If I were to suggest something, there need to be
> > > > > one more bit of information passed in the DTB telling the decompressor
> > > > > to use the old masking technique to determain kernel address. It would
> > > > > be set in the DTB loaded along with the crashdump kernel.
> > > >
> > > > Shouldn't the DTB passed to the crashkernel describe which region of
> > > > memory is to be used instead?
> > >
> > > Definitely not.  The crashkernel needs to know where the RAM in the
> > > machine is, so that it can create a coredump of the crashed kernel.
> >
> > So the DTB should describe both ;-)
> >
> > > > Describing "to use the old masking technique" sounds a bit hackish to me.
> > > > I guess it cannot just restrict the /memory node to the reserved region,
> > > > as the crashkernel needs to be able to dump the remains of the crashed
> > > > kernel, which lie outside this region.
> > >
> > > Correct.
> > >
> > > > However, something under /chosen should work.
> > >
> > > Yet another sticky plaster...
> >
> > IMHO the old masking technique is the hacky solution covered by
> > plasters.
> >
> 
> I think debating which solution is the hacky one will not get us anywhere.
> 
> The simple reality is that the existing solution works fine for
> existing platforms, and so any changes in the logic will have to be
> opt-in in one way or the other.
> 
> Since U-boot supports EFI boot these days, one potential option is to
> rely on that. I have some changes implementing this that go on top of
> this patch, but they don't actually rely on it - it was just to
> prevent lexical conflicts.
> 
> The only remaining options imo are a kernel command line option, or a
> DT property that tells the decompressor to look at the memory nodes.
> But using the DT memory nodes on all platforms like this patch does is
> obviously just too risky.
> 
> On another note, I do think the usable-memory-region property should
> be implemented for ARM as well - relying on this rounding to ensure
> that the decompressor does the right thing is too fragile.

What is "too fragile" is trying to change this and expecting everything
to continue working as it did before.

How will switching to EFI help?  Won't the crashdump kernel detect EFI
and try to get the memory map from EFI, whereby it runs into exactly
the same issue that the DT approach does?

The current crashkernel situation works precisely because of the 128M
masking that is being done.

Ard Biesheuvel May 19, 2020, 2:32 p.m. UTC | #17

On Tue, 19 May 2020 at 16:29, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, May 19, 2020 at 03:56:59PM +0200, Ard Biesheuvel wrote:
> > On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >
> > > Hi Russell,
> > >
> > > CC devicetree
> > >
> > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> > > <linux@armlinux.org.uk> wrote:
> > > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote:
> > > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > > > > Currently, the start address of physical memory is obtained by masking
> > > > > > > the program counter with a fixed mask of 0xf8000000.  This mask value
> > > > > > > was chosen as a balance between the requirements of different platforms.
> > > > > > > However, this does require that the start address of physical memory is
> > > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > > > > requirement is not fulfilled.
> > > > > > >
> > > > > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > > > > if available (either explicitly passed, or appended to the kernel).
> > > > > > > Fall back to the traditional method when needed.
> > > > > > >
> > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM
> > > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space),
> > > > > > > i.e. not at a multiple of 128 MiB.
> > > > > > >
> > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
> > > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com>
> > > > > > > ---
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > Apparently reading physical memory layout from DTB breaks crashdump
> > > > > > kernels. A crashdump kernel is loaded into a region of memory, that is
> > > > > > reserved in the original (i.e. to be crashed) kernel. The reserved
> > > > > > region is large enough for the crashdump kernel to run completely inside
> > > > > > it and don't modify anything outside it, just read and dump the remains
> > > > > > of the crashed kernel. Using the information from DTB makes the
> > > > > > decompressor place the kernel outside of the dedicated region.
> > > > > >
> > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and
> > > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but
> > > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from
> > > > > > within __enter_kernel). If I were to suggest something, there need to be
> > > > > > one more bit of information passed in the DTB telling the decompressor
> > > > > > to use the old masking technique to determain kernel address. It would
> > > > > > be set in the DTB loaded along with the crashdump kernel.
> > > > >
> > > > > Shouldn't the DTB passed to the crashkernel describe which region of
> > > > > memory is to be used instead?
> > > >
> > > > Definitely not.  The crashkernel needs to know where the RAM in the
> > > > machine is, so that it can create a coredump of the crashed kernel.
> > >
> > > So the DTB should describe both ;-)
> > >
> > > > > Describing "to use the old masking technique" sounds a bit hackish to me.
> > > > > I guess it cannot just restrict the /memory node to the reserved region,
> > > > > as the crashkernel needs to be able to dump the remains of the crashed
> > > > > kernel, which lie outside this region.
> > > >
> > > > Correct.
> > > >
> > > > > However, something under /chosen should work.
> > > >
> > > > Yet another sticky plaster...
> > >
> > > IMHO the old masking technique is the hacky solution covered by
> > > plasters.
> > >
> >
> > I think debating which solution is the hacky one will not get us anywhere.
> >
> > The simple reality is that the existing solution works fine for
> > existing platforms, and so any changes in the logic will have to be
> > opt-in in one way or the other.
> >
> > Since U-boot supports EFI boot these days, one potential option is to
> > rely on that. I have some changes implementing this that go on top of
> > this patch, but they don't actually rely on it - it was just to
> > prevent lexical conflicts.
> >
> > The only remaining options imo are a kernel command line option, or a
> > DT property that tells the decompressor to look at the memory nodes.
> > But using the DT memory nodes on all platforms like this patch does is
> > obviously just too risky.
> >
> > On another note, I do think the usable-memory-region property should
> > be implemented for ARM as well - relying on this rounding to ensure
> > that the decompressor does the right thing is too fragile.
>
> What is "too fragile" is trying to change this and expecting everything
> to continue working as it did before.
>

That is my point.

> How will switching to EFI help?  Won't the crashdump kernel detect EFI
> and try to get the memory map from EFI, whereby it runs into exactly
> the same issue that the DT approach does?
>

No. If you boot from kexec, then the EFI stub is completely
circumvented, and things work as before.

> The current crashkernel situation works precisely because of the 128M
> masking that is being done.
>

Indeed. That is precisely my point.

[v6] ARM: boot: Obtain start of physical memory from DTB

Commit Message

Comments

Patch