Message ID | 20200429082120.16259-1-geert+renesas@glider.be (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] ARM: boot: Obtain start of physical memory from DTB | expand |
It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > Currently, the start address of physical memory is obtained by masking > the program counter with a fixed mask of 0xf8000000. This mask value > was chosen as a balance between the requirements of different platforms. > However, this does require that the start address of physical memory is > a multiple of 128 MiB, precluding booting Linux on platforms where this > requirement is not fulfilled. > > Fix this limitation by obtaining the start address from the DTB instead, > if available (either explicitly passed, or appended to the kernel). > Fall back to the traditional method when needed. > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > i.e. not at a multiple of 128 MiB. > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Tested-by: Dmitry Osipenko <digetx@gmail.com> > --- [...] Apparently reading physical memory layout from DTB breaks crashdump kernels. A crashdump kernel is loaded into a region of memory, that is reserved in the original (i.e. to be crashed) kernel. The reserved region is large enough for the crashdump kernel to run completely inside it and don't modify anything outside it, just read and dump the remains of the crashed kernel. Using the information from DTB makes the decompressor place the kernel outside of the dedicated region. The log below shows that a zImage and DTB are loaded at 0x18eb8000 and 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but it is decompressed to 0x00008000 (see r4 reported before jumping from within __enter_kernel). If I were to suggest something, there need to be one more bit of information passed in the DTB telling the decompressor to use the old masking technique to determain kernel address. It would be set in the DTB loaded along with the crashdump kernel. Despite the fact the kernel is able to start and get quite far it simply panics (for a reason unknown to me at the moment). Kind regards, ŁS --8<---------------cut here---------------start------------->8--- [ 42.358349] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 0: buf=0xf1871bcb bufsz=0x52c870 mem=0x18eb8000 memsz=0x52d000 [ 42.374615] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 1: buf=0x012365f6 bufsz=0x5abf mem=0x193f6000 memsz=0x6000 root@target:~# sync && echo c > /proc/sysrq-trigger [ 62.206252] sysrq: Trigger a crash [ 62.209711] Kernel panic - not syncing: sysrq triggered crash [ 62.215548] CPU: 0 PID: 1236 Comm: bash Kdump: loaded Tainted: G W 5.7.0-rc6-00011-gad3fbe6a883e #174 [ 62.226225] Hardware name: BCM2711 [ 62.229676] Backtrace: [ 62.232178] [<c010bfa4>] (dump_backtrace) from [<c010c334>] (show_stack+0x20/0x24) [ 62.239863] r7:00000008 r6:c0b4a48d r5:00000000 r4:c0eb7b18 [ 62.245617] [<c010c314>] (show_stack) from [<c03e475c>] (dump_stack+0x20/0x28) [ 62.252954] [<c03e473c>] (dump_stack) from [<c011e368>] (panic+0xf4/0x320) [ 62.259941] [<c011e274>] (panic) from [<c044bb60>] (sysrq_handle_crash+0x1c/0x20) [ 62.267536] r3:c044bb44 r2:c57e1c21 r1:60000093 r0:c0b4a48d [ 62.273278] r7:00000008 [ 62.275853] [<c044bb44>] (sysrq_handle_crash) from [<c044c198>] (__handle_sysrq+0xa0/0x150) [ 62.284334] [<c044c0f8>] (__handle_sysrq) from [<c044c620>] (write_sysrq_trigger+0x68/0x78) [ 62.292814] r10:00000002 r9:e9123f50 r8:00000002 r7:012f2408 r6:e9112cc0 r5:c044c5b8 [ 62.300757] r4:00000002 [ 62.303335] [<c044c5b8>] (write_sysrq_trigger) from [<c02a7ad4>] (proc_reg_write+0x98/0xa8) [ 62.311808] r5:c044c5b8 r4:eb655700 [ 62.315443] [<c02a7a3c>] (proc_reg_write) from [<c023b080>] (__vfs_write+0x48/0xf4) [ 62.323216] r9:012f2408 r8:c02a7a3c r7:00000002 r6:e9112cc0 r5:e9123f50 r4:c0e04248 [ 62.331077] [<c023b038>] (__vfs_write) from [<c023c900>] (vfs_write+0xa8/0xcc) [ 62.338407] r8:e9123f50 r7:012f2408 r6:00000002 r5:00000000 r4:e9112cc0 [ 62.345211] [<c023c858>] (vfs_write) from [<c023cae0>] (ksys_write+0x78/0xc4) [ 62.352454] r9:012f2408 r8:e9123f5c r7:c0e04248 r6:e9123f50 r5:012f2408 r4:e9112cc0 [ 62.360316] [<c023ca68>] (ksys_write) from [<c023cb44>] (sys_write+0x18/0x1c) [ 62.367559] r10:00000004 r9:e9122000 r8:c0100264 r7:00000004 r6:b6edcd90 r5:012f2408 [ 62.375504] r4:00000002 [ 62.378080] [<c023cb2c>] (sys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54) [ 62.385759] Exception stack(0xe9123fa8 to 0xe9123ff0) [ 62.390889] 3fa0: 00000002 012f2408 00000001 012f2408 00000002 00000000 [ 62.399190] 3fc0: 00000002 012f2408 b6edcd90 00000004 012f2408 00000002 00000000 00118fd8 [ 62.407488] 3fe0: 0000006c be82b7e8 b6df7010 b6e546e4 [ 62.412647] Loading crashdump kernel... [ 62.416628] Bye! Uncompressing Linux... done, booting the kernel. r2:0x193F6000 r4:0x00008000 [ 0.000000] Booting Linux on physical CPU 0x0 [ 0.000000] Linux version 5.7.0-rc6-00011-gad3fbe6a883e (l.stelmach@AMDC1062) (gcc version 8.3.0 (Debian 8.3.0-2), GNU ld (GNU Binutils for Debian) 2.31.1) #174 Tue May 19 09:37:10 CEST 2020 [ 0.000000] CPU: ARMv7 Processor [410fd083] revision 3 (ARMv7), cr=10c5383d [ 0.000000] CPU: div instructions available: patching division code [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache [ 0.000000] OF: fdt: Machine model: Raspberry Pi 4 Model B [ 0.000000] earlycon: uart8250 at MMIO32 0xfe215040 (options '') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] Memory policy: Data cache writeback [ 0.000000] Reserved memory: created CMA memory pool at 0x04000000, size 64 MiB [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [ 0.000000] 8<--- cut here --- [ 0.000000] Unable to handle kernel paging request at virtual address d93f6000 [ 0.000000] pgd = (ptrval) [ 0.000000] [d93f6000] *pgd=00000000 [ 0.000000] Internal error: Oops: 5 [#1] ARM [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc6-00011-gad3fbe6a883e #174 [ 0.000000] Hardware name: BCM2711 [ 0.000000] PC is at fdt32_ld+0xc/0x18 [ 0.000000] LR is at fdt_check_header+0x14/0x15c [ 0.000000] pc : [<c03e4b10>] lr : [<c03e4c24>] psr: a00000d3 [ 0.000000] sp : c0e01ed8 ip : c0e01ee8 fp : c0e01ee4 [ 0.000000] r10: c3ffff40 r9 : c0e2e4f4 r8 : 00000000 [ 0.000000] r7 : c0f5b35c r6 : d93f6000 r5 : c0e085d0 r4 : c0d25510 [ 0.000000] r3 : d93f6000 r2 : c0f5b35c r1 : 00000000 r0 : d93f6000 [ 0.000000] Flags: NzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none [ 0.000000] Control: 10c5383d Table: 00004059 DAC: 00000051 [ 0.000000] Process swapper (pid: 0, stack limit = 0x(ptrval)) [ 0.000000] Stack: (0xc0e01ed8 to 0xc0e02000) [ 0.000000] 1ec0: c0e01f04 c0e01ee8 [ 0.000000] 1ee0: c03e4c24 c03e4b10 c0d25510 c0e085d0 d93f6000 c0f5b35c c0e01f2c c0e01f08 [ 0.000000] 1f00: c05f7d94 c03e4c1c c0d25510 c0e085d0 07ffffff 00000000 c0f4c580 c0e2e4f4 [ 0.000000] 1f20: c0e01f4c c0e01f30 c0d26a54 c05f7ccc 00000000 c0e01f40 c0123458 c0d2ff08 [ 0.000000] 1f40: c0e01fa4 c0e01f50 c0d03c18 c0d26a28 ffffffff 10c5383d c0d0d244 c0e04248 [ 0.000000] 1f60: c0b11587 c09000e7 c0e01f94 c0e01f78 c01580e4 00000000 c0e01f9c c0d00330 [ 0.000000] 1f80: c0e04248 c0e04240 ffffffff 193f6000 c0eb77fc 10c53c7d c0e01ff4 c0e01fa8 [ 0.000000] 1fa0: c0d00b5c c0d035a4 00000000 00000000 00000000 00000000 00000000 c0d31a30 [ 0.000000] 1fc0: 00000000 00000000 00000000 c0d00330 00000051 10c03c7d ffffffff 193f6000 [ 0.000000] 1fe0: 410fd083 10c53c7d 00000000 c0e01ff8 00000000 c0d00b08 00000000 00000000 [ 0.000000] Backtrace: [ 0.000000] [<c03e4b04>] (fdt32_ld) from [<c03e4c24>] (fdt_check_header+0x14/0x15c) [ 0.000000] [<c03e4c10>] (fdt_check_header) from [<c05f7d94>] (__unflatten_device_tree+0xd4/0x284) [ 0.000000] r7:c0f5b35c r6:d93f6000 r5:c0e085d0 r4:c0d25510 [ 0.000000] [<c05f7cc0>] (__unflatten_device_tree) from [<c0d26a54>] (unflatten_device_tree+0x38/0x54) [ 0.000000] r9:c0e2e4f4 r8:c0f4c580 r7:00000000 r6:07ffffff r5:c0e085d0 r4:c0d25510 [ 0.000000] [<c0d26a1c>] (unflatten_device_tree) from [<c0d03c18>] (setup_arch+0x680/0xabc) [ 0.000000] r4:c0d2ff08 [ 0.000000] [<c0d03598>] (setup_arch) from [<c0d00b5c>] (start_kernel+0x60/0x500) [ 0.000000] r10:10c53c7d r9:c0eb77fc r8:193f6000 r7:ffffffff r6:c0e04240 r5:c0e04248 [ 0.000000] r4:c0d00330 [ 0.000000] [<c0d00afc>] (start_kernel) from [<00000000>] (0x0) [ 0.000000] r10:10c53c7d r9:410fd083 r8:193f6000 r7:ffffffff r6:10c03c7d r5:00000051 [ 0.000000] r4:c0d00330 [ 0.000000] Code: c03e49d0 e1a0c00d e92dd800 e24cb004 (e5900000) [ 0.000000] random: get_random_bytes called from init_oops_id+0x30/0x4c with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- --8<---------------cut here---------------end--------------->8---
On Tue, May 19, 2020 at 10:53:52AM +0200, Lukasz Stelmach wrote: > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > Currently, the start address of physical memory is obtained by masking > > the program counter with a fixed mask of 0xf8000000. This mask value > > was chosen as a balance between the requirements of different platforms. > > However, this does require that the start address of physical memory is > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > requirement is not fulfilled. > > > > Fix this limitation by obtaining the start address from the DTB instead, > > if available (either explicitly passed, or appended to the kernel). > > Fall back to the traditional method when needed. > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > i.e. not at a multiple of 128 MiB. > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > --- > > [...] > > Apparently reading physical memory layout from DTB breaks crashdump > kernels. A crashdump kernel is loaded into a region of memory, that is > reserved in the original (i.e. to be crashed) kernel. The reserved > region is large enough for the crashdump kernel to run completely inside > it and don't modify anything outside it, just read and dump the remains > of the crashed kernel. Using the information from DTB makes the > decompressor place the kernel outside of the dedicated region. > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > it is decompressed to 0x00008000 (see r4 reported before jumping from > within __enter_kernel). Right, and it's important that the kernel decompresses to 0x18008000 so it doesn't overwrite memory that was being used by the crashing kernel, and thus can create a true coredump image of the failed kernel. Meanwhile, the DTB still needs to describe the full memory layout so that we know where memory is located in order to coredump it properly. So, this is a flaw with this approach, and will need the commit to be dropped yet again - this patch is fundamentally incompatible with the way kexec's crashdump works. Looking back at the history, we've been trying this approach since February with four patches submitted to the patch system, and problems eventually found with each of them. I think this shows that the way the decompressor works out where to decompress the kernel to today is relied upon all over the place, and changing it is always going to cause problems. So, I don't think we /can/ change it without causing a regression for someone.
Hi Łukasz Thanks for your report! On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > Currently, the start address of physical memory is obtained by masking > > the program counter with a fixed mask of 0xf8000000. This mask value > > was chosen as a balance between the requirements of different platforms. > > However, this does require that the start address of physical memory is > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > requirement is not fulfilled. > > > > Fix this limitation by obtaining the start address from the DTB instead, > > if available (either explicitly passed, or appended to the kernel). > > Fall back to the traditional method when needed. > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > i.e. not at a multiple of 128 MiB. > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > --- > > [...] > > Apparently reading physical memory layout from DTB breaks crashdump > kernels. A crashdump kernel is loaded into a region of memory, that is > reserved in the original (i.e. to be crashed) kernel. The reserved > region is large enough for the crashdump kernel to run completely inside > it and don't modify anything outside it, just read and dump the remains > of the crashed kernel. Using the information from DTB makes the > decompressor place the kernel outside of the dedicated region. > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > it is decompressed to 0x00008000 (see r4 reported before jumping from > within __enter_kernel). If I were to suggest something, there need to be > one more bit of information passed in the DTB telling the decompressor > to use the old masking technique to determain kernel address. It would > be set in the DTB loaded along with the crashdump kernel. Shouldn't the DTB passed to the crashkernel describe which region of memory is to be used instead? Describing "to use the old masking technique" sounds a bit hackish to me. I guess it cannot just restrict the /memory node to the reserved region, as the crashkernel needs to be able to dump the remains of the crashed kernel, which lie outside this region. However, something under /chosen should work. > Despite the fact the kernel is able to start and get quite far it simply > panics (for a reason unknown to me at the moment). > > Kind regards, > ŁS > > --8<---------------cut here---------------start------------->8--- > [ 42.358349] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 0: buf=0xf1871bcb bufsz=0x52c870 mem=0x18eb8000 memsz=0x52d000 > [ 42.374615] kexec_file:__do_sys_kexec_file_load:435: kexec_file: Loading segment 1: buf=0x012365f6 bufsz=0x5abf mem=0x193f6000 memsz=0x6000 > root@target:~# sync && echo c > /proc/sysrq-trigger > [ 62.206252] sysrq: Trigger a crash > [ 62.209711] Kernel panic - not syncing: sysrq triggered crash > [ 62.215548] CPU: 0 PID: 1236 Comm: bash Kdump: loaded Tainted: G W 5.7.0-rc6-00011-gad3fbe6a883e #174 > [ 62.226225] Hardware name: BCM2711 > [ 62.229676] Backtrace: > [ 62.232178] [<c010bfa4>] (dump_backtrace) from [<c010c334>] (show_stack+0x20/0x24) > [ 62.239863] r7:00000008 r6:c0b4a48d r5:00000000 r4:c0eb7b18 > [ 62.245617] [<c010c314>] (show_stack) from [<c03e475c>] (dump_stack+0x20/0x28) > [ 62.252954] [<c03e473c>] (dump_stack) from [<c011e368>] (panic+0xf4/0x320) > [ 62.259941] [<c011e274>] (panic) from [<c044bb60>] (sysrq_handle_crash+0x1c/0x20) > [ 62.267536] r3:c044bb44 r2:c57e1c21 r1:60000093 r0:c0b4a48d > [ 62.273278] r7:00000008 > [ 62.275853] [<c044bb44>] (sysrq_handle_crash) from [<c044c198>] (__handle_sysrq+0xa0/0x150) > [ 62.284334] [<c044c0f8>] (__handle_sysrq) from [<c044c620>] (write_sysrq_trigger+0x68/0x78) > [ 62.292814] r10:00000002 r9:e9123f50 r8:00000002 r7:012f2408 r6:e9112cc0 r5:c044c5b8 > [ 62.300757] r4:00000002 > [ 62.303335] [<c044c5b8>] (write_sysrq_trigger) from [<c02a7ad4>] (proc_reg_write+0x98/0xa8) > [ 62.311808] r5:c044c5b8 r4:eb655700 > [ 62.315443] [<c02a7a3c>] (proc_reg_write) from [<c023b080>] (__vfs_write+0x48/0xf4) > [ 62.323216] r9:012f2408 r8:c02a7a3c r7:00000002 r6:e9112cc0 r5:e9123f50 r4:c0e04248 > [ 62.331077] [<c023b038>] (__vfs_write) from [<c023c900>] (vfs_write+0xa8/0xcc) > [ 62.338407] r8:e9123f50 r7:012f2408 r6:00000002 r5:00000000 r4:e9112cc0 > [ 62.345211] [<c023c858>] (vfs_write) from [<c023cae0>] (ksys_write+0x78/0xc4) > [ 62.352454] r9:012f2408 r8:e9123f5c r7:c0e04248 r6:e9123f50 r5:012f2408 r4:e9112cc0 > [ 62.360316] [<c023ca68>] (ksys_write) from [<c023cb44>] (sys_write+0x18/0x1c) > [ 62.367559] r10:00000004 r9:e9122000 r8:c0100264 r7:00000004 r6:b6edcd90 r5:012f2408 > [ 62.375504] r4:00000002 > [ 62.378080] [<c023cb2c>] (sys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54) > [ 62.385759] Exception stack(0xe9123fa8 to 0xe9123ff0) > [ 62.390889] 3fa0: 00000002 012f2408 00000001 012f2408 00000002 00000000 > [ 62.399190] 3fc0: 00000002 012f2408 b6edcd90 00000004 012f2408 00000002 00000000 00118fd8 > [ 62.407488] 3fe0: 0000006c be82b7e8 b6df7010 b6e546e4 > [ 62.412647] Loading crashdump kernel... > [ 62.416628] Bye! > Uncompressing Linux... done, booting the kernel. > r2:0x193F6000 > r4:0x00008000 > [ 0.000000] Booting Linux on physical CPU 0x0 > [ 0.000000] Linux version 5.7.0-rc6-00011-gad3fbe6a883e (l.stelmach@AMDC1062) (gcc version 8.3.0 (Debian 8.3.0-2), GNU ld (GNU Binutils for Debian) 2.31.1) #174 Tue May 19 > 09:37:10 CEST 2020 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > Hi Łukasz > > Thanks for your report! > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > Currently, the start address of physical memory is obtained by masking > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > was chosen as a balance between the requirements of different platforms. > > > However, this does require that the start address of physical memory is > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > requirement is not fulfilled. > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > if available (either explicitly passed, or appended to the kernel). > > > Fall back to the traditional method when needed. > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > i.e. not at a multiple of 128 MiB. > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > --- > > > > [...] > > > > Apparently reading physical memory layout from DTB breaks crashdump > > kernels. A crashdump kernel is loaded into a region of memory, that is > > reserved in the original (i.e. to be crashed) kernel. The reserved > > region is large enough for the crashdump kernel to run completely inside > > it and don't modify anything outside it, just read and dump the remains > > of the crashed kernel. Using the information from DTB makes the > > decompressor place the kernel outside of the dedicated region. > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > within __enter_kernel). If I were to suggest something, there need to be > > one more bit of information passed in the DTB telling the decompressor > > to use the old masking technique to determain kernel address. It would > > be set in the DTB loaded along with the crashdump kernel. > > Shouldn't the DTB passed to the crashkernel describe which region of > memory is to be used instead? Definitely not. The crashkernel needs to know where the RAM in the machine is, so that it can create a coredump of the crashed kernel. > Describing "to use the old masking technique" sounds a bit hackish to me. > I guess it cannot just restrict the /memory node to the reserved region, > as the crashkernel needs to be able to dump the remains of the crashed > kernel, which lie outside this region. Correct. > However, something under /chosen should work. Yet another sticky plaster...
Hi Russell, CC devicetree On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > > Currently, the start address of physical memory is obtained by masking > > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > > was chosen as a balance between the requirements of different platforms. > > > > However, this does require that the start address of physical memory is > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > > requirement is not fulfilled. > > > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > > if available (either explicitly passed, or appended to the kernel). > > > > Fall back to the traditional method when needed. > > > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > > i.e. not at a multiple of 128 MiB. > > > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > > --- > > > > > > [...] > > > > > > Apparently reading physical memory layout from DTB breaks crashdump > > > kernels. A crashdump kernel is loaded into a region of memory, that is > > > reserved in the original (i.e. to be crashed) kernel. The reserved > > > region is large enough for the crashdump kernel to run completely inside > > > it and don't modify anything outside it, just read and dump the remains > > > of the crashed kernel. Using the information from DTB makes the > > > decompressor place the kernel outside of the dedicated region. > > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > > within __enter_kernel). If I were to suggest something, there need to be > > > one more bit of information passed in the DTB telling the decompressor > > > to use the old masking technique to determain kernel address. It would > > > be set in the DTB loaded along with the crashdump kernel. > > > > Shouldn't the DTB passed to the crashkernel describe which region of > > memory is to be used instead? > > Definitely not. The crashkernel needs to know where the RAM in the > machine is, so that it can create a coredump of the crashed kernel. So the DTB should describe both ;-) > > Describing "to use the old masking technique" sounds a bit hackish to me. > > I guess it cannot just restrict the /memory node to the reserved region, > > as the crashkernel needs to be able to dump the remains of the crashed > > kernel, which lie outside this region. > > Correct. > > > However, something under /chosen should work. > > Yet another sticky plaster... IMHO the old masking technique is the hacky solution covered by plasters. DT describes the hardware. In general, where to put the kernel is a software policy, and thus doesn't belong in DT, except perhaps under /chosen. But that would open another can of worms, as people usually have no business in specifying where the kernel should be located. In the crashkernel case, there is a clear separation between memory to be used by the crashkernel, and memory to be solely inspected by the crashkernel. Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says: "The client program may access memory not covered by any memory reservations (see section 5.3)" (Section 5.3 "Memory Reservation Block" only talks about structures in the FDT, not about DTS) Hence according to the above, the crashkernel is rightfully allowed to do whatever it wants with all memory under the /memory node. However, there is also Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. This suggests the crashkernel should be passed a DTB that contains a /reserved-memory node, describing which memory cannot be used freely. Then the decompressor needs to take this into account when deciding where the put the kernel. Yes, the above requires changing code. But at least it provides a path forward, getting rid of the fragile old masking technique. Thanks for your comments! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Tue, May 19, 2020 at 1:21 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > > However, something under /chosen should work. > > > > Yet another sticky plaster... > > IMHO the old masking technique is the hacky solution covered by > plasters. > > DT describes the hardware. In general, where to put the kernel is a > software policy, and thus doesn't belong in DT, except perhaps under > /chosen. But that would open another can of worms, as people usually > have no business in specifying where the kernel should be located. > In the crashkernel case, there is a clear separation between memory to > be used by the crashkernel, and memory to be solely inspected by the > crashkernel. > > Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says: > > "The client program may access memory not covered by any memory > reservations (see section 5.3)" > > (Section 5.3 "Memory Reservation Block" only talks about structures in > the FDT, not about DTS) > > Hence according to the above, the crashkernel is rightfully allowed to > do whatever it wants with all memory under the /memory node. > However, there is also > Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. > This suggests the crashkernel should be passed a DTB that contains a > /reserved-memory node, describing which memory cannot be used freely. > Then the decompressor needs to take this into account when deciding > where the put the kernel. > > Yes, the above requires changing code. But at least it provides a > path forward, getting rid of the fragile old masking technique. There is an existing "linux,usable-memory-range" property documented in Documentation/devicetree/bindings/chosen.txt, which as I understand is exactly what you are looking for, except that it is currently only documented for arm64. Would extending this to arm work? Arnd
On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: > Hi Russell, > > CC devicetree > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > > > Currently, the start address of physical memory is obtained by masking > > > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > > > was chosen as a balance between the requirements of different platforms. > > > > > However, this does require that the start address of physical memory is > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > > > requirement is not fulfilled. > > > > > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > > > if available (either explicitly passed, or appended to the kernel). > > > > > Fall back to the traditional method when needed. > > > > > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > > > i.e. not at a multiple of 128 MiB. > > > > > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > > > --- > > > > > > > > [...] > > > > > > > > Apparently reading physical memory layout from DTB breaks crashdump > > > > kernels. A crashdump kernel is loaded into a region of memory, that is > > > > reserved in the original (i.e. to be crashed) kernel. The reserved > > > > region is large enough for the crashdump kernel to run completely inside > > > > it and don't modify anything outside it, just read and dump the remains > > > > of the crashed kernel. Using the information from DTB makes the > > > > decompressor place the kernel outside of the dedicated region. > > > > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > > > within __enter_kernel). If I were to suggest something, there need to be > > > > one more bit of information passed in the DTB telling the decompressor > > > > to use the old masking technique to determain kernel address. It would > > > > be set in the DTB loaded along with the crashdump kernel. > > > > > > Shouldn't the DTB passed to the crashkernel describe which region of > > > memory is to be used instead? > > > > Definitely not. The crashkernel needs to know where the RAM in the > > machine is, so that it can create a coredump of the crashed kernel. > > So the DTB should describe both ;-) > > > > Describing "to use the old masking technique" sounds a bit hackish to me. > > > I guess it cannot just restrict the /memory node to the reserved region, > > > as the crashkernel needs to be able to dump the remains of the crashed > > > kernel, which lie outside this region. > > > > Correct. > > > > > However, something under /chosen should work. > > > > Yet another sticky plaster... > > IMHO the old masking technique is the hacky solution covered by > plasters. One line of code is not "covered by plasters". There are no plasters. It's a solution that works for 99.99% of people, unlike your approach that has had a stream of issues over the last four months, and has required many reworks of the code to fix each one. That in itself speaks volumes about the suitability of the approach. > DT describes the hardware. Right, so DT is correct. > In general, where to put the kernel is a > software policy, and thus doesn't belong in DT, except perhaps under > /chosen. But that would open another can of worms, as people usually > have no business in specifying where the kernel should be located. > In the crashkernel case, there is a clear separation between memory to > be used by the crashkernel, and memory to be solely inspected by the > crashkernel. > > Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says: > > "The client program may access memory not covered by any memory > reservations (see section 5.3)" > > (Section 5.3 "Memory Reservation Block" only talks about structures in > the FDT, not about DTS) > > Hence according to the above, the crashkernel is rightfully allowed to > do whatever it wants with all memory under the /memory node. > However, there is also > Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. > This suggests the crashkernel should be passed a DTB that contains a > /reserved-memory node, describing which memory cannot be used freely. > Then the decompressor needs to take this into account when deciding > where the put the kernel. So you want to throw yet more complexity at this solution to try and make it work... > Yes, the above requires changing code. But at least it provides a > path forward, getting rid of the fragile old masking technique. It's hardly fragile when it's worked fine for the last 20+ years, whereas your solution can't work without some regression being reported within a couple of weeks of it being applied. Again, that speaks volumes about which one is better than the other. Continually patching this approach to workaround one issue after another shows that it is _this_ solution that is the fragile implementation. A fragile implementation is by definition one that keeps breaking. That's yours. The masking approach hasn't "broken" for anyone, and hasn't been the cause of one single regression anywhere. Yes, there are some platforms that it doesn't work for (because they choose to reserve the first chunk of RAM for something) but that is not a regression. So, I'm not going to apply the next revision of this patch for at least one whole kernel cycle - that means scheduling it for 5.10-rc at the earliest.
It was <2020-05-19 wto 13:21>, when Geert Uytterhoeven wrote: > Hi Russell, > > CC devicetree > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: >> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: >>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: >>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: >>>>> Currently, the start address of physical memory is obtained by masking >>>>> the program counter with a fixed mask of 0xf8000000. This mask value >>>>> was chosen as a balance between the requirements of different platforms. >>>>> However, this does require that the start address of physical memory is >>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this >>>>> requirement is not fulfilled. >>>>> >>>>> Fix this limitation by obtaining the start address from the DTB instead, >>>>> if available (either explicitly passed, or appended to the kernel). >>>>> Fall back to the traditional method when needed. >>>>> >>>>> This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM >>>>> on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), >>>>> i.e. not at a multiple of 128 MiB. >>>>> >>>>> Suggested-by: Nicolas Pitre <nico@fluxnic.net> >>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> >>>>> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> >>>>> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> >>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> >>>>> Tested-by: Dmitry Osipenko <digetx@gmail.com> >>>>> --- >>>> >>>> [...] >>>> >>>> Apparently reading physical memory layout from DTB breaks crashdump >>>> kernels. A crashdump kernel is loaded into a region of memory, that >>>> is reserved in the original (i.e. to be crashed) kernel. The >>>> reserved region is large enough for the crashdump kernel to run >>>> completely inside it and don't modify anything outside it, just >>>> read and dump the remains of the crashed kernel. Using the >>>> information from DTB makes the decompressor place the kernel >>>> outside of the dedicated region. >>>> >>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 >>>> and 0x193f6000 (physical). The kernel is expected to run at >>>> 0x18008000, but it is decompressed to 0x00008000 (see r4 reported >>>> before jumping from within __enter_kernel). If I were to suggest >>>> something, there need to be one more bit of information passed in >>>> the DTB telling the decompressor to use the old masking technique >>>> to determain kernel address. It would be set in the DTB loaded >>>> along with the crashdump kernel. >>> >>> Shouldn't the DTB passed to the crashkernel describe which region of >>> memory is to be used instead? >> >> Definitely not. The crashkernel needs to know where the RAM in the >> machine is, so that it can create a coredump of the crashed kernel. > > So the DTB should describe both ;-) So we can do without the mem= cmdline option which is required now. Sounds reasonable to me.
Hi Arnd, On Tue, May 19, 2020 at 1:28 PM Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, May 19, 2020 at 1:21 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > > > However, something under /chosen should work. > > > > > > Yet another sticky plaster... > > > > IMHO the old masking technique is the hacky solution covered by > > plasters. > > > > DT describes the hardware. In general, where to put the kernel is a > > software policy, and thus doesn't belong in DT, except perhaps under > > /chosen. But that would open another can of worms, as people usually > > have no business in specifying where the kernel should be located. > > In the crashkernel case, there is a clear separation between memory to > > be used by the crashkernel, and memory to be solely inspected by the > > crashkernel. > > > > Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says: > > > > "The client program may access memory not covered by any memory > > reservations (see section 5.3)" > > > > (Section 5.3 "Memory Reservation Block" only talks about structures in > > the FDT, not about DTS) > > > > Hence according to the above, the crashkernel is rightfully allowed to > > do whatever it wants with all memory under the /memory node. > > However, there is also > > Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. > > This suggests the crashkernel should be passed a DTB that contains a > > /reserved-memory node, describing which memory cannot be used freely. > > Then the decompressor needs to take this into account when deciding > > where the put the kernel. > > > > Yes, the above requires changing code. But at least it provides a > > path forward, getting rid of the fragile old masking technique. > > There is an existing "linux,usable-memory-range" property documented > in Documentation/devicetree/bindings/chosen.txt, which as I understand > is exactly what you are looking for, except that it is currently only > documented for arm64. Thank you, that looks appropriate! It seems this is not really used by the early startup code. Is that because the early startup code always runs in-place, and the kernel image is not even copied? > Would extending this to arm work? Let's see.... Th arm early boot code seems to be more complex than the arm64 code ;-) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote: > On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: >> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin >> <linux@armlinux.org.uk> wrote: >> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: >> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: >> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: >> > > > > Currently, the start address of physical memory is obtained by masking >> > > > > the program counter with a fixed mask of 0xf8000000. This mask value >> > > > > was chosen as a balance between the requirements of different platforms. >> > > > > However, this does require that the start address of physical memory is >> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this >> > > > > requirement is not fulfilled. >> > > > > >> > > > > Fix this limitation by obtaining the start address from the DTB instead, >> > > > > if available (either explicitly passed, or appended to the kernel). >> > > > > Fall back to the traditional method when needed. >> > > > > >> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM >> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), >> > > > > i.e. not at a multiple of 128 MiB. >> > > > > >> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> >> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> >> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> >> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> >> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> >> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> >> > > > > --- >> > > > >> > > > [...] >> > > > >> > > > Apparently reading physical memory layout from DTB breaks crashdump >> > > > kernels. A crashdump kernel is loaded into a region of memory, that is >> > > > reserved in the original (i.e. to be crashed) kernel. The reserved >> > > > region is large enough for the crashdump kernel to run completely inside >> > > > it and don't modify anything outside it, just read and dump the remains >> > > > of the crashed kernel. Using the information from DTB makes the >> > > > decompressor place the kernel outside of the dedicated region. >> > > > >> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and >> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but >> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from >> > > > within __enter_kernel). If I were to suggest something, there need to be >> > > > one more bit of information passed in the DTB telling the decompressor >> > > > to use the old masking technique to determain kernel address. It would >> > > > be set in the DTB loaded along with the crashdump kernel. >> > > >> > > Shouldn't the DTB passed to the crashkernel describe which region of >> > > memory is to be used instead? >> > >> > Definitely not. The crashkernel needs to know where the RAM in the >> > machine is, so that it can create a coredump of the crashed kernel. >> >> So the DTB should describe both ;-) >> >> > > Describing "to use the old masking technique" sounds a bit hackish to me. >> > > I guess it cannot just restrict the /memory node to the reserved region, >> > > as the crashkernel needs to be able to dump the remains of the crashed >> > > kernel, which lie outside this region. >> > >> > Correct. >> > >> > > However, something under /chosen should work. >> > >> > Yet another sticky plaster... >> >> IMHO the old masking technique is the hacky solution covered by >> plasters. > > One line of code is not "covered by plasters". There are no plasters. > It's a solution that works for 99.99% of people, unlike your approach > that has had a stream of issues over the last four months, and has > required many reworks of the code to fix each one. That in itself > speaks volumes about the suitability of the approach. As I have been working with kexec code (patches soon) I would like to defend the DT approach a bit. It allows to avoid zImage relocation when a decompressed kernel is larger than ~128MiB. In such case zImage isn't small either and moving it around takes some time.
On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote: > It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote: > > On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: > >> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > >> <linux@armlinux.org.uk> wrote: > >> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > >> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > >> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > >> > > > > Currently, the start address of physical memory is obtained by masking > >> > > > > the program counter with a fixed mask of 0xf8000000. This mask value > >> > > > > was chosen as a balance between the requirements of different platforms. > >> > > > > However, this does require that the start address of physical memory is > >> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > >> > > > > requirement is not fulfilled. > >> > > > > > >> > > > > Fix this limitation by obtaining the start address from the DTB instead, > >> > > > > if available (either explicitly passed, or appended to the kernel). > >> > > > > Fall back to the traditional method when needed. > >> > > > > > >> > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > >> > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > >> > > > > i.e. not at a multiple of 128 MiB. > >> > > > > > >> > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > >> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > >> > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > >> > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > >> > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > >> > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > >> > > > > --- > >> > > > > >> > > > [...] > >> > > > > >> > > > Apparently reading physical memory layout from DTB breaks crashdump > >> > > > kernels. A crashdump kernel is loaded into a region of memory, that is > >> > > > reserved in the original (i.e. to be crashed) kernel. The reserved > >> > > > region is large enough for the crashdump kernel to run completely inside > >> > > > it and don't modify anything outside it, just read and dump the remains > >> > > > of the crashed kernel. Using the information from DTB makes the > >> > > > decompressor place the kernel outside of the dedicated region. > >> > > > > >> > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > >> > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > >> > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > >> > > > within __enter_kernel). If I were to suggest something, there need to be > >> > > > one more bit of information passed in the DTB telling the decompressor > >> > > > to use the old masking technique to determain kernel address. It would > >> > > > be set in the DTB loaded along with the crashdump kernel. > >> > > > >> > > Shouldn't the DTB passed to the crashkernel describe which region of > >> > > memory is to be used instead? > >> > > >> > Definitely not. The crashkernel needs to know where the RAM in the > >> > machine is, so that it can create a coredump of the crashed kernel. > >> > >> So the DTB should describe both ;-) > >> > >> > > Describing "to use the old masking technique" sounds a bit hackish to me. > >> > > I guess it cannot just restrict the /memory node to the reserved region, > >> > > as the crashkernel needs to be able to dump the remains of the crashed > >> > > kernel, which lie outside this region. > >> > > >> > Correct. > >> > > >> > > However, something under /chosen should work. > >> > > >> > Yet another sticky plaster... > >> > >> IMHO the old masking technique is the hacky solution covered by > >> plasters. > > > > One line of code is not "covered by plasters". There are no plasters. > > It's a solution that works for 99.99% of people, unlike your approach > > that has had a stream of issues over the last four months, and has > > required many reworks of the code to fix each one. That in itself > > speaks volumes about the suitability of the approach. > > As I have been working with kexec code (patches soon) I would like to > defend the DT approach a bit. It allows to avoid zImage relocation when > a decompressed kernel is larger than ~128MiB. In such case zImage isn't > small either and moving it around takes some time. ... which is something that has been supported for a very long time, before the days of DT.
It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote: > On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote: >> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote: >>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: >>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin >>>> <linux@armlinux.org.uk> wrote: >>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: >>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: >>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: >>>>>>>> Currently, the start address of physical memory is obtained by masking >>>>>>>> the program counter with a fixed mask of 0xf8000000. This mask value >>>>>>>> was chosen as a balance between the requirements of different platforms. >>>>>>>> However, this does require that the start address of physical memory is >>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this >>>>>>>> requirement is not fulfilled. >>>>>>>> >>>>>>>> Fix this limitation by obtaining the start address from the DTB instead, >>>>>>>> if available (either explicitly passed, or appended to the kernel). >>>>>>>> Fall back to the traditional method when needed. [...] >>>>>>> Apparently reading physical memory layout from DTB breaks crashdump >>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is >>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved >>>>>>> region is large enough for the crashdump kernel to run completely inside >>>>>>> it and don't modify anything outside it, just read and dump the remains >>>>>>> of the crashed kernel. Using the information from DTB makes the >>>>>>> decompressor place the kernel outside of the dedicated region. >>>>>>> >>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and >>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but >>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from >>>>>>> within __enter_kernel). If I were to suggest something, there need to be >>>>>>> one more bit of information passed in the DTB telling the decompressor >>>>>>> to use the old masking technique to determain kernel address. It would >>>>>>> be set in the DTB loaded along with the crashdump kernel. [...] >>>>>> Describing "to use the old masking technique" sounds a bit hackish to me. >>>>>> I guess it cannot just restrict the /memory node to the reserved region, >>>>>> as the crashkernel needs to be able to dump the remains of the crashed >>>>>> kernel, which lie outside this region. >>>>> >>>>> Correct. >>>>> >>>>>> However, something under /chosen should work. >>>>> >>>>> Yet another sticky plaster... >>>> >>>> IMHO the old masking technique is the hacky solution covered by >>>> plasters. >>> >>> One line of code is not "covered by plasters". There are no plasters. >>> It's a solution that works for 99.99% of people, unlike your approach >>> that has had a stream of issues over the last four months, and has >>> required many reworks of the code to fix each one. That in itself >>> speaks volumes about the suitability of the approach. >> >> As I have been working with kexec code (patches soon) I would like to >> defend the DT approach a bit. It allows to avoid zImage relocation when >> a decompressed kernel is larger than ~128MiB. In such case zImage isn't >> small either and moving it around takes some time. > > ... which is something that has been supported for a very long time, > before the days of DT. How? If a decompressed kernel requires >128M and a bootloader would like to put a zImage high enough to *avoid* copying it once again, then the decompressor can't see any memory below the 128M window it starts in and can't decompress the kernel there. If we do not care about copying zImage, then, indeed, everything works fine as it is today. You are most probably right 99% doesn't require 128M kernel, but the case is IMHO obvious enough, that it should be adressed somehow. Kind regards,
On Tue, May 19, 2020 at 02:49:57PM +0200, Lukasz Stelmach wrote: > It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote: > > On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote: > >> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote: > >>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: > >>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > >>>> <linux@armlinux.org.uk> wrote: > >>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > >>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > >>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > >>>>>>>> Currently, the start address of physical memory is obtained by masking > >>>>>>>> the program counter with a fixed mask of 0xf8000000. This mask value > >>>>>>>> was chosen as a balance between the requirements of different platforms. > >>>>>>>> However, this does require that the start address of physical memory is > >>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this > >>>>>>>> requirement is not fulfilled. > >>>>>>>> > >>>>>>>> Fix this limitation by obtaining the start address from the DTB instead, > >>>>>>>> if available (either explicitly passed, or appended to the kernel). > >>>>>>>> Fall back to the traditional method when needed. > [...] > >>>>>>> Apparently reading physical memory layout from DTB breaks crashdump > >>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is > >>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved > >>>>>>> region is large enough for the crashdump kernel to run completely inside > >>>>>>> it and don't modify anything outside it, just read and dump the remains > >>>>>>> of the crashed kernel. Using the information from DTB makes the > >>>>>>> decompressor place the kernel outside of the dedicated region. > >>>>>>> > >>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > >>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > >>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from > >>>>>>> within __enter_kernel). If I were to suggest something, there need to be > >>>>>>> one more bit of information passed in the DTB telling the decompressor > >>>>>>> to use the old masking technique to determain kernel address. It would > >>>>>>> be set in the DTB loaded along with the crashdump kernel. > [...] > >>>>>> Describing "to use the old masking technique" sounds a bit hackish to me. > >>>>>> I guess it cannot just restrict the /memory node to the reserved region, > >>>>>> as the crashkernel needs to be able to dump the remains of the crashed > >>>>>> kernel, which lie outside this region. > >>>>> > >>>>> Correct. > >>>>> > >>>>>> However, something under /chosen should work. > >>>>> > >>>>> Yet another sticky plaster... > >>>> > >>>> IMHO the old masking technique is the hacky solution covered by > >>>> plasters. > >>> > >>> One line of code is not "covered by plasters". There are no plasters. > >>> It's a solution that works for 99.99% of people, unlike your approach > >>> that has had a stream of issues over the last four months, and has > >>> required many reworks of the code to fix each one. That in itself > >>> speaks volumes about the suitability of the approach. > >> > >> As I have been working with kexec code (patches soon) I would like to > >> defend the DT approach a bit. It allows to avoid zImage relocation when > >> a decompressed kernel is larger than ~128MiB. In such case zImage isn't > >> small either and moving it around takes some time. > > > > ... which is something that has been supported for a very long time, > > before the days of DT. > > How? If a decompressed kernel requires >128M and a bootloader would like > to put a zImage high enough to *avoid* copying it once again, then the > decompressor can't see any memory below the 128M window it starts in and > can't decompress the kernel there. Do you have such a large kernel? It would be rather inefficient as branch instructions could not be used; every function call would have to be indirect. The maximum is +/- 32MB for a branch. > If we do not care about copying > zImage, then, indeed, everything works fine as it is today. You are > most probably right 99% doesn't require 128M kernel, but the case is > IMHO obvious enough, that it should be adressed somehow. If I have a kernel in excess of 4GB... "it should be addressed somehow"!
On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Russell, > > CC devicetree > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > > > Currently, the start address of physical memory is obtained by masking > > > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > > > was chosen as a balance between the requirements of different platforms. > > > > > However, this does require that the start address of physical memory is > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > > > requirement is not fulfilled. > > > > > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > > > if available (either explicitly passed, or appended to the kernel). > > > > > Fall back to the traditional method when needed. > > > > > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > > > i.e. not at a multiple of 128 MiB. > > > > > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > > > --- > > > > > > > > [...] > > > > > > > > Apparently reading physical memory layout from DTB breaks crashdump > > > > kernels. A crashdump kernel is loaded into a region of memory, that is > > > > reserved in the original (i.e. to be crashed) kernel. The reserved > > > > region is large enough for the crashdump kernel to run completely inside > > > > it and don't modify anything outside it, just read and dump the remains > > > > of the crashed kernel. Using the information from DTB makes the > > > > decompressor place the kernel outside of the dedicated region. > > > > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > > > within __enter_kernel). If I were to suggest something, there need to be > > > > one more bit of information passed in the DTB telling the decompressor > > > > to use the old masking technique to determain kernel address. It would > > > > be set in the DTB loaded along with the crashdump kernel. > > > > > > Shouldn't the DTB passed to the crashkernel describe which region of > > > memory is to be used instead? > > > > Definitely not. The crashkernel needs to know where the RAM in the > > machine is, so that it can create a coredump of the crashed kernel. > > So the DTB should describe both ;-) > > > > Describing "to use the old masking technique" sounds a bit hackish to me. > > > I guess it cannot just restrict the /memory node to the reserved region, > > > as the crashkernel needs to be able to dump the remains of the crashed > > > kernel, which lie outside this region. > > > > Correct. > > > > > However, something under /chosen should work. > > > > Yet another sticky plaster... > > IMHO the old masking technique is the hacky solution covered by > plasters. > I think debating which solution is the hacky one will not get us anywhere. The simple reality is that the existing solution works fine for existing platforms, and so any changes in the logic will have to be opt-in in one way or the other. Since U-boot supports EFI boot these days, one potential option is to rely on that. I have some changes implementing this that go on top of this patch, but they don't actually rely on it - it was just to prevent lexical conflicts. The only remaining options imo are a kernel command line option, or a DT property that tells the decompressor to look at the memory nodes. But using the DT memory nodes on all platforms like this patch does is obviously just too risky. On another note, I do think the usable-memory-region property should be implemented for ARM as well - relying on this rounding to ensure that the decompressor does the right thing is too fragile.
It was <2020-05-19 wto 14:12>, when Russell King - ARM Linux admin wrote: > On Tue, May 19, 2020 at 02:49:57PM +0200, Lukasz Stelmach wrote: >> It was <2020-05-19 wto 13:27>, when Russell King - ARM Linux admin wrote: >>> On Tue, May 19, 2020 at 02:20:25PM +0200, Lukasz Stelmach wrote: >>>> It was <2020-05-19 wto 12:43>, when Russell King - ARM Linux admin wrote: >>>>> On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote: >>>>>> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin >>>>>> <linux@armlinux.org.uk> wrote: >>>>>>> On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: >>>>>>>> On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: >>>>>>>>> It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: >>>>>>>>>> Currently, the start address of physical memory is obtained by masking >>>>>>>>>> the program counter with a fixed mask of 0xf8000000. This mask value >>>>>>>>>> was chosen as a balance between the requirements of different platforms. >>>>>>>>>> However, this does require that the start address of physical memory is >>>>>>>>>> a multiple of 128 MiB, precluding booting Linux on platforms where this >>>>>>>>>> requirement is not fulfilled. >>>>>>>>>> >>>>>>>>>> Fix this limitation by obtaining the start address from the DTB instead, >>>>>>>>>> if available (either explicitly passed, or appended to the kernel). >>>>>>>>>> Fall back to the traditional method when needed. >> [...] >>>>>>>>> Apparently reading physical memory layout from DTB breaks crashdump >>>>>>>>> kernels. A crashdump kernel is loaded into a region of memory, that is >>>>>>>>> reserved in the original (i.e. to be crashed) kernel. The reserved >>>>>>>>> region is large enough for the crashdump kernel to run completely inside >>>>>>>>> it and don't modify anything outside it, just read and dump the remains >>>>>>>>> of the crashed kernel. Using the information from DTB makes the >>>>>>>>> decompressor place the kernel outside of the dedicated region. >>>>>>>>> >>>>>>>>> The log below shows that a zImage and DTB are loaded at 0x18eb8000 and >>>>>>>>> 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but >>>>>>>>> it is decompressed to 0x00008000 (see r4 reported before jumping from >>>>>>>>> within __enter_kernel). If I were to suggest something, there need to be >>>>>>>>> one more bit of information passed in the DTB telling the decompressor >>>>>>>>> to use the old masking technique to determain kernel address. It would >>>>>>>>> be set in the DTB loaded along with the crashdump kernel. >> [...] >>>>>>>> Describing "to use the old masking technique" sounds a bit hackish to me. >>>>>>>> I guess it cannot just restrict the /memory node to the reserved region, >>>>>>>> as the crashkernel needs to be able to dump the remains of the crashed >>>>>>>> kernel, which lie outside this region. >>>>>>> >>>>>>> Correct. >>>>>>> >>>>>>>> However, something under /chosen should work. >>>>>>> >>>>>>> Yet another sticky plaster... >>>>>> >>>>>> IMHO the old masking technique is the hacky solution covered by >>>>>> plasters. >>>>> >>>>> One line of code is not "covered by plasters". There are no plasters. >>>>> It's a solution that works for 99.99% of people, unlike your approach >>>>> that has had a stream of issues over the last four months, and has >>>>> required many reworks of the code to fix each one. That in itself >>>>> speaks volumes about the suitability of the approach. >>>> >>>> As I have been working with kexec code (patches soon) I would like to >>>> defend the DT approach a bit. It allows to avoid zImage relocation when >>>> a decompressed kernel is larger than ~128MiB. In such case zImage isn't >>>> small either and moving it around takes some time. >>> >>> ... which is something that has been supported for a very long time, >>> before the days of DT. >> >> How? If a decompressed kernel requires >128M and a bootloader would like >> to put a zImage high enough to *avoid* copying it once again, then the >> decompressor can't see any memory below the 128M window it starts in and >> can't decompress the kernel there. > > Do you have such a large kernel? It would be rather inefficient as > branch instructions could not be used; every function call would have > to be indirect. The maximum is +/- 32MB for a branch. This number includes data, particularly initramfs which may be linked into the kernel. Assuming kernel <32MB (15MB like min ATM) we are left with slightly more than 100MB. Which isn't that much if you would like it to be root file system for your device. Of course initramfs could be loaded separately by a bootloader and yet having both kernel and initramfs in one file has some advantages. I am not here to argue. It's your call. >> If we do not care about copying >> zImage, then, indeed, everything works fine as it is today. You are >> most probably right 99% doesn't require 128M kernel, but the case is >> IMHO obvious enough, that it should be adressed somehow. > > If I have a kernel in excess of 4GB... "it should be addressed somehow"! I believe software and hardware limitations should not be compared this way.
On Tue, May 19, 2020 at 03:56:59PM +0200, Ard Biesheuvel wrote: > On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > Hi Russell, > > > > CC devicetree > > > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > > > > Currently, the start address of physical memory is obtained by masking > > > > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > > > > was chosen as a balance between the requirements of different platforms. > > > > > > However, this does require that the start address of physical memory is > > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > > > > requirement is not fulfilled. > > > > > > > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > > > > if available (either explicitly passed, or appended to the kernel). > > > > > > Fall back to the traditional method when needed. > > > > > > > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > > > > i.e. not at a multiple of 128 MiB. > > > > > > > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > > > > --- > > > > > > > > > > [...] > > > > > > > > > > Apparently reading physical memory layout from DTB breaks crashdump > > > > > kernels. A crashdump kernel is loaded into a region of memory, that is > > > > > reserved in the original (i.e. to be crashed) kernel. The reserved > > > > > region is large enough for the crashdump kernel to run completely inside > > > > > it and don't modify anything outside it, just read and dump the remains > > > > > of the crashed kernel. Using the information from DTB makes the > > > > > decompressor place the kernel outside of the dedicated region. > > > > > > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > > > > within __enter_kernel). If I were to suggest something, there need to be > > > > > one more bit of information passed in the DTB telling the decompressor > > > > > to use the old masking technique to determain kernel address. It would > > > > > be set in the DTB loaded along with the crashdump kernel. > > > > > > > > Shouldn't the DTB passed to the crashkernel describe which region of > > > > memory is to be used instead? > > > > > > Definitely not. The crashkernel needs to know where the RAM in the > > > machine is, so that it can create a coredump of the crashed kernel. > > > > So the DTB should describe both ;-) > > > > > > Describing "to use the old masking technique" sounds a bit hackish to me. > > > > I guess it cannot just restrict the /memory node to the reserved region, > > > > as the crashkernel needs to be able to dump the remains of the crashed > > > > kernel, which lie outside this region. > > > > > > Correct. > > > > > > > However, something under /chosen should work. > > > > > > Yet another sticky plaster... > > > > IMHO the old masking technique is the hacky solution covered by > > plasters. > > > > I think debating which solution is the hacky one will not get us anywhere. > > The simple reality is that the existing solution works fine for > existing platforms, and so any changes in the logic will have to be > opt-in in one way or the other. > > Since U-boot supports EFI boot these days, one potential option is to > rely on that. I have some changes implementing this that go on top of > this patch, but they don't actually rely on it - it was just to > prevent lexical conflicts. > > The only remaining options imo are a kernel command line option, or a > DT property that tells the decompressor to look at the memory nodes. > But using the DT memory nodes on all platforms like this patch does is > obviously just too risky. > > On another note, I do think the usable-memory-region property should > be implemented for ARM as well - relying on this rounding to ensure > that the decompressor does the right thing is too fragile. What is "too fragile" is trying to change this and expecting everything to continue working as it did before. How will switching to EFI help? Won't the crashdump kernel detect EFI and try to get the memory map from EFI, whereby it runs into exactly the same issue that the DT approach does? The current crashkernel situation works precisely because of the 128M masking that is being done.
On Tue, 19 May 2020 at 16:29, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Tue, May 19, 2020 at 03:56:59PM +0200, Ard Biesheuvel wrote: > > On Tue, 19 May 2020 at 13:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > Hi Russell, > > > > > > CC devicetree > > > > > > On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote: > > > > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@samsung.com> wrote: > > > > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote: > > > > > > > Currently, the start address of physical memory is obtained by masking > > > > > > > the program counter with a fixed mask of 0xf8000000. This mask value > > > > > > > was chosen as a balance between the requirements of different platforms. > > > > > > > However, this does require that the start address of physical memory is > > > > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this > > > > > > > requirement is not fulfilled. > > > > > > > > > > > > > > Fix this limitation by obtaining the start address from the DTB instead, > > > > > > > if available (either explicitly passed, or appended to the kernel). > > > > > > > Fall back to the traditional method when needed. > > > > > > > > > > > > > > This allows to boot Linux on r7s9210/rza2mevb using the 64 MiB of SDRAM > > > > > > > on the RZA2MEVB sub board, which is located at 0x0C000000 (CS3 space), > > > > > > > i.e. not at a multiple of 128 MiB. > > > > > > > > > > > > > > Suggested-by: Nicolas Pitre <nico@fluxnic.net> > > > > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > > > > > Reviewed-by: Nicolas Pitre <nico@fluxnic.net> > > > > > > > Reviewed-by: Ard Biesheuvel <ardb@kernel.org> > > > > > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > > > > > Tested-by: Dmitry Osipenko <digetx@gmail.com> > > > > > > > --- > > > > > > > > > > > > [...] > > > > > > > > > > > > Apparently reading physical memory layout from DTB breaks crashdump > > > > > > kernels. A crashdump kernel is loaded into a region of memory, that is > > > > > > reserved in the original (i.e. to be crashed) kernel. The reserved > > > > > > region is large enough for the crashdump kernel to run completely inside > > > > > > it and don't modify anything outside it, just read and dump the remains > > > > > > of the crashed kernel. Using the information from DTB makes the > > > > > > decompressor place the kernel outside of the dedicated region. > > > > > > > > > > > > The log below shows that a zImage and DTB are loaded at 0x18eb8000 and > > > > > > 0x193f6000 (physical). The kernel is expected to run at 0x18008000, but > > > > > > it is decompressed to 0x00008000 (see r4 reported before jumping from > > > > > > within __enter_kernel). If I were to suggest something, there need to be > > > > > > one more bit of information passed in the DTB telling the decompressor > > > > > > to use the old masking technique to determain kernel address. It would > > > > > > be set in the DTB loaded along with the crashdump kernel. > > > > > > > > > > Shouldn't the DTB passed to the crashkernel describe which region of > > > > > memory is to be used instead? > > > > > > > > Definitely not. The crashkernel needs to know where the RAM in the > > > > machine is, so that it can create a coredump of the crashed kernel. > > > > > > So the DTB should describe both ;-) > > > > > > > > Describing "to use the old masking technique" sounds a bit hackish to me. > > > > > I guess it cannot just restrict the /memory node to the reserved region, > > > > > as the crashkernel needs to be able to dump the remains of the crashed > > > > > kernel, which lie outside this region. > > > > > > > > Correct. > > > > > > > > > However, something under /chosen should work. > > > > > > > > Yet another sticky plaster... > > > > > > IMHO the old masking technique is the hacky solution covered by > > > plasters. > > > > > > > I think debating which solution is the hacky one will not get us anywhere. > > > > The simple reality is that the existing solution works fine for > > existing platforms, and so any changes in the logic will have to be > > opt-in in one way or the other. > > > > Since U-boot supports EFI boot these days, one potential option is to > > rely on that. I have some changes implementing this that go on top of > > this patch, but they don't actually rely on it - it was just to > > prevent lexical conflicts. > > > > The only remaining options imo are a kernel command line option, or a > > DT property that tells the decompressor to look at the memory nodes. > > But using the DT memory nodes on all platforms like this patch does is > > obviously just too risky. > > > > On another note, I do think the usable-memory-region property should > > be implemented for ARM as well - relying on this rounding to ensure > > that the decompressor does the right thing is too fragile. > > What is "too fragile" is trying to change this and expecting everything > to continue working as it did before. > That is my point. > How will switching to EFI help? Won't the crashdump kernel detect EFI > and try to get the memory map from EFI, whereby it runs into exactly > the same issue that the DT approach does? > No. If you boot from kexec, then the EFI stub is completely circumvented, and things work as before. > The current crashkernel situation works precisely because of the 128M > masking that is being done. > Indeed. That is precisely my point.
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 00602a6fba04733f..c873d3882375f5e5 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -81,11 +81,14 @@ libfdt_objs := fdt_rw.o fdt_ro.o fdt_wip.o fdt.o ifeq ($(CONFIG_ARM_ATAG_DTB_COMPAT),y) OBJS += $(libfdt_objs) atags_to_fdt.o endif +ifeq ($(CONFIG_USE_OF),y) +OBJS += $(libfdt_objs) fdt_get_mem_start.o +endif # -fstack-protector-strong triggers protection checks in this code, # but it is being used too early to link to meaningful stack_chk logic. nossp-flags-$(CONFIG_CC_HAS_STACKPROTECTOR_NONE) := -fno-stack-protector -$(foreach o, $(libfdt_objs) atags_to_fdt.o, \ +$(foreach o, $(libfdt_objs) atags_to_fdt.o fdt_get_mem_start.o, \ $(eval CFLAGS_$(o) := -I $(srctree)/scripts/dtc/libfdt $(nossp-flags-y))) # These were previously generated C files. When you are building the kernel diff --git a/arch/arm/boot/compressed/fdt_get_mem_start.c b/arch/arm/boot/compressed/fdt_get_mem_start.c new file mode 100644 index 0000000000000000..ae71fde731b869d7 --- /dev/null +++ b/arch/arm/boot/compressed/fdt_get_mem_start.c @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <linux/kernel.h> +#include <linux/libfdt.h> +#include <linux/sizes.h> + +static const void *getprop(const void *fdt, const char *node_path, + const char *property) +{ + int offset = fdt_path_offset(fdt, node_path); + + if (offset == -FDT_ERR_NOTFOUND) + return NULL; + + return fdt_getprop(fdt, offset, property, NULL); +} + +static uint32_t get_addr_size(const void *fdt) +{ + const __be32 *addr_len = getprop(fdt, "/", "#address-cells"); + + if (!addr_len) { + /* default */ + return 1; + } + + return fdt32_to_cpu(*addr_len); +} + +/* + * Get the start of physical memory + */ + +unsigned long fdt_get_mem_start(const void *fdt) +{ + uint32_t addr_size, mem_start; + const __be32 *memory; + + if (!fdt) + return -1; + + if (*(__be32 *)fdt != cpu_to_fdt32(FDT_MAGIC)) + return -1; + + /* Find the first memory node */ + memory = getprop(fdt, "/memory", "reg"); + if (!memory) + return -1; + + /* There may be multiple cells on LPAE platforms */ + addr_size = get_addr_size(fdt); + + mem_start = fdt32_to_cpu(memory[addr_size - 1]); + /* Must be a multiple of 16 MiB for phys/virt patching */ + return round_up(mem_start, SZ_16M); +} diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index e8e1c866e413a287..1d7c86624b3eeddc 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -254,8 +254,58 @@ not_angel: .text #ifdef CONFIG_AUTO_ZRELADDR +#ifdef CONFIG_USE_OF /* - * Find the start of physical memory. As we are executing + * Find the start of physical memory. + * Try the DTB first, if available. + */ + adr r0, LC0 + ldr r1, [r0] @ get absolute LC0 + ldr sp, [r0, #24] @ get stack location + sub r1, r0, r1 @ compute relocation offset + add sp, sp, r1 @ apply relocation + +#ifdef CONFIG_ARM_APPENDED_DTB + /* + * Look for an appended DTB. If found, use it and + * move stack away from it. + */ + ldr r6, [r0, #12] @ get &_edata + add r6, r6, r1 @ relocate it + ldmia r6, {r0, r5} @ get DTB signature and size +#ifndef __ARMEB__ + ldr r1, =0xedfe0dd0 @ sig is 0xd00dfeed big endian + /* convert DTB size to little endian */ + eor r2, r5, r5, ror #16 + bic r2, r2, #0x00ff0000 + mov r5, r5, ror #8 + eor r5, r5, r2, lsr #8 +#else + ldr r1, =0xd00dfeed +#endif + cmp r0, r1 @ do we have a DTB there? + bne 1f + + /* preserve 64-bit alignment */ + add r5, r5, #7 + bic r5, r5, #7 + add sp, sp, r5 @ if so, move stack above DTB + mov r0, r6 @ and extract memory start from DTB + b 2f + +1: +#endif /* CONFIG_ARM_APPENDED_DTB */ + + mov r0, r8 +2: + bl fdt_get_mem_start + mov r4, r0 + cmp r0, #-1 + bne 1f +#endif /* CONFIG_USE_OF */ + + /* + * Fall back to the traditional method. As we are executing * without the MMU on, we are in the physical address space. * We just need to get rid of any offset by aligning the * address. @@ -273,6 +323,8 @@ not_angel: */ mov r4, pc and r4, r4, #0xf8000000 + +1: /* Determine final kernel image address. */ add r4, r4, #TEXT_OFFSET #else