diff mbox

[v24,5/9] arm64: kdump: add kdump support

Message ID 20160819012651.GE20080@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

AKASHI Takahiro Aug. 19, 2016, 1:26 a.m. UTC
James,

On Thu, Aug 18, 2016 at 04:15:48PM +0900, AKASHI Takahiro wrote:
> Hi James, Pratyush,
> 
> Thank you for your testing and reporting an issue.
> I've been on vacation until yesterday.
> 
> On Wed, Aug 10, 2016 at 05:38:05PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 09/08/16 02:56, AKASHI Takahiro wrote:
> > > On crash dump kernel, all the information about primary kernel's system
> > > memory (core image) is available in elf core header.
> > > The primary kernel will set aside this header with reserve_elfcorehdr()
> > > at boot time and inform crash dump kernel of its location via a new
> > > device-tree property, "linux,elfcorehdr".
> > > 
> > > Please note that all other architectures use traditional "elfcorehdr="
> > > kernel parameter for this purpose.
> > > 
> > > Then crash dump kernel will access the primary kernel's memory with
> > > copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> > > not reside in linear mapping on crash dump kernel.
> > > 
> > > We also need our own elfcorehdr_read() here since the header is placed
> > > within crash dump kernel's usable memory.
> > 
> > On Seattle when I panic and boot the kdump kernel, I am unable to read the
> > /proc/vmcore file. Instead I get:
> > nanook@frikadeller:~$ sudo cp /proc/vmcore /
> > [  174.393875] Unhandled fault: synchronous external abort (0x96000210) at
> > 0xffffff80096b6000
> > [  174.402158] Internal error: : 96000210 [#1] PREEMPT SMP
> > [  174.407370] Modules linked in:
> > [  174.410417] CPU: 6 PID: 2059 Comm: cp Tainted: G S      W I     4.8.0-rc1+ #4708
> > [  174.417799] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS
> > ROD1002C 04/08/2016
> > [  174.426396] task: ffffffc0fdec5780 task.stack: ffffffc0f34bc000
> > [  174.432313] PC is at __arch_copy_to_user+0x180/0x280
> > [  174.437274] LR is at copy_oldmem_page+0xac/0xf0
> > [  174.441791] pc : [<ffffff800835e080>] lr : [<ffffff8008095b9c>] pstate: 20000145
> > [  174.449173] sp : ffffffc0f34bfc90
> > [  174.452474] x29: ffffffc0f34bfc90 x28: 0000000000000000
> > [  174.457776] x27: 0000000008000000 x26: 000000000000d000
> > [  174.463077] x25: 0000000000000001 x24: ffffff8008eb5000
> > [  174.468378] x23: 0000000000000000 x22: ffffff80096b6000
> > [  174.473679] x21: 0000000000000001 x20: 0000000030127000
> > [  174.478979] x19: 0000000000001000 x18: 0000007ff7085d60
> > [  174.484279] x17: 0000000000429358 x16: ffffff80081d9e88
> > [  174.489579] x15: 0000007fae377590 x14: 0000000000000000
> > [  174.494880] x13: 0000000000000000 x12: ffffff8008dd1000
> > [  174.500180] x11: ffffff80096b6fff x10: ffffff80096b6fff
> > [  174.505480] x9 : 0000000040000000 x8 : ffffff8008db6000
> > [  174.510781] x7 : ffffff80096b7000 x6 : 0000000030127000
> > [  174.516082] x5 : 0000000030128000 x4 : 0000000000000000
> > [  174.521382] x3 : 00e8000000000713 x2 : 0000000000000f80
> > [  174.526682] x1 : ffffff80096b6000 x0 : 0000000030127000
> > [  174.531982]
> > [  174.533461] Process cp (pid: 2059, stack limit = 0xffffffc0f34bc020)
> > 
> > [  174.848448] [<ffffff800835e080>] __arch_copy_to_user+0x180/0x280
> > [  174.854448] [<ffffff8008245f34>] read_from_oldmem.part.4+0xb4/0xf4
> > [  174.860615] [<ffffff8008246074>] read_vmcore+0x100/0x22c
> > [  174.865919] [<ffffff8008239378>] proc_reg_read+0x64/0x90
> > [  174.871223] [<ffffff80081d7da8>] __vfs_read+0x28/0x108
> > [  174.876348] [<ffffff80081d8ae4>] vfs_read+0x84/0x144
> > [  174.881301] [<ffffff80081d9ecc>] SyS_read+0x44/0xa0
> > [  174.886167] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
> > [  174.891466] Code: 00000000 00000000 00000000 00000000 (a8c12027)
> > [  174.897562] ---[ end trace 00801b2e35b0cd1f ]---
> > 
> > 
> > The offending call is:
> > > copy_oldmem_page(0x8000000, 0x00000000385f8000, 0x1000, 0, 1)
> > 
> > This is trying to access the bottom page of memory. From the efi memory map:
> > > efi:   0x008000000000-0x008001e7ffff [Runtime Data       |RUN|  |WB|WT|WC|UC]*
> > > efi:   0x008001e80000-0x008001ffffff [Conventional Memory|   |  |WB|WT|WC|UC]
> > 
> > This page is 'Runtime Data', and marked as nomap by both the original and kdump
> > kernels, but copy_oldmem_page() doesn't know this.
> > 
> > In this case because we have already parsed the efi memory map again in the
> > kdump kernel and re-marked these regions as nomap, the below hunk fixes the
> > problem for me:
> > =========================%<=========================
> > diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> > index 2dc54d129be1..784d4c30b534 100644
> > --- a/arch/arm64/kernel/crash_dump.c
> > +++ b/arch/arm64/kernel/crash_dump.c
> > @@ -37,6 +37,11 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> >         if (!csize)
> >                 return 0;
> > 
> > +       if (memblock_is_memory(pfn << PAGE_SHIFT) &&
> > +           !memblock_is_map_memory(pfn << PAGE_SHIFT))
> > +               /* skip this nomap memory region, reserved by firmware */
> > +               return 0;
> > +
> >         vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);
> 
> Here I'm wandering why my original code doesn't work.
> If !memblock_is_map_memory(), ioremap_cache() would call __ioremap_caller()
> and return a valid virtual address mapped in vmalloc area.
> 
> >         if (!vaddr)
> >                 return -ENOMEM;
> > =========================%<=========================
> > 
> > With this I can copy the vmcore file, and feed it to crash to read dmesg, task
> > list etc...
> > 
> > This could be a deeper/wider issue, but I can't see any other users of
> > memblock_mark_nomap().
> > Do you think depending on this this 're-learning' is robust enough, or should
> > the nomap ranges be described in the vmcoreinfo elf notes?
> 
> The current kexec-tools identifies all the memory regions from
> /proc/iomem and there is no way for user space tools to distinguish
> "EFI runtime data," or any other nomap memory, from normal "System RAM"
> because all those resources are currently marked as "System RAM."
> 
> So I think that such regions should be marked as, say, "reserved,"
> so that we can exclude those memories from a crush dump file.

Can you try the following change?
If it fixes your problem, I will post it as a patch.

Thanks,
-Takahiro AKASHI

Comments

Pratyush Anand Aug. 19, 2016, 11:22 a.m. UTC | #1
On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Fri, 19 Aug 2016 09:57:52 +0900
> Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> 
> ---
>  arch/arm64/kernel/setup.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 38eda13..38589b5 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
>  
>  	for_each_memblock(memory, region) {
>  		res = alloc_bootmem_low(sizeof(*res));
> -		res->name  = "System RAM";
> +		if (memblock_is_nomap(region)) {
> +			res->name  = "reserved";
> +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +		} else {
> +			res->name  = "System RAM";
> +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +		}
>  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
>  		request_resource(&iomem_resource, res);


It will help kexec-tools to prevent copying  of any unnecessary data. I
think, then you also need to change phys_offset calculation in kexec-tools. That
should be start of either of first "reserved" or "System RAM" block.

~Pratyush
James Morse Aug. 19, 2016, 1:28 p.m. UTC | #2
On 19/08/16 02:26, AKASHI Takahiro wrote:
> Can you try the following change?
> If it fixes your problem, I will post it as a patch.

Almost! This causes booting with acpi=on to fail for the familiar
alignment-fault reasons[2],
details and a suggested fix below.

I think we should have this change as it matches x86's use of acpi, and means we
don't rely on re-parsing the efi memory map to learn which areas of memory
shouldn't be in the vmcore.


> ===8<===
> From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Fri, 19 Aug 2016 09:57:52 +0900
> Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> 
> ---
>  arch/arm64/kernel/setup.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 38eda13..38589b5 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
>  
>  	for_each_memblock(memory, region) {
>  		res = alloc_bootmem_low(sizeof(*res));
> -		res->name  = "System RAM";
> +		if (memblock_is_nomap(region)) {
> +			res->name  = "reserved";

> +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

This causes acpica to choke.  arch/arm64/include/asm/acpi.h:acpi_os_ioremap()
calls page_is_ram(), which expects IORESOURCE_SYSTEM_RAM. From kernel/resource.c:
> /*
>  * This generic page_is_ram() returns true if specified address is
>  * registered as System RAM in iomem_resource list.
>  */
> int __weak page_is_ram(unsigned long pfn)

We are trying to infer information about the EFI memory map by looking through
iomem_resource list generated from memblock.
drivers/firmware/efi/arm-init.c:reserve_regions() adds memory with the WB
attribute to memblock via early_init_dt_add_memory_arch(), so changing
page_is_ram() for memblock_is_memory() is one step closer to checking the
attributes in the efi memory map (which turns out to tricky).

With your v24 on v4.8-rc1 and 'mark reserved memblock regions explicitly', and
this extra hack [0], I can boot, kdump, extract the vmcore (with read() and
mmap()), and pull things out of it with crash.


Thanks,

James


> +		} else {
> +			res->name  = "System RAM";
> +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +		}
>  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  
>  		request_resource(&iomem_resource, res);

[0] whitespace damaged hack for acpi_os_ioremap().
------------------------------------%<------------------------------------
+++ b/arch/arm64/include/asm/acpi.h
@@ -12,6 +12,7 @@
 #ifndef _ASM_ACPI_H
 #define _ASM_ACPI_H

+#include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/psci.h>

@@ -32,7 +33,11 @@
 static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
                                            acpi_size size)
 {
-       if (!page_is_ram(phys >> PAGE_SHIFT))
+       /*
+        * EFI's reserve_regions() call adds memory with the WB attribute
+        * to memblock via early_init_dt_add_memory_arch().
+        */
+       if (!memblock_is_memory(phys))
                return ioremap(phys, size);

        return ioremap_cache(phys, size);
------------------------------------%<------------------------------------

[1] cat /proc/iomem | tail -n 9
8000000000-8001e7ffff : reserved
8001e80000-83ff17ffff : System RAM
  8002080000-8002b0ffff : Kernel code
  8002d90000-8002f6ffff : Kernel data
  80dfe00000-80ffdfffff : Crash kernel
83ff180000-83ff1cffff : reserved
83ff1d0000-83ff21ffff : System RAM
83ff220000-83ffe4ffff : reserved
83ffe50000-83ffffffff : System RAM

[2] Panic on boot without [0]
[    0.037098] ACPI: Core revision 20160422
[    0.041174] Unable to handle kernel paging request at virtual address
fffffc0008f831e7
[    0.049155] pgd = fffffc0008f50000
[    0.052576] [fffffc0008f831e7] *pgd=00000083fffe0003, *pud=00000083fffe0003,
*pmd=00000083fffe0003, *pte=00e80083ff1c0707
[    0.063632] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[    0.069245] Modules linked in:
[    0.072317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1+ #4892
[    0.078891] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive)
(DT)
[    0.086518] task: fffffc0008dc8a80 task.stack: fffffc0008d90000
[    0.092486] PC is at acpi_ns_lookup+0x238/0x378
[    0.097049] LR is at acpi_ds_load1_begin_op+0x88/0x260
[    0.102290] pc : [<fffffc000840fed8>] lr : [<fffffc0008405dcc>] pstate: 60000045
[    0.109740] sp : fffffc0008d93bb0
[    0.113071] x29: fffffc0008d93bb0 x28: 0000000000000001
[    0.118420] x27: 0000000000000000 x26: 000000000000001b
[    0.123768] x25: fffffc0008a813b7 x24: 000000000000001b
[    0.129119] x23: 000000000000001b x22: 0000000000000001
[    0.134466] x21: fffffc0008d93cb8 x20: 0000000000000001
[    0.139814] x19: fffffc0008f831e7 x18: 0000000000000023
[    0.145163] x17: ffffffffffffffff x16: 0000000000000000
[    0.150511] x15: 000000000000038e x14: 0000000000007fff
[    0.155859] x13: ffffffffffffff00 x12: 0000000000000008
[    0.161208] x11: 0000000000000028 x10: 00000000ffffff76
[    0.166556] x9 : 0000000000000000 x8 : fffffe03c00b0140
[    0.171907] x7 : 0000000000000003 x6 : fffffc0008d93cb8
[    0.177255] x5 : fffffe03c0060800 x4 : fffffc0008ee7fe8
[    0.182603] x3 : fffffc0008ee8000 x2 : fffffc0008ee7fe8
[    0.187951] x1 : fffffc0008f831e7 x0 : 0000000000000000
[    0.193296]
[    0.194789] Process swapper/0 (pid: 0, stack limit = 0xfffffc0008d90020)
[    0.483605] Call trace:
[    0.486062] Exception stack(0xfffffc0008d939e0 to 0xfffffc0008d93b10)
[    0.568522] [<fffffc000840fed8>] acpi_ns_lookup+0x238/0x378
[    0.574134] [<fffffc0008405dcc>] acpi_ds_load1_begin_op+0x88/0x260
[    0.580358] [<fffffc0008415e3c>] acpi_ps_build_named_op+0xa8/0x170
[    0.586582] [<fffffc0008416034>] acpi_ps_create_op+0x130/0x230
[    0.592455] [<fffffc0008415988>] acpi_ps_parse_loop+0x174/0x580
[    0.598419] [<fffffc00084168b4>] acpi_ps_parse_aml+0xa8/0x278
[    0.604208] [<fffffc0008411efc>] acpi_ns_one_complete_parse+0x130/0x158
[    0.610871] [<fffffc0008411f48>] acpi_ns_parse_table+0x24/0x44
[    0.616745] [<fffffc00084116fc>] acpi_ns_load_table+0x50/0xe0
[    0.622535] [<fffffc000841ad80>] acpi_tb_load_namespace+0xdc/0x25c
[    0.628764] [<fffffc0008b383f4>] acpi_load_tables+0x3c/0xa8
[    0.634377] [<fffffc0008b37548>] acpi_early_init+0x88/0xbc
[    0.639903] [<fffffc0008b10b30>] start_kernel+0x330/0x394
[    0.645340] [<fffffc0008b10210>] __primary_switched+0x7c/0x8c
[    0.651127] Code: 2a1a03f7 2a0002d6 36380054 321902d6 (b9400260)
[    0.657270] ---[ end trace 0000000000000000 ]---
[    0.661922] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.668673] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
AKASHI Takahiro Aug. 22, 2016, 1:23 a.m. UTC | #3
James,

On Fri, Aug 19, 2016 at 02:28:06PM +0100, James Morse wrote:
> On 19/08/16 02:26, AKASHI Takahiro wrote:
> > Can you try the following change?
> > If it fixes your problem, I will post it as a patch.
> 
> Almost! This causes booting with acpi=on to fail for the familiar
> alignment-fault reasons[2],
> details and a suggested fix below.

Thank you for the fix.
I will merge your hunk to the patch.

-Takahiro AKASHI

> I think we should have this change as it matches x86's use of acpi, and means we
> don't rely on re-parsing the efi memory map to learn which areas of memory
> shouldn't be in the vmcore.
> 
> 
> > ===8<===
> > From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > 
> > ---
> >  arch/arm64/kernel/setup.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 38eda13..38589b5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> >  
> >  	for_each_memblock(memory, region) {
> >  		res = alloc_bootmem_low(sizeof(*res));
> > -		res->name  = "System RAM";
> > +		if (memblock_is_nomap(region)) {
> > +			res->name  = "reserved";
> 
> > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> 
> This causes acpica to choke.  arch/arm64/include/asm/acpi.h:acpi_os_ioremap()
> calls page_is_ram(), which expects IORESOURCE_SYSTEM_RAM. From kernel/resource.c:
> > /*
> >  * This generic page_is_ram() returns true if specified address is
> >  * registered as System RAM in iomem_resource list.
> >  */
> > int __weak page_is_ram(unsigned long pfn)
> 
> We are trying to infer information about the EFI memory map by looking through
> iomem_resource list generated from memblock.
> drivers/firmware/efi/arm-init.c:reserve_regions() adds memory with the WB
> attribute to memblock via early_init_dt_add_memory_arch(), so changing
> page_is_ram() for memblock_is_memory() is one step closer to checking the
> attributes in the efi memory map (which turns out to tricky).
> 
> With your v24 on v4.8-rc1 and 'mark reserved memblock regions explicitly', and
> this extra hack [0], I can boot, kdump, extract the vmcore (with read() and
> mmap()), and pull things out of it with crash.
> 
> 
> Thanks,
> 
> James
> 
> 
> > +		} else {
> > +			res->name  = "System RAM";
> > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +		}
> >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >  
> >  		request_resource(&iomem_resource, res);
> 
> [0] whitespace damaged hack for acpi_os_ioremap().
> ------------------------------------%<------------------------------------
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -12,6 +12,7 @@
>  #ifndef _ASM_ACPI_H
>  #define _ASM_ACPI_H
> 
> +#include <linux/memblock.h>
>  #include <linux/mm.h>
>  #include <linux/psci.h>
> 
> @@ -32,7 +33,11 @@
>  static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
>                                             acpi_size size)
>  {
> -       if (!page_is_ram(phys >> PAGE_SHIFT))
> +       /*
> +        * EFI's reserve_regions() call adds memory with the WB attribute
> +        * to memblock via early_init_dt_add_memory_arch().
> +        */
> +       if (!memblock_is_memory(phys))
>                 return ioremap(phys, size);
> 
>         return ioremap_cache(phys, size);
> ------------------------------------%<------------------------------------
> 
> [1] cat /proc/iomem | tail -n 9
> 8000000000-8001e7ffff : reserved
> 8001e80000-83ff17ffff : System RAM
>   8002080000-8002b0ffff : Kernel code
>   8002d90000-8002f6ffff : Kernel data
>   80dfe00000-80ffdfffff : Crash kernel
> 83ff180000-83ff1cffff : reserved
> 83ff1d0000-83ff21ffff : System RAM
> 83ff220000-83ffe4ffff : reserved
> 83ffe50000-83ffffffff : System RAM
> 
> [2] Panic on boot without [0]
> [    0.037098] ACPI: Core revision 20160422
> [    0.041174] Unable to handle kernel paging request at virtual address
> fffffc0008f831e7
> [    0.049155] pgd = fffffc0008f50000
> [    0.052576] [fffffc0008f831e7] *pgd=00000083fffe0003, *pud=00000083fffe0003,
> *pmd=00000083fffe0003, *pte=00e80083ff1c0707
> [    0.063632] Internal error: Oops: 96000021 [#1] PREEMPT SMP
> [    0.069245] Modules linked in:
> [    0.072317] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1+ #4892
> [    0.078891] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive)
> (DT)
> [    0.086518] task: fffffc0008dc8a80 task.stack: fffffc0008d90000
> [    0.092486] PC is at acpi_ns_lookup+0x238/0x378
> [    0.097049] LR is at acpi_ds_load1_begin_op+0x88/0x260
> [    0.102290] pc : [<fffffc000840fed8>] lr : [<fffffc0008405dcc>] pstate: 60000045
> [    0.109740] sp : fffffc0008d93bb0
> [    0.113071] x29: fffffc0008d93bb0 x28: 0000000000000001
> [    0.118420] x27: 0000000000000000 x26: 000000000000001b
> [    0.123768] x25: fffffc0008a813b7 x24: 000000000000001b
> [    0.129119] x23: 000000000000001b x22: 0000000000000001
> [    0.134466] x21: fffffc0008d93cb8 x20: 0000000000000001
> [    0.139814] x19: fffffc0008f831e7 x18: 0000000000000023
> [    0.145163] x17: ffffffffffffffff x16: 0000000000000000
> [    0.150511] x15: 000000000000038e x14: 0000000000007fff
> [    0.155859] x13: ffffffffffffff00 x12: 0000000000000008
> [    0.161208] x11: 0000000000000028 x10: 00000000ffffff76
> [    0.166556] x9 : 0000000000000000 x8 : fffffe03c00b0140
> [    0.171907] x7 : 0000000000000003 x6 : fffffc0008d93cb8
> [    0.177255] x5 : fffffe03c0060800 x4 : fffffc0008ee7fe8
> [    0.182603] x3 : fffffc0008ee8000 x2 : fffffc0008ee7fe8
> [    0.187951] x1 : fffffc0008f831e7 x0 : 0000000000000000
> [    0.193296]
> [    0.194789] Process swapper/0 (pid: 0, stack limit = 0xfffffc0008d90020)
> [    0.483605] Call trace:
> [    0.486062] Exception stack(0xfffffc0008d939e0 to 0xfffffc0008d93b10)
> [    0.568522] [<fffffc000840fed8>] acpi_ns_lookup+0x238/0x378
> [    0.574134] [<fffffc0008405dcc>] acpi_ds_load1_begin_op+0x88/0x260
> [    0.580358] [<fffffc0008415e3c>] acpi_ps_build_named_op+0xa8/0x170
> [    0.586582] [<fffffc0008416034>] acpi_ps_create_op+0x130/0x230
> [    0.592455] [<fffffc0008415988>] acpi_ps_parse_loop+0x174/0x580
> [    0.598419] [<fffffc00084168b4>] acpi_ps_parse_aml+0xa8/0x278
> [    0.604208] [<fffffc0008411efc>] acpi_ns_one_complete_parse+0x130/0x158
> [    0.610871] [<fffffc0008411f48>] acpi_ns_parse_table+0x24/0x44
> [    0.616745] [<fffffc00084116fc>] acpi_ns_load_table+0x50/0xe0
> [    0.622535] [<fffffc000841ad80>] acpi_tb_load_namespace+0xdc/0x25c
> [    0.628764] [<fffffc0008b383f4>] acpi_load_tables+0x3c/0xa8
> [    0.634377] [<fffffc0008b37548>] acpi_early_init+0x88/0xbc
> [    0.639903] [<fffffc0008b10b30>] start_kernel+0x330/0x394
> [    0.645340] [<fffffc0008b10210>] __primary_switched+0x7c/0x8c
> [    0.651127] Code: 2a1a03f7 2a0002d6 36380054 321902d6 (b9400260)
> [    0.657270] ---[ end trace 0000000000000000 ]---
> [    0.661922] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.668673] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
>
AKASHI Takahiro Aug. 22, 2016, 1:29 a.m. UTC | #4
On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> > >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > 
> > ---
> >  arch/arm64/kernel/setup.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 38eda13..38589b5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> >  
> >  	for_each_memblock(memory, region) {
> >  		res = alloc_bootmem_low(sizeof(*res));
> > -		res->name  = "System RAM";
> > +		if (memblock_is_nomap(region)) {
> > +			res->name  = "reserved";
> > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > +		} else {
> > +			res->name  = "System RAM";
> > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +		}
> >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >  
> >  		request_resource(&iomem_resource, res);
> 
> 
> It will help kexec-tools to prevent copying  of any unnecessary data. I
> think, then you also need to change phys_offset calculation in kexec-tools. That
> should be start of either of first "reserved" or "System RAM" block.

Good point, but I'm not sure this is always true.
Is there any system whose ACPI memory is *not* part of DRAM
(so not part of linear mapping)?

Thanks,
-Takahiro AKASHI

> ~Pratyush
Pratyush Anand Aug. 22, 2016, 7:07 a.m. UTC | #5
On 22/08/2016:10:29:20 AM, AKASHI Takahiro wrote:
> On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > On 19/08/2016:10:26:52 AM, AKASHI Takahiro wrote:
> > > >From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
> > > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > Date: Fri, 19 Aug 2016 09:57:52 +0900
> > > Subject: [PATCH] arm64: mark reserved memblock regions explicitly
> > > 
> > > ---
> > >  arch/arm64/kernel/setup.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > index 38eda13..38589b5 100644
> > > --- a/arch/arm64/kernel/setup.c
> > > +++ b/arch/arm64/kernel/setup.c
> > > @@ -205,10 +205,15 @@ static void __init request_standard_resources(void)
> > >  
> > >  	for_each_memblock(memory, region) {
> > >  		res = alloc_bootmem_low(sizeof(*res));
> > > -		res->name  = "System RAM";
> > > +		if (memblock_is_nomap(region)) {
> > > +			res->name  = "reserved";
> > > +			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > > +		} else {
> > > +			res->name  = "System RAM";
> > > +			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > > +		}
> > >  		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> > >  		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> > > -		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > >  
> > >  		request_resource(&iomem_resource, res);
> > 
> > 
> > It will help kexec-tools to prevent copying  of any unnecessary data. I
> > think, then you also need to change phys_offset calculation in kexec-tools. That
> > should be start of either of first "reserved" or "System RAM" block.
> 
> Good point, but I'm not sure this is always true.
> Is there any system whose ACPI memory is *not* part of DRAM
> (so not part of linear mapping)?
> 

Looking into kernel/resource.c:reserve_setup(), it seems that there could be
some none-DRAM area as well, which could be marked as "reserved". So, I think if
we mark nomap region as "reserved" then applications like kexec-tools may not
always identify start of DRAM correctly. Probably, we should give an unique name
to reserved system ram area.

~Pratyush
James Morse Aug. 22, 2016, 1:47 p.m. UTC | #6
On 22/08/16 02:29, AKASHI Takahiro wrote:
> On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
>> It will help kexec-tools to prevent copying  of any unnecessary data. I
>> think, then you also need to change phys_offset calculation in kexec-tools. That
>> should be start of either of first "reserved" or "System RAM" block.
> 
> Good point, but I'm not sure this is always true.

> Is there any system whose ACPI memory is *not* part of DRAM

From the spec, it looks like this is allowed.

What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
the question is what is its type and memory attributes?
The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
firmware.

It is possible these regions have to be mapped non-cacheable, page 40 has a
couple of:
> If no information about the table location exists in the UEFI memory map or
ACPI memory
> descriptors, the table is assumed to be non-cached.

reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
memory map that has a 'WB' attribute to the memblock.memory list (via
early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
this attribute and aren't in the is_reserve_region() list.

If these ACPI regions have the 'WB' attribute, we add them as memory and mark
them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
If they don't have the 'WB' attribute, then then they are left out of memblock
and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.


Thanks,

James

[0] '2.3.6 AArch64 Platforms' of version 2.6 of the UEFI spec at
http://uefi.org/specifications
AKASHI Takahiro Aug. 23, 2016, 12:38 a.m. UTC | #7
On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> On 22/08/16 02:29, AKASHI Takahiro wrote:
> > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> >> think, then you also need to change phys_offset calculation in kexec-tools. That
> >> should be start of either of first "reserved" or "System RAM" block.
> > 
> > Good point, but I'm not sure this is always true.
> 
> > Is there any system whose ACPI memory is *not* part of DRAM
> 
> From the spec, it looks like this is allowed.
> 
> What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> the question is what is its type and memory attributes?

Yes.

> The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> firmware.
> 
> It is possible these regions have to be mapped non-cacheable, page 40 has a
> couple of:
> > If no information about the table location exists in the UEFI memory map or
> ACPI memory
> > descriptors, the table is assumed to be non-cached.
> 
> reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> memory map that has a 'WB' attribute to the memblock.memory list (via
> early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> this attribute and aren't in the is_reserve_region() list.
> 
> If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> If they don't have the 'WB' attribute, then then they are left out of memblock
> and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.

Let's say,
0x1000-0x1fff: reserved (SRAM for UEFI, WB)
0x80000000-0xffffffff: System RAM (DRAM)

If, as Pratyush suggested, "reserved" resources are added to phys_offset
calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
there is no actual mapping around PAGE_OFFSET.
It won't hurt anything, but looks funny.
So we'd better not include "reserved" in phys_offset calculation anyway.
        -> Pratyush

Thanks,
-Takahiro AKASHI

> 
> 
> Thanks,
> 
> James
> 
> [0] '2.3.6 AArch64 Platforms' of version 2.6 of the UEFI spec at
> http://uefi.org/specifications
Pratyush Anand Aug. 23, 2016, 11:23 a.m. UTC | #8
On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> > >> think, then you also need to change phys_offset calculation in kexec-tools. That
> > >> should be start of either of first "reserved" or "System RAM" block.
> > > 
> > > Good point, but I'm not sure this is always true.
> > 
> > > Is there any system whose ACPI memory is *not* part of DRAM
> > 
> > From the spec, it looks like this is allowed.
> > 
> > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> > the question is what is its type and memory attributes?
> 
> Yes.
> 
> > The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> > EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> > firmware.
> > 
> > It is possible these regions have to be mapped non-cacheable, page 40 has a
> > couple of:
> > > If no information about the table location exists in the UEFI memory map or
> > ACPI memory
> > > descriptors, the table is assumed to be non-cached.
> > 
> > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> > memory map that has a 'WB' attribute to the memblock.memory list (via
> > early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> > this attribute and aren't in the is_reserve_region() list.
> > 
> > If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> > them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> > If they don't have the 'WB' attribute, then then they are left out of memblock
> > and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.
> 
> Let's say,
> 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> 0x80000000-0xffffffff: System RAM (DRAM)

May be slightly more complicated:
0x80000000-0x80001fff: System RAM (DRAM) for UEFI, WB
0x80002000-0xffffffff: System RAM (DRAM)

Kernel will have phys_offset 0x80000000, however kexec-tools will calculate it
as 0x80002000.

> 
> If, as Pratyush suggested, "reserved" resources are added to phys_offset
> calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> there is no actual mapping around PAGE_OFFSET.
> It won't hurt anything, but looks funny.
> So we'd better not include "reserved" in phys_offset calculation anyway.
>         -> Pratyush

My only concern is that, then we will have different values of phys_offset in
kernel and kexec-tools, which might lead to further confusion.

~Pratyush
Dave Young Aug. 24, 2016, 8:04 a.m. UTC | #9
Ccing uefi people.

On 08/23/16 at 04:53pm, Pratyush Anand wrote:
> On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> > On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > > >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> > > >> think, then you also need to change phys_offset calculation in kexec-tools. That
> > > >> should be start of either of first "reserved" or "System RAM" block.
> > > > 
> > > > Good point, but I'm not sure this is always true.
> > > 
> > > > Is there any system whose ACPI memory is *not* part of DRAM
> > > 
> > > From the spec, it looks like this is allowed.
> > > 
> > > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory map, so
> > > the question is what is its type and memory attributes?
> > 
> > Yes.
> > 
> > > The UEFI spec[0] says ACPI regions can have a type of EfiACPIReclaimMemory or
> > > EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen by the
> > > firmware.
> > > 
> > > It is possible these regions have to be mapped non-cacheable, page 40 has a
> > > couple of:
> > > > If no information about the table location exists in the UEFI memory map or
> > > ACPI memory
> > > > descriptors, the table is assumed to be non-cached.
> > > 
> > > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry in the
> > > memory map that has a 'WB' attribute to the memblock.memory list (via
> > > early_init_dt_add_memory_arch()), it will also mark as no-map regions that have
> > > this attribute and aren't in the is_reserve_region() list.

Looking the arm-init.c, I suspect it missed the some efi ranges as
reserved ranges like runtime code and runtime data etc. But I might be
wrong.

Below is the is_reserve_region, it will regard any regions which is not
in the EFI_* below as normal memory, it does not include the runtime
ranges and other types.

static __init int is_reserve_region(efi_memory_desc_t *md)
{
        switch (md->type) {
        case EFI_LOADER_CODE:
        case EFI_LOADER_DATA:
        case EFI_BOOT_SERVICES_CODE:
        case EFI_BOOT_SERVICES_DATA:
        case EFI_CONVENTIONAL_MEMORY:
        case EFI_PERSISTENT_MEMORY:
                return 0;
        default:
                break;
        }
        return is_normal_ram(md);
}

Let's see the x86 do_add_efi_mem_map, the default case set all other
types as reserved. Shouldn't this be same in all arches though there's
no e820 in arm(64)?

static void __init do_add_efi_memmap(void)
{

[snip]
                switch (md->type) {
                case EFI_LOADER_CODE:
                case EFI_LOADER_DATA:
                case EFI_BOOT_SERVICES_CODE:
                case EFI_BOOT_SERVICES_DATA:
                case EFI_CONVENTIONAL_MEMORY:
                        if (md->attribute & EFI_MEMORY_WB)
                                e820_type = E820_RAM;
                        else
                                e820_type = E820_RESERVED;
                        break;
[snip]
                default:
                        /*
                         * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
                         * EFI_RUNTIME_SERVICES_DATA
                         * EFI_MEMORY_MAPPED_IO
                         * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
                         */
                        e820_type = E820_RESERVED;
                        break;
                }
[snip]
}

> > > 
> > > If these ACPI regions have the 'WB' attribute, we add them as memory and mark
> > > them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> > > If they don't have the 'WB' attribute, then then they are left out of memblock
> > > and aren't part of DRAM, I don't think these will show up in /proc/iomem at all.
> > 
> > Let's say,
> > 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> > 0x80000000-0xffffffff: System RAM (DRAM)
> 
> May be slightly more complicated:
> 0x80000000-0x80001fff: System RAM (DRAM) for UEFI, WB
> 0x80002000-0xffffffff: System RAM (DRAM)
> 
> Kernel will have phys_offset 0x80000000, however kexec-tools will calculate it
> as 0x80002000.
> 
> > 
> > If, as Pratyush suggested, "reserved" resources are added to phys_offset
> > calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> > there is no actual mapping around PAGE_OFFSET.
> > It won't hurt anything, but looks funny.
> > So we'd better not include "reserved" in phys_offset calculation anyway.
> >         -> Pratyush
> 
> My only concern is that, then we will have different values of phys_offset in
> kernel and kexec-tools, which might lead to further confusion.
> 
> ~Pratyush
James Morse Aug. 24, 2016, 10:25 a.m. UTC | #10
Hi Dave,

On 24/08/16 09:04, Dave Young wrote:
> Looking the arm-init.c, I suspect it missed the some efi ranges as
> reserved ranges like runtime code and runtime data etc. But I might be
> wrong.

This had me confused for too... I think I get it, my understanding is:


> static __init int is_reserve_region(efi_memory_desc_t *md)
> {
>         switch (md->type) {
>         case EFI_LOADER_CODE:
>         case EFI_LOADER_DATA:
>         case EFI_BOOT_SERVICES_CODE:
>         case EFI_BOOT_SERVICES_DATA:
>         case EFI_CONVENTIONAL_MEMORY:
>         case EFI_PERSISTENT_MEMORY:
>                 return 0;

return false - this is the list of region-types to never reserve, regardless of
memory attributes.


>         default:
>                 break;
>         }
>         return is_normal_ram(md);

If its not in the 'never reserve' list above, then we check if the region is
'normal' ram. If it is then it will end up in memblock.memory so we return true,
causing it to be marked nomap too.

reserve_regions() in that same file calls is_normal_ram() directly before adding
all regions with the WB attribute to memblock.memory via
early_init_dt_add_memory_arch().

A runtime region with the WB attribute will be caught by is_reserve_region(),
and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.


> }
> 
> Let's see the x86 do_add_efi_mem_map, the default case set all other
> types as reserved. Shouldn't this be same in all arches though there's
> no e820 in arm(64)?

> static void __init do_add_efi_memmap(void)
> {
> 
> [snip]
>                 switch (md->type) {
>                 case EFI_LOADER_CODE:
>                 case EFI_LOADER_DATA:
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
>                         if (md->attribute & EFI_MEMORY_WB)
>                                 e820_type = E820_RAM;

In this case reserve_regions() will add the memory to memblock.memory because it
has the WB attribute, and not reserve it.


>                         else
>                                 e820_type = E820_RESERVED;

Without the WB attribute, these regions are in neither memblock.memory nor
memblock.nomap.


>                         break;
> [snip]
>                 default:
>                         /*
>                          * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
>                          * EFI_RUNTIME_SERVICES_DATA
>                          * EFI_MEMORY_MAPPED_IO
>                          * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
>                          */
>                         e820_type = E820_RESERVED;
>                         break;

If any other regions has the WB attribute, it will be added to memblock.memory
and memblock.nomap. If it doesn't, it will appear in neither.


>                 }
> [snip]
> }

Does this help at all?


Thanks,

James
Dave Young Aug. 25, 2016, 1:04 a.m. UTC | #11
On 08/24/16 at 11:25am, James Morse wrote:
> Hi Dave,
> 
> On 24/08/16 09:04, Dave Young wrote:
> > Looking the arm-init.c, I suspect it missed the some efi ranges as
> > reserved ranges like runtime code and runtime data etc. But I might be
> > wrong.
> 
> This had me confused for too... I think I get it, my understanding is:
> 
James, thanks for your clarification. 
> 
> > static __init int is_reserve_region(efi_memory_desc_t *md)
> > {
> >         switch (md->type) {
> >         case EFI_LOADER_CODE:
> >         case EFI_LOADER_DATA:
> >         case EFI_BOOT_SERVICES_CODE:
> >         case EFI_BOOT_SERVICES_DATA:
> >         case EFI_CONVENTIONAL_MEMORY:
> >         case EFI_PERSISTENT_MEMORY:
> >                 return 0;
> 
> return false - this is the list of region-types to never reserve, regardless of
> memory attributes.
> 
> 
> >         default:
> >                 break;
> >         }
> >         return is_normal_ram(md);
> 
> If its not in the 'never reserve' list above, then we check if the region is
> 'normal' ram. If it is then it will end up in memblock.memory so we return true,
> causing it to be marked nomap too.
> 
> reserve_regions() in that same file calls is_normal_ram() directly before adding
> all regions with the WB attribute to memblock.memory via
> early_init_dt_add_memory_arch().
> 
> A runtime region with the WB attribute will be caught by is_reserve_region(),
> and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.

Hmm, It is not straitforward like the do_add_efi_memmap. I got it.

BTW, I believe there is same problem in arm as well as arm64, it also
need mark the runtime ranges as "reserved" /proc/iomem. 

> 
> 
> > }
> > 
> > Let's see the x86 do_add_efi_mem_map, the default case set all other
> > types as reserved. Shouldn't this be same in all arches though there's
> > no e820 in arm(64)?
> 
> > static void __init do_add_efi_memmap(void)
> > {
> > 
> > [snip]
> >                 switch (md->type) {
> >                 case EFI_LOADER_CODE:
> >                 case EFI_LOADER_DATA:
> >                 case EFI_BOOT_SERVICES_CODE:
> >                 case EFI_BOOT_SERVICES_DATA:
> >                 case EFI_CONVENTIONAL_MEMORY:
> >                         if (md->attribute & EFI_MEMORY_WB)
> >                                 e820_type = E820_RAM;
> 
> In this case reserve_regions() will add the memory to memblock.memory because it
> has the WB attribute, and not reserve it.
> 
> 
> >                         else
> >                                 e820_type = E820_RESERVED;
> 
> Without the WB attribute, these regions are in neither memblock.memory nor
> memblock.nomap.
> 
> 
> >                         break;
> > [snip]
> >                 default:
> >                         /*
> >                          * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
> >                          * EFI_RUNTIME_SERVICES_DATA
> >                          * EFI_MEMORY_MAPPED_IO
> >                          * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
> >                          */
> >                         e820_type = E820_RESERVED;
> >                         break;
> 
> If any other regions has the WB attribute, it will be added to memblock.memory
> and memblock.nomap. If it doesn't, it will appear in neither.
> 
> 
> >                 }
> > [snip]
> > }
> 
> Does this help at all?
> 

Yes, thanks a lot!
Dave

> 
> Thanks,
> 
> James
diff mbox

Patch

===8<===
From 740563e4a437f0d6ecf6e421c91433f9b8f19041 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Fri, 19 Aug 2016 09:57:52 +0900
Subject: [PATCH] arm64: mark reserved memblock regions explicitly

---
 arch/arm64/kernel/setup.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 38eda13..38589b5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -205,10 +205,15 @@  static void __init request_standard_resources(void)
 
 	for_each_memblock(memory, region) {
 		res = alloc_bootmem_low(sizeof(*res));
-		res->name  = "System RAM";
+		if (memblock_is_nomap(region)) {
+			res->name  = "reserved";
+			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+		} else {
+			res->name  = "System RAM";
+			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+		}
 		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
 		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
-		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
 		request_resource(&iomem_resource, res);