diff mbox series

[XEN,RFC,18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI

Message ID 20210811102423.28908-19-wei.chen@arm.com (mailing list archive)
State New, archived
Headers show
Series Add device tree based NUMA support to Arm64 | expand

Commit Message

Wei Chen Aug. 11, 2021, 10:24 a.m. UTC
EFI can get memory map from EFI system table. But EFI system
table doesn't contain memory NUMA information, EFI depends on
ACPI SRAT or device tree memory node to parse memory blocks'
NUMA mapping.

But in current code, when Xen is booting from EFI, it will
delete all memory nodes in device tree. So in UEFI + DTB
boot, we don't have numa-node-id for memory blocks any more.

So in this patch, we will keep memory nodes in device tree for
NUMA code to parse memory numa-node-id later.

As a side effect, if we still parse boot memory information in
early_scan_node, bootmem.info will calculate memory ranges in
memory nodes twice. So we have to prvent early_scan_node to
parse memory nodes in EFI boot.

As EFI APIs only can be used in Arm64, so we introduced a wrapper
in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/bootfdt.c      |  8 +++++++-
 xen/arch/arm/efi/efi-boot.h | 25 -------------------------
 xen/include/asm-arm/setup.h |  6 ++++++
 3 files changed, 13 insertions(+), 26 deletions(-)

Comments

Julien Grall Aug. 19, 2021, 5:35 p.m. UTC | #1
Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> EFI can get memory map from EFI system table. But EFI system
> table doesn't contain memory NUMA information, EFI depends on
> ACPI SRAT or device tree memory node to parse memory blocks'
> NUMA mapping.
> 
> But in current code, when Xen is booting from EFI, it will
> delete all memory nodes in device tree. So in UEFI + DTB
> boot, we don't have numa-node-id for memory blocks any more.
> 
> So in this patch, we will keep memory nodes in device tree for
> NUMA code to parse memory numa-node-id later.
> 
> As a side effect, if we still parse boot memory information in
> early_scan_node, bootmem.info will calculate memory ranges in
> memory nodes twice. So we have to prvent early_scan_node to

s/prvent/prevent/

> parse memory nodes in EFI boot.
> 
> As EFI APIs only can be used in Arm64, so we introduced a wrapper
> in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/bootfdt.c      |  8 +++++++-
>   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>   xen/include/asm-arm/setup.h |  6 ++++++
>   3 files changed, 13 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index 476e32e0f5..7df149dbca 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -11,6 +11,7 @@
>   #include <xen/lib.h>
>   #include <xen/kernel.h>
>   #include <xen/init.h>
> +#include <xen/efi.h>
>   #include <xen/device_tree.h>
>   #include <xen/libfdt/libfdt.h>
>   #include <xen/sort.h>
> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>   {
>       int rc = 0;
>   
> -    if ( device_tree_node_matches(fdt, node, "memory") )
> +    /*
> +     * If system boot from EFI, bootinfo.mem has been set by EFI,

"If the system boot". Although, I would suggest to write:

"If Xen has been booted via UEFI, the memory banks will already be 
populated. So we should skip the parsing."

> +     * so we don't need to parse memory node from DTB.
> +     */
> +    if ( device_tree_node_matches(fdt, node, "memory") &&
> +         !arch_efi_enabled(EFI_BOOT) )

arch_efi_enabled() is going to be less expensive than 
device_tree_node_matches(). So I would suggest to re-order the operands.

>           rc = process_memory_node(fdt, node, name, depth,
>                                    address_cells, size_cells, &bootinfo.mem);
>       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
> index cf9c37153f..d0a9987fa4 100644
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -197,33 +197,8 @@ EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
>       int status;
>       u32 fdt_val32;
>       u64 fdt_val64;
> -    int prev;
>       int num_rsv;
>   
> -    /*
> -     * Delete any memory nodes present.  The EFI memory map is the only
> -     * memory description provided to Xen.
> -     */
> -    prev = 0;
> -    for (;;)
> -    {
> -        const char *type;
> -        int len;
> -
> -        node = fdt_next_node(fdt, prev, NULL);
> -        if ( node < 0 )
> -            break;
> -
> -        type = fdt_getprop(fdt, node, "device_type", &len);
> -        if ( type && strncmp(type, "memory", len) == 0 )
> -        {
> -            fdt_del_node(fdt, node);
> -            continue;
> -        }
> -
> -        prev = node;
> -    }
> -
>      /*
>       * Delete all memory reserve map entries. When booting via UEFI,
>       * kernel will use the UEFI memory map to find reserved regions.
> diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
> index c4b6af6029..e4fb5f0d49 100644
> --- a/xen/include/asm-arm/setup.h
> +++ b/xen/include/asm-arm/setup.h
> @@ -123,6 +123,12 @@ void device_tree_get_reg(const __be32 **cell, u32 address_cells,
>   u32 device_tree_get_u32(const void *fdt, int node,
>                           const char *prop_name, u32 dflt);
>   
> +#if defined(CONFIG_ARM_64)
> +#define arch_efi_enabled(x) efi_enabled(x)
> +#else
> +#define arch_efi_enabled(x) (0)
> +#endif

I would prefer if we introduce CONFIG_EFI that would stub efi_enabled 
for architecture not supporting EFI.

Cheers,
Wei Chen Aug. 20, 2021, 2:18 a.m. UTC | #2
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月20日 1:35
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for
> NUMA when boot from EFI
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > EFI can get memory map from EFI system table. But EFI system
> > table doesn't contain memory NUMA information, EFI depends on
> > ACPI SRAT or device tree memory node to parse memory blocks'
> > NUMA mapping.
> >
> > But in current code, when Xen is booting from EFI, it will
> > delete all memory nodes in device tree. So in UEFI + DTB
> > boot, we don't have numa-node-id for memory blocks any more.
> >
> > So in this patch, we will keep memory nodes in device tree for
> > NUMA code to parse memory numa-node-id later.
> >
> > As a side effect, if we still parse boot memory information in
> > early_scan_node, bootmem.info will calculate memory ranges in
> > memory nodes twice. So we have to prvent early_scan_node to
> 
> s/prvent/prevent/
> 

Ok

> > parse memory nodes in EFI boot.
> >
> > As EFI APIs only can be used in Arm64, so we introduced a wrapper
> > in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/bootfdt.c      |  8 +++++++-
> >   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> >   xen/include/asm-arm/setup.h |  6 ++++++
> >   3 files changed, 13 insertions(+), 26 deletions(-)
> >
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 476e32e0f5..7df149dbca 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -11,6 +11,7 @@
> >   #include <xen/lib.h>
> >   #include <xen/kernel.h>
> >   #include <xen/init.h>
> > +#include <xen/efi.h>
> >   #include <xen/device_tree.h>
> >   #include <xen/libfdt/libfdt.h>
> >   #include <xen/sort.h>
> > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> >   {
> >       int rc = 0;
> >
> > -    if ( device_tree_node_matches(fdt, node, "memory") )
> > +    /*
> > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> 
> "If the system boot". Although, I would suggest to write:
> 
> "If Xen has been booted via UEFI, the memory banks will already be
> populated. So we should skip the parsing."
> 

Yes, that would be better. I will change it in next version.

> > +     * so we don't need to parse memory node from DTB.
> > +     */
> > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > +         !arch_efi_enabled(EFI_BOOT) )
> 
> arch_efi_enabled() is going to be less expensive than
> device_tree_node_matches(). So I would suggest to re-order the operands.
> 

yes.

> >           rc = process_memory_node(fdt, node, name, depth,
> >                                    address_cells, size_cells,
> &bootinfo.mem);
> >       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> > diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
> > index cf9c37153f..d0a9987fa4 100644
> > --- a/xen/arch/arm/efi/efi-boot.h
> > +++ b/xen/arch/arm/efi/efi-boot.h
> > @@ -197,33 +197,8 @@ EFI_STATUS __init
> fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
> >       int status;
> >       u32 fdt_val32;
> >       u64 fdt_val64;
> > -    int prev;
> >       int num_rsv;
> >
> > -    /*
> > -     * Delete any memory nodes present.  The EFI memory map is the only
> > -     * memory description provided to Xen.
> > -     */
> > -    prev = 0;
> > -    for (;;)
> > -    {
> > -        const char *type;
> > -        int len;
> > -
> > -        node = fdt_next_node(fdt, prev, NULL);
> > -        if ( node < 0 )
> > -            break;
> > -
> > -        type = fdt_getprop(fdt, node, "device_type", &len);
> > -        if ( type && strncmp(type, "memory", len) == 0 )
> > -        {
> > -            fdt_del_node(fdt, node);
> > -            continue;
> > -        }
> > -
> > -        prev = node;
> > -    }
> > -
> >      /*
> >       * Delete all memory reserve map entries. When booting via UEFI,
> >       * kernel will use the UEFI memory map to find reserved regions.
> > diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
> > index c4b6af6029..e4fb5f0d49 100644
> > --- a/xen/include/asm-arm/setup.h
> > +++ b/xen/include/asm-arm/setup.h
> > @@ -123,6 +123,12 @@ void device_tree_get_reg(const __be32 **cell, u32
> address_cells,
> >   u32 device_tree_get_u32(const void *fdt, int node,
> >                           const char *prop_name, u32 dflt);
> >
> > +#if defined(CONFIG_ARM_64)
> > +#define arch_efi_enabled(x) efi_enabled(x)
> > +#else
> > +#define arch_efi_enabled(x) (0)
> > +#endif
> 
> I would prefer if we introduce CONFIG_EFI that would stub efi_enabled
> for architecture not supporting EFI.
> 

Yes, that's a good idea.
I also feel a little awkward with the current arch helpers.

> Cheers,
> 
> --
> Julien Grall
Stefano Stabellini Aug. 26, 2021, 11:24 p.m. UTC | #3
On Wed, 11 Aug 2021, Wei Chen wrote:
> EFI can get memory map from EFI system table. But EFI system
> table doesn't contain memory NUMA information, EFI depends on
> ACPI SRAT or device tree memory node to parse memory blocks'
> NUMA mapping.
> 
> But in current code, when Xen is booting from EFI, it will
> delete all memory nodes in device tree. So in UEFI + DTB
> boot, we don't have numa-node-id for memory blocks any more.
> 
> So in this patch, we will keep memory nodes in device tree for
> NUMA code to parse memory numa-node-id later.
> 
> As a side effect, if we still parse boot memory information in
> early_scan_node, bootmem.info will calculate memory ranges in
> memory nodes twice. So we have to prvent early_scan_node to
> parse memory nodes in EFI boot.
> 
> As EFI APIs only can be used in Arm64, so we introduced a wrapper
> in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/bootfdt.c      |  8 +++++++-
>  xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>  xen/include/asm-arm/setup.h |  6 ++++++
>  3 files changed, 13 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index 476e32e0f5..7df149dbca 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -11,6 +11,7 @@
>  #include <xen/lib.h>
>  #include <xen/kernel.h>
>  #include <xen/init.h>
> +#include <xen/efi.h>
>  #include <xen/device_tree.h>
>  #include <xen/libfdt/libfdt.h>
>  #include <xen/sort.h>
> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>  {
>      int rc = 0;
>  
> -    if ( device_tree_node_matches(fdt, node, "memory") )
> +    /*
> +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> +     * so we don't need to parse memory node from DTB.
> +     */
> +    if ( device_tree_node_matches(fdt, node, "memory") &&
> +         !arch_efi_enabled(EFI_BOOT) )
>          rc = process_memory_node(fdt, node, name, depth,
>                                   address_cells, size_cells, &bootinfo.mem);
>      else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )


If we are going to use the device tree info for the numa nodes (and
related memory) does it make sense to still rely on the EFI tables for
the memory map?

I wonder if we should just use device tree for memory and ignore EFI
instead. Do you know what Linux does in this regard?
Julien Grall Aug. 27, 2021, 7:41 a.m. UTC | #4
Hi Stefano,

On 27/08/2021 00:24, Stefano Stabellini wrote:
> On Wed, 11 Aug 2021, Wei Chen wrote:
>> EFI can get memory map from EFI system table. But EFI system
>> table doesn't contain memory NUMA information, EFI depends on
>> ACPI SRAT or device tree memory node to parse memory blocks'
>> NUMA mapping.
>>
>> But in current code, when Xen is booting from EFI, it will
>> delete all memory nodes in device tree. So in UEFI + DTB
>> boot, we don't have numa-node-id for memory blocks any more.
>>
>> So in this patch, we will keep memory nodes in device tree for
>> NUMA code to parse memory numa-node-id later.
>>
>> As a side effect, if we still parse boot memory information in
>> early_scan_node, bootmem.info will calculate memory ranges in
>> memory nodes twice. So we have to prvent early_scan_node to
>> parse memory nodes in EFI boot.
>>
>> As EFI APIs only can be used in Arm64, so we introduced a wrapper
>> in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
>>
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> ---
>>   xen/arch/arm/bootfdt.c      |  8 +++++++-
>>   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>>   xen/include/asm-arm/setup.h |  6 ++++++
>>   3 files changed, 13 insertions(+), 26 deletions(-)
>>
>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>> index 476e32e0f5..7df149dbca 100644
>> --- a/xen/arch/arm/bootfdt.c
>> +++ b/xen/arch/arm/bootfdt.c
>> @@ -11,6 +11,7 @@
>>   #include <xen/lib.h>
>>   #include <xen/kernel.h>
>>   #include <xen/init.h>
>> +#include <xen/efi.h>
>>   #include <xen/device_tree.h>
>>   #include <xen/libfdt/libfdt.h>
>>   #include <xen/sort.h>
>> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>>   {
>>       int rc = 0;
>>   
>> -    if ( device_tree_node_matches(fdt, node, "memory") )
>> +    /*
>> +     * If system boot from EFI, bootinfo.mem has been set by EFI,
>> +     * so we don't need to parse memory node from DTB.
>> +     */
>> +    if ( device_tree_node_matches(fdt, node, "memory") &&
>> +         !arch_efi_enabled(EFI_BOOT) )
>>           rc = process_memory_node(fdt, node, name, depth,
>>                                    address_cells, size_cells, &bootinfo.mem);
>>       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> 
> 
> If we are going to use the device tree info for the numa nodes (and
> related memory) does it make sense to still rely on the EFI tables for
> the memory map?

Yes. AFAIK, when booting using EFI, the Device-Tree may not contain all 
the reserved regions. Furthermore, we are still too early to know 
whether we boot using ACPI and DT.

> 
> I wonder if we should just use device tree for memory and ignore EFI
> instead. Do you know what Linux does in this regard?
I looked at Linux when I first reviewed this patch because I was 
wondering what happens if the DT and UEFI map disagrees.

Linux and Xen are the same after this patch:
   1) The memory map is coming from UEFI map
   2) NUMA ID is coming from the DT

The commit that introduced the change in Linux is:

commit 500899c2cc3e3f06140373b587a69d30650f2d9d
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Apr 8 15:50:23 2016 -0700

     efi: ARM/arm64: ignore DT memory nodes instead of removing them

     There are two problems with the UEFI stub DT memory node removal
     routine:
     - it deletes nodes as it traverses the tree, which happens to work
       but is not supported, as deletion invalidates the node iterator;
     - deleting memory nodes entirely may discard annotations in the form
       of additional properties on the nodes.

     Since the discovery of DT memory nodes occurs strictly before the
     UEFI init sequence, we can simply clear the memblock memory table
     before parsing the UEFI memory map. This way, it is no longer
     necessary to remove the nodes, so we can remove that logic from the
     stub as well.

     Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
     Acked-by: Steve Capper <steve.capper@arm.com>
     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
     Signed-off-by: David Daney <david.daney@cavium.com>
     Signed-off-by: Will Deacon <will.deacon@arm.com>

Cheers,
Wei Chen Aug. 27, 2021, 9:23 a.m. UTC | #5
Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2021年8月27日 7:25
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for
> NUMA when boot from EFI
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > EFI can get memory map from EFI system table. But EFI system
> > table doesn't contain memory NUMA information, EFI depends on
> > ACPI SRAT or device tree memory node to parse memory blocks'
> > NUMA mapping.
> >
> > But in current code, when Xen is booting from EFI, it will
> > delete all memory nodes in device tree. So in UEFI + DTB
> > boot, we don't have numa-node-id for memory blocks any more.
> >
> > So in this patch, we will keep memory nodes in device tree for
> > NUMA code to parse memory numa-node-id later.
> >
> > As a side effect, if we still parse boot memory information in
> > early_scan_node, bootmem.info will calculate memory ranges in
> > memory nodes twice. So we have to prvent early_scan_node to
> > parse memory nodes in EFI boot.
> >
> > As EFI APIs only can be used in Arm64, so we introduced a wrapper
> > in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/bootfdt.c      |  8 +++++++-
> >  xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> >  xen/include/asm-arm/setup.h |  6 ++++++
> >  3 files changed, 13 insertions(+), 26 deletions(-)
> >
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 476e32e0f5..7df149dbca 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -11,6 +11,7 @@
> >  #include <xen/lib.h>
> >  #include <xen/kernel.h>
> >  #include <xen/init.h>
> > +#include <xen/efi.h>
> >  #include <xen/device_tree.h>
> >  #include <xen/libfdt/libfdt.h>
> >  #include <xen/sort.h>
> > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> >  {
> >      int rc = 0;
> >
> > -    if ( device_tree_node_matches(fdt, node, "memory") )
> > +    /*
> > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> > +     * so we don't need to parse memory node from DTB.
> > +     */
> > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > +         !arch_efi_enabled(EFI_BOOT) )
> >          rc = process_memory_node(fdt, node, name, depth,
> >                                   address_cells, size_cells,
> &bootinfo.mem);
> >      else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> 
> 
> If we are going to use the device tree info for the numa nodes (and
> related memory) does it make sense to still rely on the EFI tables for
> the memory map?
> 
> I wonder if we should just use device tree for memory and ignore EFI
> instead. Do you know what Linux does in this regard?

We don't use device tree for memory map on EFI boot. We just reply on
device tree to provide memory NUMA node info. Because EFI system table
doesn't contain this kind of data.

I have a quick look into Linux. Linux efi stub has a update_fdt function in:
drivers/firmware/efi/libstub/fdt.c. In this function, efi stub only delete
reserve memory nodes. Usable memory nodes will not be touched in this function.

Before Linux efi_init, early_init_dt_scan will be invoked. This means, if the
efi stub doesn't remove the normal memory nodes, these nodes will be found and
added to memblock by early_init_dt_add_memory_arch();

Later, in efi_init, sytemtable->memory_description will also be parsed,
the memory block will be added to memblock by early_init_dt_add_memory_arch.
So the duplicated memory nodes will be merged/ignored in memblock_add.

In Linux NUMA init, if ACPI is off, the NUMA info will be parsed from DT.

So I think in EFI boot, we use system table to parse memory, but use
DT to parse NUMA info. It doesn't seem particularly strange : )

Cheers,
Wei Chen
Stefano Stabellini Aug. 27, 2021, 11:10 p.m. UTC | #6
On Fri, 27 Aug 2021, Julien Grall wrote:
> Hi Stefano,
> 
> On 27/08/2021 00:24, Stefano Stabellini wrote:
> > On Wed, 11 Aug 2021, Wei Chen wrote:
> > > EFI can get memory map from EFI system table. But EFI system
> > > table doesn't contain memory NUMA information, EFI depends on
> > > ACPI SRAT or device tree memory node to parse memory blocks'
> > > NUMA mapping.
> > > 
> > > But in current code, when Xen is booting from EFI, it will
> > > delete all memory nodes in device tree. So in UEFI + DTB
> > > boot, we don't have numa-node-id for memory blocks any more.
> > > 
> > > So in this patch, we will keep memory nodes in device tree for
> > > NUMA code to parse memory numa-node-id later.
> > > 
> > > As a side effect, if we still parse boot memory information in
> > > early_scan_node, bootmem.info will calculate memory ranges in
> > > memory nodes twice. So we have to prvent early_scan_node to
> > > parse memory nodes in EFI boot.
> > > 
> > > As EFI APIs only can be used in Arm64, so we introduced a wrapper
> > > in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> > > 
> > > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > > ---
> > >   xen/arch/arm/bootfdt.c      |  8 +++++++-
> > >   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> > >   xen/include/asm-arm/setup.h |  6 ++++++
> > >   3 files changed, 13 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > > index 476e32e0f5..7df149dbca 100644
> > > --- a/xen/arch/arm/bootfdt.c
> > > +++ b/xen/arch/arm/bootfdt.c
> > > @@ -11,6 +11,7 @@
> > >   #include <xen/lib.h>
> > >   #include <xen/kernel.h>
> > >   #include <xen/init.h>
> > > +#include <xen/efi.h>
> > >   #include <xen/device_tree.h>
> > >   #include <xen/libfdt/libfdt.h>
> > >   #include <xen/sort.h>
> > > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> > >   {
> > >       int rc = 0;
> > >   -    if ( device_tree_node_matches(fdt, node, "memory") )
> > > +    /*
> > > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> > > +     * so we don't need to parse memory node from DTB.
> > > +     */
> > > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > > +         !arch_efi_enabled(EFI_BOOT) )
> > >           rc = process_memory_node(fdt, node, name, depth,
> > >                                    address_cells, size_cells,
> > > &bootinfo.mem);
> > >       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> > 
> > 
> > If we are going to use the device tree info for the numa nodes (and
> > related memory) does it make sense to still rely on the EFI tables for
> > the memory map?
> 
> Yes. AFAIK, when booting using EFI, the Device-Tree may not contain all the
> reserved regions. Furthermore, we are still too early to know whether we boot
> using ACPI and DT.
> 
> > 
> > I wonder if we should just use device tree for memory and ignore EFI
> > instead. Do you know what Linux does in this regard?
> I looked at Linux when I first reviewed this patch because I was wondering
> what happens if the DT and UEFI map disagrees.
> 
> Linux and Xen are the same after this patch:
>   1) The memory map is coming from UEFI map
>   2) NUMA ID is coming from the DT
> 
> The commit that introduced the change in Linux is:
[...]

Thanks both you and Wei for the investigation :-)
diff mbox series

Patch

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 476e32e0f5..7df149dbca 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -11,6 +11,7 @@ 
 #include <xen/lib.h>
 #include <xen/kernel.h>
 #include <xen/init.h>
+#include <xen/efi.h>
 #include <xen/device_tree.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/sort.h>
@@ -335,7 +336,12 @@  static int __init early_scan_node(const void *fdt,
 {
     int rc = 0;
 
-    if ( device_tree_node_matches(fdt, node, "memory") )
+    /*
+     * If system boot from EFI, bootinfo.mem has been set by EFI,
+     * so we don't need to parse memory node from DTB.
+     */
+    if ( device_tree_node_matches(fdt, node, "memory") &&
+         !arch_efi_enabled(EFI_BOOT) )
         rc = process_memory_node(fdt, node, name, depth,
                                  address_cells, size_cells, &bootinfo.mem);
     else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index cf9c37153f..d0a9987fa4 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -197,33 +197,8 @@  EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
     int status;
     u32 fdt_val32;
     u64 fdt_val64;
-    int prev;
     int num_rsv;
 
-    /*
-     * Delete any memory nodes present.  The EFI memory map is the only
-     * memory description provided to Xen.
-     */
-    prev = 0;
-    for (;;)
-    {
-        const char *type;
-        int len;
-
-        node = fdt_next_node(fdt, prev, NULL);
-        if ( node < 0 )
-            break;
-
-        type = fdt_getprop(fdt, node, "device_type", &len);
-        if ( type && strncmp(type, "memory", len) == 0 )
-        {
-            fdt_del_node(fdt, node);
-            continue;
-        }
-
-        prev = node;
-    }
-
    /*
     * Delete all memory reserve map entries. When booting via UEFI,
     * kernel will use the UEFI memory map to find reserved regions.
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index c4b6af6029..e4fb5f0d49 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -123,6 +123,12 @@  void device_tree_get_reg(const __be32 **cell, u32 address_cells,
 u32 device_tree_get_u32(const void *fdt, int node,
                         const char *prop_name, u32 dflt);
 
+#if defined(CONFIG_ARM_64)
+#define arch_efi_enabled(x) efi_enabled(x)
+#else
+#define arch_efi_enabled(x) (0)
+#endif
+
 #endif
 /*
  * Local variables: