diff mbox series

[v3] memblock,arm64: Expand the static memblock memory table

Message ID 20220527091832.63489-1-zhouguanghui1@huawei.com (mailing list archive)
State New
Headers show
Series [v3] memblock,arm64: Expand the static memblock memory table | expand

Commit Message

Zhou Guanghui May 27, 2022, 9:18 a.m. UTC
In a system using HBM, a multi-bit ECC error occurs, and the BIOS
will mark the corresponding area (for example, 2 MB) as unusable.
When the system restarts next time, these areas are not reported
or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
larger number of memblocks.

For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

The EFI memory map is parsed to construct the memblock arrays before
the memblock arrays can be resized. As the result, memory regions
beyond INIT_MEMBLOCK_REGIONS are lost.

Allow overriding memblock.memory array size with architecture defined
INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/memory.h |  9 +++++++++
 mm/memblock.c                   | 14 +++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

Comments

Darren Hart June 6, 2022, 9:30 p.m. UTC | #1
On Fri, May 27, 2022 at 09:18:32AM +0000, Zhou Guanghui wrote:
> In a system using HBM, a multi-bit ECC error occurs, and the BIOS
> will mark the corresponding area (for example, 2 MB) as unusable.
> When the system restarts next time, these areas are not reported
> or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
> in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
> larger number of memblocks.
> 
> For example, if the EFI_UNUSABLE_MEMORY type is reported:
> ...
> memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
> memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
> memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
> memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
> memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
> memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
> memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
> memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
> ...
> 
> The EFI memory map is parsed to construct the memblock arrays before
> the memblock arrays can be resized. As the result, memory regions
> beyond INIT_MEMBLOCK_REGIONS are lost.
> 
> Allow overriding memblock.memory array size with architecture defined
> INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
> INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
> 
> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> Acked-by: Mike Rapoport <rppt@linux.ibm.com>

Tested on Ampere Altra, resolving this issue as observed internally. I didn't
perform the test myself, but representing it here as a point of contact:

Tested-by: Darren Hart <darren@os.amperecomputing.com>
Anshuman Khandual June 7, 2022, 6:42 a.m. UTC | #2
Hello Zhou,

On 5/27/22 14:48, Zhou Guanghui wrote:
> In a system using HBM, a multi-bit ECC error occurs, and the BIOS
> will mark the corresponding area (for example, 2 MB) as unusable.
> When the system restarts next time, these areas are not reported
> or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
> in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
> larger number of memblocks.
> 
> For example, if the EFI_UNUSABLE_MEMORY type is reported:
> ...
> memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
> memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
> memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
> memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
> memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
> memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
> memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
> memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
> memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
> memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
> ...

Although this patch did not mention about a real world system requiring
this support, as been reported on the thread, Ampere Altra does seem to
get benefited. Regardless, it's always better to describe platform test
scenarios in more detail.

> 
> The EFI memory map is parsed to construct the memblock arrays before
> the memblock arrays can be resized. As the result, memory regions
> beyond INIT_MEMBLOCK_REGIONS are lost.
> 
> Allow overriding memblock.memory array size with architecture defined
> INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
> INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.

Right, but first this needs to mention that INIT_MEMBLOCK_MEMORY_REGIONS
(new macro) is being added to replace INIT_MEMBLOCK_REGIONS, representing
max memory regions in the memblock. Platform override comes afterwards.

> 
> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm64/include/asm/memory.h |  9 +++++++++
>  mm/memblock.c                   | 14 +++++++++-----
>  2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0af70d9abede..eda61c0389c4 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -364,6 +364,15 @@ void dump_mem_limit(void);
>  # define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>  #endif
>  
> +/*
> + * memory regions which marked with flag MEMBLOCK_NOMAP may divide a continuous
> + * memory block into multiple parts. As a result, the number of memory regions
> + * is large.
> + */

As mentioned in the previous version's thread,

This comment needs be more specific about this increased static array size, being
applicable ONLY for MEMBLOCK_NOMAP regions on EFI system with EFI_UNUSABLE_MEMORY
tagging/flag support.

> +#ifdef CONFIG_EFI
> +#define INIT_MEMBLOCK_MEMORY_REGIONS	1024

Although 1024 seems adequate as compared to 128 memory regions in the memblock to
handle such error scenarios, but a co-relation with INIT_MEMBLOCK_REGIONS would
be preferred similar to when INIT_MEMBLOCK_RESERVED_REGIONS gets overridden. This
avoid a precedence when random numbers could get assigned in other archs later on.

$git grep INIT_MEMBLOCK_RESERVED_REGIONS arch/
arch/arm64/include/asm/memory.h:# define INIT_MEMBLOCK_RESERVED_REGIONS (INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
arch/loongarch/include/asm/sparsemem.h:#define INIT_MEMBLOCK_RESERVED_REGIONS   (INIT_MEMBLOCK_REGIONS + NR_CPUS)

Something like

#define INIT_MEMBLOCK_MEMORY_REGIONS	(INIT_MEMBLOCK_REGIONS * 8)

> +#endif
> +
>  #include <asm-generic/memory_model.h>
>  
>  #endif /* __ASM_MEMORY_H */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index e4f03a6e8e56..7c63571a69d7 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -29,6 +29,10 @@
>  # define INIT_MEMBLOCK_RESERVED_REGIONS		INIT_MEMBLOCK_REGIONS
>  #endif
>  
> +#ifndef INIT_MEMBLOCK_MEMORY_REGIONS
> +#define INIT_MEMBLOCK_MEMORY_REGIONS		INIT_MEMBLOCK_REGIONS
> +#endif
> +
>  /**
>   * DOC: memblock overview
>   *
> @@ -55,9 +59,9 @@
>   * the allocator metadata. The "memory" and "reserved" types are nicely
>   * wrapped with struct memblock. This structure is statically
>   * initialized at build time. The region arrays are initially sized to
> - * %INIT_MEMBLOCK_REGIONS for "memory" and %INIT_MEMBLOCK_RESERVED_REGIONS
> - * for "reserved". The region array for "physmem" is initially sized to
> - * %INIT_PHYSMEM_REGIONS.
> + * %INIT_MEMBLOCK_MEMORY_REGIONS for "memory" and
> + * %INIT_MEMBLOCK_RESERVED_REGIONS for "reserved". The region array
> + * for "physmem" is initially sized to %INIT_PHYSMEM_REGIONS.
>   * The memblock_allow_resize() enables automatic resizing of the region
>   * arrays during addition of new regions. This feature should be used
>   * with care so that memory allocated for the region array will not
> @@ -102,7 +106,7 @@ unsigned long min_low_pfn;
>  unsigned long max_pfn;
>  unsigned long long max_possible_pfn;
>  
> -static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
> +static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;
>  static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
>  #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
>  static struct memblock_region memblock_physmem_init_regions[INIT_PHYSMEM_REGIONS];
> @@ -111,7 +115,7 @@ static struct memblock_region memblock_physmem_init_regions[INIT_PHYSMEM_REGIONS
>  struct memblock memblock __initdata_memblock = {
>  	.memory.regions		= memblock_memory_init_regions,
>  	.memory.cnt		= 1,	/* empty dummy entry */
> -	.memory.max		= INIT_MEMBLOCK_REGIONS,
> +	.memory.max		= INIT_MEMBLOCK_MEMORY_REGIONS,
>  	.memory.name		= "memory",
>  
>  	.reserved.regions	= memblock_reserved_init_regions,

- Anshuman
Zhou Guanghui June 13, 2022, 6:03 a.m. UTC | #3
在 2022/6/7 14:43, Anshuman Khandual 写道:
> Hello Zhou,
> 
> On 5/27/22 14:48, Zhou Guanghui wrote:
>> In a system using HBM, a multi-bit ECC error occurs, and the BIOS
>> will mark the corresponding area (for example, 2 MB) as unusable.
>> When the system restarts next time, these areas are not reported
>> or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
>> in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
>> larger number of memblocks.
>>
>> For example, if the EFI_UNUSABLE_MEMORY type is reported:
>> ...
>> memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>> memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>> memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
>> memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>> memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>> memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>> memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
>> memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
>> memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>> memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
>> memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>> memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
>> memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>> memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
>> ...
> 
> Although this patch did not mention about a real world system requiring
> this support, as been reported on the thread, Ampere Altra does seem to
> get benefited. Regardless, it's always better to describe platform test
> scenarios in more detail.
> 

I encountered this scenario on Huawei Ascend ARM64 SoC.

>>
>> The EFI memory map is parsed to construct the memblock arrays before
>> the memblock arrays can be resized. As the result, memory regions
>> beyond INIT_MEMBLOCK_REGIONS are lost.
>>
>> Allow overriding memblock.memory array size with architecture defined
>> INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
>> INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
> 
> Right, but first this needs to mention that INIT_MEMBLOCK_MEMORY_REGIONS
> (new macro) is being added to replace INIT_MEMBLOCK_REGIONS, representing
> max memory regions in the memblock. Platform override comes afterwards.
> 

Add a paragraph before the description,like this?

Add a new macro INIT_MEMBLOCK_MEMORY_REGTIONS to replace 
INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory 
array.

>>
>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
>> ---
>>   arch/arm64/include/asm/memory.h |  9 +++++++++
>>   mm/memblock.c                   | 14 +++++++++-----
>>   2 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index 0af70d9abede..eda61c0389c4 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -364,6 +364,15 @@ void dump_mem_limit(void);
>>   # define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>>   #endif
>>   
>> +/*
>> + * memory regions which marked with flag MEMBLOCK_NOMAP may divide a continuous
>> + * memory block into multiple parts. As a result, the number of memory regions
>> + * is large.
>> + */
> 
> As mentioned in the previous version's thread,
> 
> This comment needs be more specific about this increased static array size, being
> applicable ONLY for MEMBLOCK_NOMAP regions on EFI system with EFI_UNUSABLE_MEMORY
> tagging/flag support.
> 

EFI_UNUSABLE_MEMORY is only one type of the MEMBLOCK_NOMAP region, as 
shown in the is_usable_memory function. However, However, I currently 
have too many memblocks due to this flag.

>> +#ifdef CONFIG_EFI
>> +#define INIT_MEMBLOCK_MEMORY_REGIONS	1024
> 
> Although 1024 seems adequate as compared to 128 memory regions in the memblock to
> handle such error scenarios, but a co-relation with INIT_MEMBLOCK_REGIONS would
> be preferred similar to when INIT_MEMBLOCK_RESERVED_REGIONS gets overridden. This
> avoid a precedence when random numbers could get assigned in other archs later on.
> 
> $git grep INIT_MEMBLOCK_RESERVED_REGIONS arch/
> arch/arm64/include/asm/memory.h:# define INIT_MEMBLOCK_RESERVED_REGIONS (INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
> arch/loongarch/include/asm/sparsemem.h:#define INIT_MEMBLOCK_RESERVED_REGIONS   (INIT_MEMBLOCK_REGIONS + NR_CPUS)
> 
> Something like
> 
> #define INIT_MEMBLOCK_MEMORY_REGIONS	(INIT_MEMBLOCK_REGIONS * 8)
> 

I don't think this is necessary because INIT_MEMBLOCK_REGIONS is not 
configurable. The newly added INIT_MEMBLOCK_MEMORY_REGIONS macro is 
customized for each platform.

> 
> - Anshuman
> 

Thanks!
Anshuman Khandual June 13, 2022, 6:38 a.m. UTC | #4
On 6/13/22 11:33, Zhouguanghui wrote:
> 在 2022/6/7 14:43, Anshuman Khandual 写道:
>> Hello Zhou,
>>
>> On 5/27/22 14:48, Zhou Guanghui wrote:
>>> In a system using HBM, a multi-bit ECC error occurs, and the BIOS
>>> will mark the corresponding area (for example, 2 MB) as unusable.
>>> When the system restarts next time, these areas are not reported
>>> or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
>>> in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
>>> larger number of memblocks.
>>>
>>> For example, if the EFI_UNUSABLE_MEMORY type is reported:
>>> ...
>>> memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>>> memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>> memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
>>> memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>> memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>>> memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>> memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
>>> memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
>>> memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>> memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
>>> memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>> memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
>>> memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>> memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
>>> ...
>>
>> Although this patch did not mention about a real world system requiring
>> this support, as been reported on the thread, Ampere Altra does seem to
>> get benefited. Regardless, it's always better to describe platform test
>> scenarios in more detail.
>>
> 
> I encountered this scenario on Huawei Ascend ARM64 SoC.

Please do mention that in the commit message.

> 
>>>
>>> The EFI memory map is parsed to construct the memblock arrays before
>>> the memblock arrays can be resized. As the result, memory regions
>>> beyond INIT_MEMBLOCK_REGIONS are lost.
>>>
>>> Allow overriding memblock.memory array size with architecture defined
>>> INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
>>> INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
>>
>> Right, but first this needs to mention that INIT_MEMBLOCK_MEMORY_REGIONS
>> (new macro) is being added to replace INIT_MEMBLOCK_REGIONS, representing
>> max memory regions in the memblock. Platform override comes afterwards.
>>
> 
> Add a paragraph before the description,like this?
> 
> Add a new macro INIT_MEMBLOCK_MEMORY_REGTIONS to replace 
> INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory 
> array.

Right.

> 
>>>
>>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>>> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>   arch/arm64/include/asm/memory.h |  9 +++++++++
>>>   mm/memblock.c                   | 14 +++++++++-----
>>>   2 files changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index 0af70d9abede..eda61c0389c4 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -364,6 +364,15 @@ void dump_mem_limit(void);
>>>   # define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>>>   #endif
>>>   
>>> +/*
>>> + * memory regions which marked with flag MEMBLOCK_NOMAP may divide a continuous
>>> + * memory block into multiple parts. As a result, the number of memory regions
>>> + * is large.
>>> + */
>>
>> As mentioned in the previous version's thread,
>>
>> This comment needs be more specific about this increased static array size, being
>> applicable ONLY for MEMBLOCK_NOMAP regions on EFI system with EFI_UNUSABLE_MEMORY
>> tagging/flag support.
>>
> 
> EFI_UNUSABLE_MEMORY is only one type of the MEMBLOCK_NOMAP region, as 
> shown in the is_usable_memory function. However, However, I currently 
> have too many memblocks due to this flag.

Okay, but adding EFI_UNUSABLE_MEMORY context in that comment will be helpful.

> 
>>> +#ifdef CONFIG_EFI
>>> +#define INIT_MEMBLOCK_MEMORY_REGIONS	1024
>>
>> Although 1024 seems adequate as compared to 128 memory regions in the memblock to
>> handle such error scenarios, but a co-relation with INIT_MEMBLOCK_REGIONS would
>> be preferred similar to when INIT_MEMBLOCK_RESERVED_REGIONS gets overridden. This
>> avoid a precedence when random numbers could get assigned in other archs later on.
>>
>> $git grep INIT_MEMBLOCK_RESERVED_REGIONS arch/
>> arch/arm64/include/asm/memory.h:# define INIT_MEMBLOCK_RESERVED_REGIONS (INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>> arch/loongarch/include/asm/sparsemem.h:#define INIT_MEMBLOCK_RESERVED_REGIONS   (INIT_MEMBLOCK_REGIONS + NR_CPUS)
>>
>> Something like
>>
>> #define INIT_MEMBLOCK_MEMORY_REGIONS	(INIT_MEMBLOCK_REGIONS * 8)
>>
> 
> I don't think this is necessary because INIT_MEMBLOCK_REGIONS is not 
> configurable. The newly added INIT_MEMBLOCK_MEMORY_REGIONS macro is 
> customized for each platform.

Even an existing macro INIT_MEMBLOCK_RESERVED_REGIONS still depends on
INIT_MEMBLOCK_REGIONS (arm64, loongarch) ? The point being, although
INIT_MEMBLOCK_REGIONS is not configurable, it still does provide enough
base value, as compared to defining a random number in platforms which
will override INIT_MEMBLOCK_MEMORY_REGIONS. What is your concern in
making it dependent on INIT_MEMBLOCK_REGIONS ?
Zhou Guanghui June 13, 2022, 11:57 a.m. UTC | #5
On 2022/6/13 14:38, Anshuman Khandual wrote:
> 
> 
> On 6/13/22 11:33, Zhouguanghui wrote:
>> 在 2022/6/7 14:43, Anshuman Khandual 写道:
>>> Hello Zhou,
>>>
>>> On 5/27/22 14:48, Zhou Guanghui wrote:
>>>> In a system using HBM, a multi-bit ECC error occurs, and the BIOS
>>>> will mark the corresponding area (for example, 2 MB) as unusable.
>>>> When the system restarts next time, these areas are not reported
>>>> or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase
>>>> in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a
>>>> larger number of memblocks.
>>>>
>>>> For example, if the EFI_UNUSABLE_MEMORY type is reported:
>>>> ...
>>>> memory[0x92]    [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>>>> memory[0x93]    [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>>> memory[0x94]    [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
>>>> memory[0x95]    [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>>> memory[0x96]    [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
>>>> memory[0x97]    [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
>>>> memory[0x98]    [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
>>>> memory[0x99]    [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
>>>> memory[0x9a]    [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>>> memory[0x9b]    [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
>>>> memory[0x9c]    [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>>> memory[0x9d]    [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
>>>> memory[0x9e]    [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
>>>> memory[0x9f]    [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
>>>> ...
>>>
>>> Although this patch did not mention about a real world system requiring
>>> this support, as been reported on the thread, Ampere Altra does seem to
>>> get benefited. Regardless, it's always better to describe platform test
>>> scenarios in more detail.
>>>
>>
>> I encountered this scenario on Huawei Ascend ARM64 SoC.
> 
> Please do mention that in the commit message.
> 

I will add this in patch v4.

>>
>>>>
>>>> The EFI memory map is parsed to construct the memblock arrays before
>>>> the memblock arrays can be resized. As the result, memory regions
>>>> beyond INIT_MEMBLOCK_REGIONS are lost.
>>>>
>>>> Allow overriding memblock.memory array size with architecture defined
>>>> INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set
>>>> INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
>>>
>>> Right, but first this needs to mention that INIT_MEMBLOCK_MEMORY_REGIONS
>>> (new macro) is being added to replace INIT_MEMBLOCK_REGIONS, representing
>>> max memory regions in the memblock. Platform override comes afterwards.
>>>
>>
>> Add a paragraph before the description,like this?
>>
>> Add a new macro INIT_MEMBLOCK_MEMORY_REGTIONS to replace 
>> INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory 
>> array.
> 
> Right.

I will add this paragraph in patch v4.

> 
>>
>>>>
>>>> Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
>>>> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
>>>> ---
>>>>   arch/arm64/include/asm/memory.h |  9 +++++++++
>>>>   mm/memblock.c                   | 14 +++++++++-----
>>>>   2 files changed, 18 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>>> index 0af70d9abede..eda61c0389c4 100644
>>>> --- a/arch/arm64/include/asm/memory.h
>>>> +++ b/arch/arm64/include/asm/memory.h
>>>> @@ -364,6 +364,15 @@ void dump_mem_limit(void);
>>>>   # define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>>>>   #endif
>>>>   
>>>> +/*
>>>> + * memory regions which marked with flag MEMBLOCK_NOMAP may divide a continuous
>>>> + * memory block into multiple parts. As a result, the number of memory regions
>>>> + * is large.
>>>> + */
>>>
>>> As mentioned in the previous version's thread,
>>>
>>> This comment needs be more specific about this increased static array size, being
>>> applicable ONLY for MEMBLOCK_NOMAP regions on EFI system with EFI_UNUSABLE_MEMORY
>>> tagging/flag support.
>>>
>>
>> EFI_UNUSABLE_MEMORY is only one type of the MEMBLOCK_NOMAP region, as 
>> shown in the is_usable_memory function. However, However, I currently 
>> have too many memblocks due to this flag.
> 
> Okay, but adding EFI_UNUSABLE_MEMORY context in that comment will be helpful.
> 

I'll add it to the comment in patch v4.

>>
>>>> +#ifdef CONFIG_EFI
>>>> +#define INIT_MEMBLOCK_MEMORY_REGIONS	1024
>>>
>>> Although 1024 seems adequate as compared to 128 memory regions in the memblock to
>>> handle such error scenarios, but a co-relation with INIT_MEMBLOCK_REGIONS would
>>> be preferred similar to when INIT_MEMBLOCK_RESERVED_REGIONS gets overridden. This
>>> avoid a precedence when random numbers could get assigned in other archs later on.
>>>
>>> $git grep INIT_MEMBLOCK_RESERVED_REGIONS arch/
>>> arch/arm64/include/asm/memory.h:# define INIT_MEMBLOCK_RESERVED_REGIONS (INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
>>> arch/loongarch/include/asm/sparsemem.h:#define INIT_MEMBLOCK_RESERVED_REGIONS   (INIT_MEMBLOCK_REGIONS + NR_CPUS)
>>>
>>> Something like
>>>
>>> #define INIT_MEMBLOCK_MEMORY_REGIONS	(INIT_MEMBLOCK_REGIONS * 8)
>>>
>>
>> I don't think this is necessary because INIT_MEMBLOCK_REGIONS is not 
>> configurable. The newly added INIT_MEMBLOCK_MEMORY_REGIONS macro is 
>> customized for each platform.
> 
> Even an existing macro INIT_MEMBLOCK_RESERVED_REGIONS still depends on
> INIT_MEMBLOCK_REGIONS (arm64, loongarch) ? The point being, although
> INIT_MEMBLOCK_REGIONS is not configurable, it still does provide enough
> base value, as compared to defining a random number in platforms which
> will override INIT_MEMBLOCK_MEMORY_REGIONS. What is your concern in
> making it dependent on INIT_MEMBLOCK_REGIONS ?
> 

In my opinion, the purpose of adding INIT_MEMBLOCK_MEMORY_REGTIONS is to specify a larger size on different platforms. In the future, the base value of INIT_MEMBLOCK_REGTIONS is adjusted to a larger value (for example, 256). On the arm64 platform, it is not necessary to adjust (256 * 8) INIT_MEMBLOCK_MEMORY_REGTIONS.

Thanks!
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..eda61c0389c4 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -364,6 +364,15 @@  void dump_mem_limit(void);
 # define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)
 #endif
 
+/*
+ * memory regions which marked with flag MEMBLOCK_NOMAP may divide a continuous
+ * memory block into multiple parts. As a result, the number of memory regions
+ * is large.
+ */
+#ifdef CONFIG_EFI
+#define INIT_MEMBLOCK_MEMORY_REGIONS	1024
+#endif
+
 #include <asm-generic/memory_model.h>
 
 #endif /* __ASM_MEMORY_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index e4f03a6e8e56..7c63571a69d7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -29,6 +29,10 @@ 
 # define INIT_MEMBLOCK_RESERVED_REGIONS		INIT_MEMBLOCK_REGIONS
 #endif
 
+#ifndef INIT_MEMBLOCK_MEMORY_REGIONS
+#define INIT_MEMBLOCK_MEMORY_REGIONS		INIT_MEMBLOCK_REGIONS
+#endif
+
 /**
  * DOC: memblock overview
  *
@@ -55,9 +59,9 @@ 
  * the allocator metadata. The "memory" and "reserved" types are nicely
  * wrapped with struct memblock. This structure is statically
  * initialized at build time. The region arrays are initially sized to
- * %INIT_MEMBLOCK_REGIONS for "memory" and %INIT_MEMBLOCK_RESERVED_REGIONS
- * for "reserved". The region array for "physmem" is initially sized to
- * %INIT_PHYSMEM_REGIONS.
+ * %INIT_MEMBLOCK_MEMORY_REGIONS for "memory" and
+ * %INIT_MEMBLOCK_RESERVED_REGIONS for "reserved". The region array
+ * for "physmem" is initially sized to %INIT_PHYSMEM_REGIONS.
  * The memblock_allow_resize() enables automatic resizing of the region
  * arrays during addition of new regions. This feature should be used
  * with care so that memory allocated for the region array will not
@@ -102,7 +106,7 @@  unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
 
-static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
+static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
 static struct memblock_region memblock_physmem_init_regions[INIT_PHYSMEM_REGIONS];
@@ -111,7 +115,7 @@  static struct memblock_region memblock_physmem_init_regions[INIT_PHYSMEM_REGIONS
 struct memblock memblock __initdata_memblock = {
 	.memory.regions		= memblock_memory_init_regions,
 	.memory.cnt		= 1,	/* empty dummy entry */
-	.memory.max		= INIT_MEMBLOCK_REGIONS,
+	.memory.max		= INIT_MEMBLOCK_MEMORY_REGIONS,
 	.memory.name		= "memory",
 
 	.reserved.regions	= memblock_reserved_init_regions,