diff mbox series

[RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes

Message ID 1615174073-10520-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes | expand

Commit Message

Anshuman Khandual March 8, 2021, 3:27 a.m. UTC
Platforms like arm and arm64 have redefined pfn_valid() because their early
memory sections might have contained memmap holes caused by memblock areas
tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
for struct page backing. This scenario could be captured with a new option
CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
improved to accommodate such platforms. This reduces overall code footprint
and also improves maintainability.

Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
turn had expanded its scope to new platforms like arc and m68k. Rather lets
restrict back the scope for free_unused_memmap() to arm and arm64 platforms
using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.

While here, it exports the symbol memblock_is_map_memory() to build drivers
that depend on pfn_valid() but does not have the required visibility. After
this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
arm and arm64 platforms.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
has been lightly tested on the arm64 platform. The idea to represent this
unique situation on the arm and arm64 platforms with a config option was
proposed by David H during an earlier discussion [2]. This still does not
build on arm platform due to pfn_valid() resolution errors. Nonetheless
wanted to get some early feedback whether the overall approach here, is
acceptable or not.

[1] https://patchwork.kernel.org/project/linux-mm/list/?series=442433 
[2] https://lore.kernel.org/linux-arm-kernel/4b282848-d2d7-6156-4726-ce974b2dff41@redhat.com/

 arch/arm/Kconfig              |  2 +-
 arch/arm/include/asm/page.h   |  4 ----
 arch/arm/mm/init.c            | 13 -----------
 arch/arm64/Kconfig            |  2 +-
 arch/arm64/include/asm/page.h |  2 --
 arch/arm64/mm/init.c          | 41 -----------------------------------
 include/linux/mmzone.h        | 24 +++++++++++++++++++-
 mm/Kconfig                    | 10 +++++++++
 mm/memblock.c                 |  3 ++-
 9 files changed, 37 insertions(+), 64 deletions(-)

Comments

David Hildenbrand March 8, 2021, 8:37 a.m. UTC | #1
On 08.03.21 04:27, Anshuman Khandual wrote:
> Platforms like arm and arm64 have redefined pfn_valid() because their early
> memory sections might have contained memmap holes caused by memblock areas
> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
> for struct page backing. This scenario could be captured with a new option
> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
> improved to accommodate such platforms. This reduces overall code footprint
> and also improves maintainability.
> 
> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
> turn had expanded its scope to new platforms like arc and m68k. Rather lets
> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
> 
> While here, it exports the symbol memblock_is_map_memory() to build drivers
> that depend on pfn_valid() but does not have the required visibility. After
> this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
> arm and arm64 platforms.
> 
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
> has been lightly tested on the arm64 platform. The idea to represent this
> unique situation on the arm and arm64 platforms with a config option was
> proposed by David H during an earlier discussion [2]. This still does not
> build on arm platform due to pfn_valid() resolution errors. Nonetheless
> wanted to get some early feedback whether the overall approach here, is
> acceptable or not.

It might make sense to keep the arm variant for now. The arm64 variant 
is where the magic happens and where we missed updates when working on 
the generic variant.

The generic variant really only applies to 64bit targets where we have 
SPARSEMEM. See x86 as an example.

[...]

>   /*
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 47946cec7584..93532994113f 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>   }
>   #endif
>   
> +bool memblock_is_map_memory(phys_addr_t addr);
> +
>   #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>   static inline int pfn_valid(unsigned long pfn)
> +{
> +	phys_addr_t addr = PFN_PHYS(pfn);
> +
> +	/*
> +	 * Ensure the upper PAGE_SHIFT bits are clear in the
> +	 * pfn. Else it might lead to false positives when
> +	 * some of the upper bits are set, but the lower bits
> +	 * match a valid pfn.
> +	 */
> +	if (PHYS_PFN(addr) != pfn)
> +		return 0;

I think this should be fine for other archs as well.

> +
> +#ifdef CONFIG_SPARSEMEM

Why do we need the ifdef now? If that's to cover the arm case, then 
please consider the arm64 case only for now.

>   {
>   	struct mem_section *ms;
>   
> @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>   	 * Traditionally early sections always returned pfn_valid() for
>   	 * the entire section-sized span.
>   	 */
> -	return early_section(ms) || pfn_section_valid(ms, pfn);
> +	if (early_section(ms))
> +		return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
> +			memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> +
> +	return pfn_section_valid(ms, pfn);
> +}
> +#endif
> +	return 1;
>   }
>   #endif
>   
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 24c045b24b95..0ec20f661b3f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>   config ARCH_KEEP_MEMBLOCK
>   	bool
>   
> +config HAVE_EARLY_SECTION_MEMMAP_HOLES
> +	depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
> +	def_bool n
> +	help
> +	  Early sections on certain platforms might have portions which are
> +	  not backed with struct page mapping as their memblock entries are
> +	  marked with MEMBLOCK_NOMAP. When subscribed, this option enables
> +	  specific handling for those memory sections in certain situations
> +	  such as pfn_valid().
> +
>   # Keep arch NUMA mapping infrastructure post-init.
>   config NUMA_KEEP_MEMINFO
>   	bool
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..d9fa2e62ab7a 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>   		return false;
>   	return !memblock_is_nomap(&memblock.memory.regions[i]);
>   }
> +EXPORT_SYMBOL(memblock_is_map_memory);
>   
>   int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>   			 unsigned long *start_pfn, unsigned long *end_pfn)
> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>   	unsigned long start, end, prev_end = 0;
>   	int i;
>   
> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
> +	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>   	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>   		return;
>   
> 

With

commit 1f90a3477df3ff1a91e064af554cdc887c8f9e5e
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Thu Feb 25 17:17:05 2021 -0800

     mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions

(still in -next I think)

You'll also have to take care of pfn_to_online_page().
Mike Rapoport March 8, 2021, 8:55 a.m. UTC | #2
Hi Anshuman,

On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
> Platforms like arm and arm64 have redefined pfn_valid() because their early
> memory sections might have contained memmap holes caused by memblock areas
> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
> for struct page backing. This scenario could be captured with a new option
> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
> improved to accommodate such platforms. This reduces overall code footprint
> and also improves maintainability.

I wonder whether arm64 would still need to free parts of its memmap after
the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
worth the memory savings anymore?

> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
> turn had expanded its scope to new platforms like arc and m68k. Rather lets
> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.

The whole point of 4f5b0c178996 was to let arc and m68k to free unused
memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
whatever implementation there will be for arm/arm64, please keep arc and
m68k functionally intact.
 
> While here, it exports the symbol memblock_is_map_memory() to build drivers
> that depend on pfn_valid() but does not have the required visibility. After
> this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
> arm and arm64 platforms.
>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
> has been lightly tested on the arm64 platform. The idea to represent this
> unique situation on the arm and arm64 platforms with a config option was
> proposed by David H during an earlier discussion [2]. This still does not
> build on arm platform due to pfn_valid() resolution errors. Nonetheless
> wanted to get some early feedback whether the overall approach here, is
> acceptable or not.
> 
> [1] https://patchwork.kernel.org/project/linux-mm/list/?series=442433 
> [2] https://lore.kernel.org/linux-arm-kernel/4b282848-d2d7-6156-4726-ce974b2dff41@redhat.com/
> 
>  arch/arm/Kconfig              |  2 +-
>  arch/arm/include/asm/page.h   |  4 ----
>  arch/arm/mm/init.c            | 13 -----------
>  arch/arm64/Kconfig            |  2 +-
>  arch/arm64/include/asm/page.h |  2 --
>  arch/arm64/mm/init.c          | 41 -----------------------------------
>  include/linux/mmzone.h        | 24 +++++++++++++++++++-
>  mm/Kconfig                    | 10 +++++++++
>  mm/memblock.c                 |  3 ++-
>  9 files changed, 37 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 853aab5ab327..8b1d3089baa6 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -71,7 +71,6 @@ config ARM
>  	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
>  	select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
>  	select HAVE_ARCH_MMAP_RND_BITS if MMU
> -	select HAVE_ARCH_PFN_VALID
>  	select HAVE_ARCH_SECCOMP
>  	select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> @@ -84,6 +83,7 @@ config ARM
>  	select HAVE_DMA_CONTIGUOUS if MMU
>  	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
> +	select HAVE_EARLY_SECTION_MEMMAP_HOLES
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
>  	select HAVE_EXIT_THREAD
>  	select HAVE_FAST_GUP if ARM_LPAE
> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> index 11b058a72a5b..7e3189083bd7 100644
> --- a/arch/arm/include/asm/page.h
> +++ b/arch/arm/include/asm/page.h
> @@ -153,10 +153,6 @@ extern void copy_page(void *to, const void *from);
>  
>  typedef struct page *pgtable_t;
>  
> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> -extern int pfn_valid(unsigned long);
> -#endif
> -
>  #include <asm/memory.h>
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 828a2561b229..9131ef4e599e 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -121,19 +121,6 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>  	free_area_init(max_zone_pfn);
>  }
>  
> -#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> -int pfn_valid(unsigned long pfn)
> -{
> -	phys_addr_t addr = __pfn_to_phys(pfn);
> -
> -	if (__phys_to_pfn(addr) != pfn)
> -		return 0;
> -
> -	return memblock_is_map_memory(addr);
> -}
> -EXPORT_SYMBOL(pfn_valid);
> -#endif
> -
>  static bool arm_memblock_steal_permitted = true;
>  
>  phys_addr_t __init arm_memblock_steal(phys_addr_t size, phys_addr_t align)
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1f212b47a48a..2ee48bdf9dc1 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -144,7 +144,6 @@ config ARM64
>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> -	select HAVE_ARCH_PFN_VALID
>  	select HAVE_ARCH_PREL32_RELOCATIONS
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
> @@ -167,6 +166,7 @@ config ARM64
>  		if $(cc-option,-fpatchable-function-entry=2)
>  	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
>  		if DYNAMIC_FTRACE_WITH_REGS
> +	select HAVE_EARLY_SECTION_MEMMAP_HOLES
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	select HAVE_FAST_GUP
>  	select HAVE_FTRACE_MCOUNT_RECORD
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..635a030a3826 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -37,8 +37,6 @@ void copy_highpage(struct page *to, struct page *from);
>  
>  typedef struct page *pgtable_t;
>  
> -extern int pfn_valid(unsigned long);
> -
>  #include <asm/memory.h>
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 3685e12aba9b..2cba4347aef2 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -217,47 +217,6 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  	free_area_init(max_zone_pfns);
>  }
>  
> -int pfn_valid(unsigned long pfn)
> -{
> -	phys_addr_t addr = PFN_PHYS(pfn);
> -
> -	/*
> -	 * Ensure the upper PAGE_SHIFT bits are clear in the
> -	 * pfn. Else it might lead to false positives when
> -	 * some of the upper bits are set, but the lower bits
> -	 * match a valid pfn.
> -	 */
> -	if (PHYS_PFN(addr) != pfn)
> -		return 0;
> -
> -#ifdef CONFIG_SPARSEMEM
> -{
> -	struct mem_section *ms;
> -
> -	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> -		return 0;
> -
> -	ms = __pfn_to_section(pfn);
> -	if (!valid_section(ms))
> -		return 0;
> -
> -	/*
> -	 * ZONE_DEVICE memory does not have the memblock entries.
> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> -	 * addresses will always fail. Even the normal hotplugged
> -	 * memory will never have MEMBLOCK_NOMAP flag set in their
> -	 * memblock entries. Skip memblock search for all non early
> -	 * memory sections covering all of hotplug memory including
> -	 * both normal and ZONE_DEVICE based.
> -	 */
> -	if (!early_section(ms))
> -		return pfn_section_valid(ms, pfn);
> -}
> -#endif
> -	return memblock_is_map_memory(addr);
> -}
> -EXPORT_SYMBOL(pfn_valid);
> -
>  static phys_addr_t memory_limit = PHYS_ADDR_MAX;
>  
>  /*
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 47946cec7584..93532994113f 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>  }
>  #endif
>  
> +bool memblock_is_map_memory(phys_addr_t addr);
> +
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>  static inline int pfn_valid(unsigned long pfn)
> +{
> +	phys_addr_t addr = PFN_PHYS(pfn);
> +
> +	/*
> +	 * Ensure the upper PAGE_SHIFT bits are clear in the
> +	 * pfn. Else it might lead to false positives when
> +	 * some of the upper bits are set, but the lower bits
> +	 * match a valid pfn.
> +	 */
> +	if (PHYS_PFN(addr) != pfn)
> +		return 0;
> +
> +#ifdef CONFIG_SPARSEMEM
>  {
>  	struct mem_section *ms;
>  
> @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>  	 * Traditionally early sections always returned pfn_valid() for
>  	 * the entire section-sized span.
>  	 */
> -	return early_section(ms) || pfn_section_valid(ms, pfn);
> +	if (early_section(ms))
> +		return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
> +			memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> +
> +	return pfn_section_valid(ms, pfn);
> +}
> +#endif
> +	return 1;
>  }
>  #endif
>  
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 24c045b24b95..0ec20f661b3f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>  config ARCH_KEEP_MEMBLOCK
>  	bool
>  
> +config HAVE_EARLY_SECTION_MEMMAP_HOLES
> +	depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
> +	def_bool n
> +	help
> +	  Early sections on certain platforms might have portions which are
> +	  not backed with struct page mapping as their memblock entries are
> +	  marked with MEMBLOCK_NOMAP. When subscribed, this option enables
> +	  specific handling for those memory sections in certain situations
> +	  such as pfn_valid().
> +
>  # Keep arch NUMA mapping infrastructure post-init.
>  config NUMA_KEEP_MEMINFO
>  	bool
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..d9fa2e62ab7a 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>  		return false;
>  	return !memblock_is_nomap(&memblock.memory.regions[i]);
>  }
> +EXPORT_SYMBOL(memblock_is_map_memory);
>  
>  int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>  			 unsigned long *start_pfn, unsigned long *end_pfn)
> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>  	unsigned long start, end, prev_end = 0;
>  	int i;
>  
> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
> +	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>  	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>  		return;
>  
> -- 
> 2.20.1
>
Anshuman Khandual March 11, 2021, 4:29 a.m. UTC | #3
On 3/8/21 2:07 PM, David Hildenbrand wrote:
> On 08.03.21 04:27, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
>>
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
>>
>> While here, it exports the symbol memblock_is_map_memory() to build drivers
>> that depend on pfn_valid() but does not have the required visibility. After
>> this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
>> arm and arm64 platforms.
>>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
>> has been lightly tested on the arm64 platform. The idea to represent this
>> unique situation on the arm and arm64 platforms with a config option was
>> proposed by David H during an earlier discussion [2]. This still does not
>> build on arm platform due to pfn_valid() resolution errors. Nonetheless
>> wanted to get some early feedback whether the overall approach here, is
>> acceptable or not.
> 
> It might make sense to keep the arm variant for now. The arm64 variant is where the magic happens and where we missed updates when working on the generic variant.

Sure, will drop the changes on arm.

> 
> The generic variant really only applies to 64bit targets where we have SPARSEMEM. See x86 as an example.

Okay.

> 
> [...]
> 
>>   /*
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 47946cec7584..93532994113f 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>   }
>>   #endif
>>   +bool memblock_is_map_memory(phys_addr_t addr);
>> +
>>   #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>>   static inline int pfn_valid(unsigned long pfn)
>> +{
>> +    phys_addr_t addr = PFN_PHYS(pfn);
>> +
>> +    /*
>> +     * Ensure the upper PAGE_SHIFT bits are clear in the
>> +     * pfn. Else it might lead to false positives when
>> +     * some of the upper bits are set, but the lower bits
>> +     * match a valid pfn.
>> +     */
>> +    if (PHYS_PFN(addr) != pfn)
>> +        return 0;
> 
> I think this should be fine for other archs as well.
> 
>> +
>> +#ifdef CONFIG_SPARSEMEM
> 
> Why do we need the ifdef now? If that's to cover the arm case, then please consider the arm64 case only for now.

Yes, it is not needed.

> 
>>   {
>>       struct mem_section *ms;
>>   @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>>        * Traditionally early sections always returned pfn_valid() for
>>        * the entire section-sized span.
>>        */
>> -    return early_section(ms) || pfn_section_valid(ms, pfn);
>> +    if (early_section(ms))
>> +        return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
>> +            memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> +
>> +    return pfn_section_valid(ms, pfn);
>> +}
>> +#endif
>> +    return 1;
>>   }
>>   #endif
>>   diff --git a/mm/Kconfig b/mm/Kconfig
>> index 24c045b24b95..0ec20f661b3f 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>>   config ARCH_KEEP_MEMBLOCK
>>       bool
>>   +config HAVE_EARLY_SECTION_MEMMAP_HOLES
>> +    depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
>> +    def_bool n
>> +    help
>> +      Early sections on certain platforms might have portions which are
>> +      not backed with struct page mapping as their memblock entries are
>> +      marked with MEMBLOCK_NOMAP. When subscribed, this option enables
>> +      specific handling for those memory sections in certain situations
>> +      such as pfn_valid().
>> +
>>   # Keep arch NUMA mapping infrastructure post-init.
>>   config NUMA_KEEP_MEMINFO
>>       bool
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index afaefa8fc6ab..d9fa2e62ab7a 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>>           return false;
>>       return !memblock_is_nomap(&memblock.memory.regions[i]);
>>   }
>> +EXPORT_SYMBOL(memblock_is_map_memory);
>>     int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>>                unsigned long *start_pfn, unsigned long *end_pfn)
>> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>>       unsigned long start, end, prev_end = 0;
>>       int i;
>>   -    if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
>> +    if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>>           IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>           return;
>>  
> 
> With
> 
> commit 1f90a3477df3ff1a91e064af554cdc887c8f9e5e
> Author: Dan Williams <dan.j.williams@intel.com>
> Date:   Thu Feb 25 17:17:05 2021 -0800
> 
>     mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
> 
> (still in -next I think)

Already has merged mainline.

> 
> You'll also have to take care of pfn_to_online_page().
> 

Something like this would work ?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ba51a8bdaeb..19812deb807f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -309,6 +309,11 @@ struct page *pfn_to_online_page(unsigned long pfn)
         * Save some code text when online_section() +
         * pfn_section_valid() are sufficient.
         */
+       if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES)) {
+               if (early_section(ms) && !memblock_is_map_memory(PFN_PHYS(pfn)))
+                       return NULL;
+       }
+
        if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))
                return NULL;
Anshuman Khandual March 11, 2021, 7:52 a.m. UTC | #4
On 3/8/21 2:25 PM, Mike Rapoport wrote:
> Hi Anshuman,
> 
> On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
> 
> I wonder whether arm64 would still need to free parts of its memmap after

free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
I am not sure whether there still might be some platforms or boards which would
benefit from this. Hence lets just keep this unchanged for now.

> the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
> worth the memory savings anymore?

arm64 pfn_valid() special case was primarily because of MEMBLOCK_NOMAP tagged
memory areas, which are reserved by the firmware.

> 
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
> 
> The whole point of 4f5b0c178996 was to let arc and m68k to free unused
> memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
> whatever implementation there will be for arm/arm64, please keep arc and
> m68k functionally intact.

Okay. Will protect free_unused_memmap() on HAVE_EARLY_SECTION_MEMMAP_HOLES
config as well.

diff --git a/mm/memblock.c b/mm/memblock.c
index d9fa2e62ab7a..11b624e94127 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1927,8 +1927,11 @@ static void __init free_unused_memmap(void)
        unsigned long start, end, prev_end = 0;
        int i;
 
-       if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
-           IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+       if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+               return;
+
+       if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
+           !IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
                return;
Mike Rapoport March 11, 2021, 8:14 a.m. UTC | #5
On Thu, Mar 11, 2021 at 01:22:53PM +0530, Anshuman Khandual wrote:
> 
> On 3/8/21 2:25 PM, Mike Rapoport wrote:
> > Hi Anshuman,
> > 
> > On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
> >> Platforms like arm and arm64 have redefined pfn_valid() because their early
> >> memory sections might have contained memmap holes caused by memblock areas
> >> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
> >> for struct page backing. This scenario could be captured with a new option
> >> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
> >> improved to accommodate such platforms. This reduces overall code footprint
> >> and also improves maintainability.
> > 
> > I wonder whether arm64 would still need to free parts of its memmap after
> 
> free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
> I am not sure whether there still might be some platforms or boards which would
> benefit from this. Hence lets just keep this unchanged for now.

For builds with VMEMMAP off free_unused_memmap() would still provide some
memory savings. But the question is whether these savings are really
important for somebody? When the section size was 512M no doubt smaller
systems would waste a lot of memory for empty memory map. But with the
section size of 128M and the general increase in memory sizes even on
smaller devices we might be not actually saving anything. OTOH, we need to
keep arm64::pfn_valid() up to date with memory hot(un)plug, ZONE_DEVICE
etc.

Maybe as a compromise we can start with making free_unused_memmap() and
arm64::pfn_valid() only available in "legacy" mode that does not support
memory hotplug, pmem and so on?
 
> > the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
> > worth the memory savings anymore?
> 
> arm64 pfn_valid() special case was primarily because of MEMBLOCK_NOMAP tagged
> memory areas, which are reserved by the firmware.

I'm confused now. AFAIU, pfn_valid() means that there is a struct page for that
pfn and nothing else. Why is that related to MEMBLOCK_NOMAP?

> > 
> >> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
> >> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
> >> turn had expanded its scope to new platforms like arc and m68k. Rather lets
> >> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
> >> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
> > 
> > The whole point of 4f5b0c178996 was to let arc and m68k to free unused
> > memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
> > whatever implementation there will be for arm/arm64, please keep arc and
> > m68k functionally intact.
> 
> Okay. Will protect free_unused_memmap() on HAVE_EARLY_SECTION_MEMMAP_HOLES
> config as well.
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index d9fa2e62ab7a..11b624e94127 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1927,8 +1927,11 @@ static void __init free_unused_memmap(void)
>         unsigned long start, end, prev_end = 0;
>         int i;
>  
> -       if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
> -           IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> +       if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> +               return;
> +
> +       if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
> +           !IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
>                 return;
Will Deacon March 11, 2021, 9:33 a.m. UTC | #6
On Thu, Mar 11, 2021 at 01:22:53PM +0530, Anshuman Khandual wrote:
> On 3/8/21 2:25 PM, Mike Rapoport wrote:
> > On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
> >> Platforms like arm and arm64 have redefined pfn_valid() because their early
> >> memory sections might have contained memmap holes caused by memblock areas
> >> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
> >> for struct page backing. This scenario could be captured with a new option
> >> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
> >> improved to accommodate such platforms. This reduces overall code footprint
> >> and also improves maintainability.
> > 
> > I wonder whether arm64 would still need to free parts of its memmap after
> 
> free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
> I am not sure whether there still might be some platforms or boards which would
> benefit from this. Hence lets just keep this unchanged for now.

In my opinion, unless there's a compelling reason for us to offer all of
these different implementations of the memmap on arm64 then we shouldn't
bother -- it's not like it's fun to maintain! Just use sparsemem vmemmap
and be done with it. Is there some reason we can't do that?

Will
Mike Rapoport March 11, 2021, 10:43 a.m. UTC | #7
On Thu, Mar 11, 2021 at 09:33:02AM +0000, Will Deacon wrote:
> On Thu, Mar 11, 2021 at 01:22:53PM +0530, Anshuman Khandual wrote:
> > On 3/8/21 2:25 PM, Mike Rapoport wrote:
> > > On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
> > >> Platforms like arm and arm64 have redefined pfn_valid() because their early
> > >> memory sections might have contained memmap holes caused by memblock areas
> > >> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
> > >> for struct page backing. This scenario could be captured with a new option
> > >> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
> > >> improved to accommodate such platforms. This reduces overall code footprint
> > >> and also improves maintainability.
> > > 
> > > I wonder whether arm64 would still need to free parts of its memmap after
> > 
> > free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
> > I am not sure whether there still might be some platforms or boards which would
> > benefit from this. Hence lets just keep this unchanged for now.
> 
> In my opinion, unless there's a compelling reason for us to offer all of
> these different implementations of the memmap on arm64 then we shouldn't
> bother -- it's not like it's fun to maintain! Just use sparsemem vmemmap
> and be done with it. Is there some reason we can't do that?

Regardless if the decision to stop supporting other memory models, I think
it's a long due for arm64 to stop using pfn_valid() for anything except to
check whether there is a valid struct page for a pfn.

Something like the completely untested patch below:

From 3a753a56c2d87711f937ba09e4e14e4ad4926c38 Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.ibm.com>
Date: Thu, 11 Mar 2021 12:28:29 +0200
Subject: [PATCH] arm64: decouple check whether pfn is normal memory from
 pfn_valid()

The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.

Yet, on arm64 it is used to distinguish memory areas that are mapped in the
linear map vs those that require ioremap() to access them.

Introduce a dedicated pfn_is_memory() to perform such check and use it
where appropriate.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm64/include/asm/memory.h | 2 +-
 arch/arm64/include/asm/page.h   | 1 +
 arch/arm64/kvm/mmu.c            | 2 +-
 arch/arm64/mm/init.c            | 6 ++++++
 arch/arm64/mm/ioremap.c         | 4 ++--
 arch/arm64/mm/mmu.c             | 2 +-
 6 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index c759faf7a1ff..778dbfe95d0e 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -346,7 +346,7 @@ static inline void *phys_to_virt(phys_addr_t x)
 
 #define virt_addr_valid(addr)	({					\
 	__typeof__(addr) __addr = __tag_reset(addr);			\
-	__is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));	\
+	__is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));	\
 })
 
 void dump_mem_limit(void);
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..32b485bcc6ff 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from);
 typedef struct page *pgtable_t;
 
 extern int pfn_valid(unsigned long);
+extern int pfn_is_memory(unsigned long);
 
 #include <asm/memory.h>
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..a60069604361 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
-	return !pfn_valid(pfn);
+	return !pfn_is_memory(pfn);
 }
 
 /*
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0ace5e68efba..77c08853bafc 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -235,6 +235,12 @@ int pfn_valid(unsigned long pfn)
 }
 EXPORT_SYMBOL(pfn_valid);
 
+int pfn_is_memory(unsigned long pfn)
+{
+	return memblock_is_map_memory(PFN_PHYS(pfn));
+}
+EXPORT_SYMBOL(pfn_is_memory);
+
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
 
 /*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..82a369b22ef5 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
 	/*
 	 * Don't allow RAM to be mapped.
 	 */
-	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+	if (WARN_ON(pfn_is_memory(__phys_to_pfn(phys_addr))))
 		return NULL;
 
 	area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
 void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 {
 	/* For normal memory we already have a cacheable mapping. */
-	if (pfn_valid(__phys_to_pfn(phys_addr)))
+	if (pfn_is_memory(__phys_to_pfn(phys_addr)))
 		return (void __iomem *)__phys_to_virt(phys_addr);
 
 	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 3802cfbdd20d..ee66f2f21b6f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
-	if (!pfn_valid(pfn))
+	if (!pfn_is_memory(pfn))
 		return pgprot_noncached(vma_prot);
 	else if (file->f_flags & O_SYNC)
 		return pgprot_writecombine(vma_prot);
David Hildenbrand March 17, 2021, 11:20 a.m. UTC | #8
On 11.03.21 05:29, Anshuman Khandual wrote:
> 
> 
> On 3/8/21 2:07 PM, David Hildenbrand wrote:
>> On 08.03.21 04:27, Anshuman Khandual wrote:
>>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>>> memory sections might have contained memmap holes caused by memblock areas
>>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>>> for struct page backing. This scenario could be captured with a new option
>>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>>> improved to accommodate such platforms. This reduces overall code footprint
>>> and also improves maintainability.
>>>
>>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
>>>
>>> While here, it exports the symbol memblock_is_map_memory() to build drivers
>>> that depend on pfn_valid() but does not have the required visibility. After
>>> this new config is in place, just drop CONFIG_HAVE_ARCH_PFN_VALID from both
>>> arm and arm64 platforms.
>>>
>>> Cc: Russell King <linux@armlinux.org.uk>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Mike Rapoport <rppt@kernel.org>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: linux-mm@kvack.org
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>> This applies on 5.12-rc2 along with arm64 pfn_valid() fix patches [1] and
>>> has been lightly tested on the arm64 platform. The idea to represent this
>>> unique situation on the arm and arm64 platforms with a config option was
>>> proposed by David H during an earlier discussion [2]. This still does not
>>> build on arm platform due to pfn_valid() resolution errors. Nonetheless
>>> wanted to get some early feedback whether the overall approach here, is
>>> acceptable or not.
>>
>> It might make sense to keep the arm variant for now. The arm64 variant is where the magic happens and where we missed updates when working on the generic variant.
> 
> Sure, will drop the changes on arm.
> 
>>
>> The generic variant really only applies to 64bit targets where we have SPARSEMEM. See x86 as an example.
> 
> Okay.
> 
>>
>> [...]
>>
>>>    /*
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 47946cec7584..93532994113f 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>>    }
>>>    #endif
>>>    +bool memblock_is_map_memory(phys_addr_t addr);
>>> +
>>>    #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>>>    static inline int pfn_valid(unsigned long pfn)
>>> +{
>>> +    phys_addr_t addr = PFN_PHYS(pfn);
>>> +
>>> +    /*
>>> +     * Ensure the upper PAGE_SHIFT bits are clear in the
>>> +     * pfn. Else it might lead to false positives when
>>> +     * some of the upper bits are set, but the lower bits
>>> +     * match a valid pfn.
>>> +     */
>>> +    if (PHYS_PFN(addr) != pfn)
>>> +        return 0;
>>
>> I think this should be fine for other archs as well.
>>
>>> +
>>> +#ifdef CONFIG_SPARSEMEM
>>
>> Why do we need the ifdef now? If that's to cover the arm case, then please consider the arm64 case only for now.
> 
> Yes, it is not needed.
> 
>>
>>>    {
>>>        struct mem_section *ms;
>>>    @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>>>         * Traditionally early sections always returned pfn_valid() for
>>>         * the entire section-sized span.
>>>         */
>>> -    return early_section(ms) || pfn_section_valid(ms, pfn);
>>> +    if (early_section(ms))
>>> +        return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
>>> +            memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>>> +
>>> +    return pfn_section_valid(ms, pfn);
>>> +}
>>> +#endif
>>> +    return 1;
>>>    }
>>>    #endif
>>>    diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 24c045b24b95..0ec20f661b3f 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>>>    config ARCH_KEEP_MEMBLOCK
>>>        bool
>>>    +config HAVE_EARLY_SECTION_MEMMAP_HOLES
>>> +    depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
>>> +    def_bool n
>>> +    help
>>> +      Early sections on certain platforms might have portions which are
>>> +      not backed with struct page mapping as their memblock entries are
>>> +      marked with MEMBLOCK_NOMAP. When subscribed, this option enables
>>> +      specific handling for those memory sections in certain situations
>>> +      such as pfn_valid().
>>> +
>>>    # Keep arch NUMA mapping infrastructure post-init.
>>>    config NUMA_KEEP_MEMINFO
>>>        bool
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index afaefa8fc6ab..d9fa2e62ab7a 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>>>            return false;
>>>        return !memblock_is_nomap(&memblock.memory.regions[i]);
>>>    }
>>> +EXPORT_SYMBOL(memblock_is_map_memory);
>>>      int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>>>                 unsigned long *start_pfn, unsigned long *end_pfn)
>>> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>>>        unsigned long start, end, prev_end = 0;
>>>        int i;
>>>    -    if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
>>> +    if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>>>            IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>>            return;
>>>   
>>
>> With
>>
>> commit 1f90a3477df3ff1a91e064af554cdc887c8f9e5e
>> Author: Dan Williams <dan.j.williams@intel.com>
>> Date:   Thu Feb 25 17:17:05 2021 -0800
>>
>>      mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
>>
>> (still in -next I think)
> 
> Already has merged mainline.
> 
>>
>> You'll also have to take care of pfn_to_online_page().
>>
> 
> Something like this would work ?
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 5ba51a8bdaeb..19812deb807f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -309,6 +309,11 @@ struct page *pfn_to_online_page(unsigned long pfn)
>           * Save some code text when online_section() +
>           * pfn_section_valid() are sufficient.
>           */
> +       if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES)) {
> +               if (early_section(ms) && !memblock_is_map_memory(PFN_PHYS(pfn)))
> +                       return NULL;
> +       }
> +
>          if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))
>                  return NULL;
> 

Sorry for the late reply, just stumbled over this again.

I think, yes. I do wonder if we then still need the 
CONFIG_HAVE_ARCH_PFN_VALID handling below - are there any custom 
pfn_valid() implementation with SPARSE remaining? I don't think so.
diff mbox series

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 853aab5ab327..8b1d3089baa6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -71,7 +71,6 @@  config ARM
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
-	select HAVE_ARCH_PFN_VALID
 	select HAVE_ARCH_SECCOMP
 	select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
@@ -84,6 +83,7 @@  config ARM
 	select HAVE_DMA_CONTIGUOUS if MMU
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
+	select HAVE_EARLY_SECTION_MEMMAP_HOLES
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP if ARM_LPAE
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 11b058a72a5b..7e3189083bd7 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -153,10 +153,6 @@  extern void copy_page(void *to, const void *from);
 
 typedef struct page *pgtable_t;
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
-#endif
-
 #include <asm/memory.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 828a2561b229..9131ef4e599e 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -121,19 +121,6 @@  static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 	free_area_init(max_zone_pfn);
 }
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-int pfn_valid(unsigned long pfn)
-{
-	phys_addr_t addr = __pfn_to_phys(pfn);
-
-	if (__phys_to_pfn(addr) != pfn)
-		return 0;
-
-	return memblock_is_map_memory(addr);
-}
-EXPORT_SYMBOL(pfn_valid);
-#endif
-
 static bool arm_memblock_steal_permitted = true;
 
 phys_addr_t __init arm_memblock_steal(phys_addr_t size, phys_addr_t align)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1f212b47a48a..2ee48bdf9dc1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -144,7 +144,6 @@  config ARM64
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
-	select HAVE_ARCH_PFN_VALID
 	select HAVE_ARCH_PREL32_RELOCATIONS
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_STACKLEAK
@@ -167,6 +166,7 @@  config ARM64
 		if $(cc-option,-fpatchable-function-entry=2)
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
 		if DYNAMIC_FTRACE_WITH_REGS
+	select HAVE_EARLY_SECTION_MEMMAP_HOLES
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..635a030a3826 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -37,8 +37,6 @@  void copy_highpage(struct page *to, struct page *from);
 
 typedef struct page *pgtable_t;
 
-extern int pfn_valid(unsigned long);
-
 #include <asm/memory.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3685e12aba9b..2cba4347aef2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -217,47 +217,6 @@  static void __init zone_sizes_init(unsigned long min, unsigned long max)
 	free_area_init(max_zone_pfns);
 }
 
-int pfn_valid(unsigned long pfn)
-{
-	phys_addr_t addr = PFN_PHYS(pfn);
-
-	/*
-	 * Ensure the upper PAGE_SHIFT bits are clear in the
-	 * pfn. Else it might lead to false positives when
-	 * some of the upper bits are set, but the lower bits
-	 * match a valid pfn.
-	 */
-	if (PHYS_PFN(addr) != pfn)
-		return 0;
-
-#ifdef CONFIG_SPARSEMEM
-{
-	struct mem_section *ms;
-
-	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
-		return 0;
-
-	ms = __pfn_to_section(pfn);
-	if (!valid_section(ms))
-		return 0;
-
-	/*
-	 * ZONE_DEVICE memory does not have the memblock entries.
-	 * memblock_is_map_memory() check for ZONE_DEVICE based
-	 * addresses will always fail. Even the normal hotplugged
-	 * memory will never have MEMBLOCK_NOMAP flag set in their
-	 * memblock entries. Skip memblock search for all non early
-	 * memory sections covering all of hotplug memory including
-	 * both normal and ZONE_DEVICE based.
-	 */
-	if (!early_section(ms))
-		return pfn_section_valid(ms, pfn);
-}
-#endif
-	return memblock_is_map_memory(addr);
-}
-EXPORT_SYMBOL(pfn_valid);
-
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
 
 /*
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47946cec7584..93532994113f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1409,8 +1409,23 @@  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 }
 #endif
 
+bool memblock_is_map_memory(phys_addr_t addr);
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
+{
+	phys_addr_t addr = PFN_PHYS(pfn);
+
+	/*
+	 * Ensure the upper PAGE_SHIFT bits are clear in the
+	 * pfn. Else it might lead to false positives when
+	 * some of the upper bits are set, but the lower bits
+	 * match a valid pfn.
+	 */
+	if (PHYS_PFN(addr) != pfn)
+		return 0;
+
+#ifdef CONFIG_SPARSEMEM
 {
 	struct mem_section *ms;
 
@@ -1423,7 +1438,14 @@  static inline int pfn_valid(unsigned long pfn)
 	 * Traditionally early sections always returned pfn_valid() for
 	 * the entire section-sized span.
 	 */
-	return early_section(ms) || pfn_section_valid(ms, pfn);
+	if (early_section(ms))
+		return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
+			memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
+
+	return pfn_section_valid(ms, pfn);
+}
+#endif
+	return 1;
 }
 #endif
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..0ec20f661b3f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -135,6 +135,16 @@  config HAVE_FAST_GUP
 config ARCH_KEEP_MEMBLOCK
 	bool
 
+config HAVE_EARLY_SECTION_MEMMAP_HOLES
+	depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
+	def_bool n
+	help
+	  Early sections on certain platforms might have portions which are
+	  not backed with struct page mapping as their memblock entries are
+	  marked with MEMBLOCK_NOMAP. When subscribed, this option enables
+	  specific handling for those memory sections in certain situations
+	  such as pfn_valid().
+
 # Keep arch NUMA mapping infrastructure post-init.
 config NUMA_KEEP_MEMINFO
 	bool
diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..d9fa2e62ab7a 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1744,6 +1744,7 @@  bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
 		return false;
 	return !memblock_is_nomap(&memblock.memory.regions[i]);
 }
+EXPORT_SYMBOL(memblock_is_map_memory);
 
 int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 			 unsigned long *start_pfn, unsigned long *end_pfn)
@@ -1926,7 +1927,7 @@  static void __init free_unused_memmap(void)
 	unsigned long start, end, prev_end = 0;
 	int i;
 
-	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
+	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
 	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
 		return;