diff mbox series

[v3] arm64: mm: Populate vmemmap at the page level for hotplugged sections

Message ID 20250103085002.27243-1-quic_zhenhuah@quicinc.com (mailing list archive)
State New
Headers show
Series [v3] arm64: mm: Populate vmemmap at the page level for hotplugged sections | expand

Commit Message

Zhenhua Huang Jan. 3, 2025, 8:50 a.m. UTC
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hotplugging granule is always 128M. However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
which added 2M hotplugging granule disrupted the arm64 assumptions.

Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when
pmd_sect() is true, the entire PMD section is cleared, even if there is
other effective subsection. For example pagemap1 and pagemap2 are part
of a single PMD entry and they are hot-added sequentially. Then pagemap1
is removed, vmemmap_free() will clear the entire PMD entry freeing the
struct page metadata for the whole section, even though pagemap2 is still
active.

To address the issue, we need to prevent PMD/PUD/CONT mappings for both
linear and vmemmap for non-boot sections if the size exceeds 2MB
(considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page
configuration.

Cc: stable@vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
---
Hi Catalin and Anshuman,
Based on your review comments, I concluded below patch and tested with my setup.
I have not folded patchset #2 since this patch seems to be enough for backporting..
Please see if you have further suggestions.

 arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Comments

Anshuman Khandual Jan. 6, 2025, 3:11 p.m. UTC | #1
Hello Zhenhua,

On 1/3/25 14:20, Zhenhua Huang wrote:
> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
> vmemmap to populate at the PMD section level which was suitable
> initially since hotplugging granule is always 128M. However,

A small nit s/hotplugging/hot plug/

Also please do mention that 128M is SECTION_SIZE_BITS == 27 on arm64
platform for 4K base pages which is the page size in context here.


> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> which added 2M hotplugging granule disrupted the arm64 assumptions.

A small nit s/hotplugging/hot plug/

Also please do mention that 2M is SUB_SECTION_SIZE.

> 
> Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when
> pmd_sect() is true, the entire PMD section is cleared, even if there is
> other effective subsection. For example pagemap1 and pagemap2 are part
> of a single PMD entry and they are hot-added sequentially. Then pagemap1
> is removed, vmemmap_free() will clear the entire PMD entry freeing the
> struct page metadata for the whole section, even though pagemap2 is still
> active.

I guess pagemap1/2 here indicates the struct pages virtual regions for two
different vmemmap mapped sub sections covered via a single PMD entry ? But
please do update the above paragraph appropriately because pagemap<N> might
be confused with /proc/<pid>/pagemap mechanism.

Also please do mention that similar problems exist with linear mapping for
16K (PMD = 32M) and 64K (PMD = 512M) base pages as their block mappings
exceed SUBSECTION_SIZE. Hence tearing down the entire PMD mapping too will
leave other subsections unmapped in the linear mapping.

> 
> To address the issue, we need to prevent PMD/PUD/CONT mappings for both
> linear and vmemmap for non-boot sections if the size exceeds 2MB

s/if the size/if corresponding size on the given base page/

> (considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page
> configuration.

PMD block in 4K page size config as it's PMD_SIZE matches the SUBSECTION_SIZE
but only for linear mapping.


> 
> Cc: stable@vger.kernel.org # v5.4+
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> ---
> Hi Catalin and Anshuman,
> Based on your review comments, I concluded below patch and tested with my setup.
> I have not folded patchset #2 since this patch seems to be enough for backporting..
> Please see if you have further suggestions.
> 
>  arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e2739b69e11b..2b4d23f01d85 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -42,9 +42,11 @@
>  #include <asm/pgalloc.h>
>  #include <asm/kfence.h>
>  
> -#define NO_BLOCK_MAPPINGS	BIT(0)
> +#define NO_PMD_BLOCK_MAPPINGS	BIT(0)
>  #define NO_CONT_MAPPINGS	BIT(1)
>  #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
> +#define NO_PUD_BLOCK_MAPPINGS	BIT(3)  /* Hotplug case: do not want block mapping for PUD */
> +#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)


Should not NO_PMD_BLOCK_MAPPINGS and NO_PUD_BLOCK_MAPPINGS be adjacent bits
for better readability ? But that will also cause some additional churn.

>  
>  u64 kimage_voffset __ro_after_init;
>  EXPORT_SYMBOL(kimage_voffset);
> @@ -254,7 +256,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  
>  		/* try section mapping first */
>  		if (((addr | next | phys) & ~PMD_MASK) == 0 &&
> -		    (flags & NO_BLOCK_MAPPINGS) == 0) {
> +		    (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {

Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS
as it will now contain NO_PMD_BLOCK_MAPPINGS.

>  			pmd_set_huge(pmdp, phys, prot);
>  
>  			/*
> @@ -356,10 +358,11 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
>  
>  		/*
>  		 * For 4K granule only, attempt to put down a 1GB block
> +		 * Hotplug case: do not attempt 1GB block
>  		 */
>  		if (pud_sect_supported() &&
>  		   ((addr | next | phys) & ~PUD_MASK) == 0 &&
> -		    (flags & NO_BLOCK_MAPPINGS) == 0) {
> +		   (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {

Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS
as it will now contain NO_PUD_BLOCK_MAPPINGS.

>  			pud_set_huge(pudp, phys, prot);
>  
>  			/*
> @@ -1175,9 +1178,16 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  		struct vmem_altmap *altmap)
>  {
> +	unsigned long start_pfn;
> +	struct mem_section *ms;
> +
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>  
> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> +	start_pfn = page_to_pfn((struct page *)start);
> +	ms = __pfn_to_section(start_pfn);
> +
> +	/* Hotplugged section not support hugepages */

Please update the comment as

	/*
	 * Hotplugged section does not support hugepages as
	 * PMD_SIZE (hence PUD_SIZE) section mapping covers
	 * struct page range that exceeds a SUBSECTION_SIZE
	 * i.e 2MB - for all available base page sizes.
	 */

> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms))
>  		return vmemmap_populate_basepages(start, end, node, altmap);
>  	else
>  		return vmemmap_populate_hugepages(start, end, node, altmap);
> @@ -1339,9 +1349,24 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  		    struct mhp_params *params)
>  {
>  	int ret, flags = NO_EXEC_MAPPINGS;
> +	unsigned long start_pfn = page_to_pfn((struct page *)start);
> +	struct mem_section *ms = __pfn_to_section(start_pfn);
>  
>  	VM_BUG_ON(!mhp_range_allowed(start, size, true));
>  
> +	/* Should not be invoked by early section */
> +	WARN_ON(early_section(ms));
> +
> +	if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> +	/*
> +	 * As per subsection granule is 2M, allow PMD block mapping in
> +	 * case 4K PAGES.
> +	 * Other cases forbid section mapping.
> +	 */

IIUC subsection size is 2M regardless of the page size. But with 4K pages
on arm64, PMD_SIZE happen to be 2M. Hence there is no problem in creating
linear mappings at PMD block level, which will not be the case with other
page sizes i.e 16K and 64K.

include/linux/mmzone.h

#define SUBSECTION_SHIFT 21
#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)

Please update the comment with following changes but above IS_ENABLED()
statement as it talks about all page size configs.

	/*
	 * 4K base page's PMD_SIZE matches SUBSECTION_SIZE i.e 2MB. Hence
	 * PMD section mapping can be allowed, but only for 4K base pages.
	 * Where as PMD_SIZE (hence PUD_SIZE) for other page sizes exceed
	 * SUBSECTION_SIZE.
	 */
> +		flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> +	else
> +		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> +
>  	if (can_set_direct_map())
>  		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>
Zhenhua Huang Jan. 7, 2025, 7:38 a.m. UTC | #2
Thanks Anshuman for your detailed review!

On 2025/1/6 23:11, Anshuman Khandual wrote:
> Hello Zhenhua,
> 
> On 1/3/25 14:20, Zhenhua Huang wrote:
>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>> vmemmap to populate at the PMD section level which was suitable
>> initially since hotplugging granule is always 128M. However,
> 
> A small nit s/hotplugging/hot plug/
> 
> Also please do mention that 128M is SECTION_SIZE_BITS == 27 on arm64
> platform for 4K base pages which is the page size in context here.
> 
> 
>> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> which added 2M hotplugging granule disrupted the arm64 assumptions.
> 
> A small nit s/hotplugging/hot plug/
> 
> Also please do mention that 2M is SUB_SECTION_SIZE.
> 
>>
>> Considering the vmemmap_free -> unmap_hotplug_pmd_range path, when
>> pmd_sect() is true, the entire PMD section is cleared, even if there is
>> other effective subsection. For example pagemap1 and pagemap2 are part
>> of a single PMD entry and they are hot-added sequentially. Then pagemap1
>> is removed, vmemmap_free() will clear the entire PMD entry freeing the
>> struct page metadata for the whole section, even though pagemap2 is still
>> active.
> 
> I guess pagemap1/2 here indicates the struct pages virtual regions for two
> different vmemmap mapped sub sections covered via a single PMD entry ? But
> please do update the above paragraph appropriately because pagemap<N> might
> be confused with /proc/<pid>/pagemap mechanism.
> 
> Also please do mention that similar problems exist with linear mapping for
> 16K (PMD = 32M) and 64K (PMD = 512M) base pages as their block mappings
> exceed SUBSECTION_SIZE. Hence tearing down the entire PMD mapping too will
> leave other subsections unmapped in the linear mapping.

Make sense to me, will update comments.

> 
>>
>> To address the issue, we need to prevent PMD/PUD/CONT mappings for both
>> linear and vmemmap for non-boot sections if the size exceeds 2MB
> 
> s/if the size/if corresponding size on the given base page/
> 
>> (considering sub-section is 2MB). We only permit 2MB blocks in a 4KB page
>> configuration.
> 
> PMD block in 4K page size config as it's PMD_SIZE matches the SUBSECTION_SIZE
> but only for linear mapping.
> 
> 
>>
>> Cc: stable@vger.kernel.org # v5.4+
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>> ---
>> Hi Catalin and Anshuman,
>> Based on your review comments, I concluded below patch and tested with my setup.
>> I have not folded patchset #2 since this patch seems to be enough for backporting..
>> Please see if you have further suggestions.
>>
>>   arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++++++++++----
>>   1 file changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index e2739b69e11b..2b4d23f01d85 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -42,9 +42,11 @@
>>   #include <asm/pgalloc.h>
>>   #include <asm/kfence.h>
>>   
>> -#define NO_BLOCK_MAPPINGS	BIT(0)
>> +#define NO_PMD_BLOCK_MAPPINGS	BIT(0)
>>   #define NO_CONT_MAPPINGS	BIT(1)
>>   #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
>> +#define NO_PUD_BLOCK_MAPPINGS	BIT(3)  /* Hotplug case: do not want block mapping for PUD */
>> +#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
> 
> 
> Should not NO_PMD_BLOCK_MAPPINGS and NO_PUD_BLOCK_MAPPINGS be adjacent bits
> for better readability ? But that will also cause some additional churn.

Agree, I see these macros are only used in this file, it's safe to update.

> 
>>   
>>   u64 kimage_voffset __ro_after_init;
>>   EXPORT_SYMBOL(kimage_voffset);
>> @@ -254,7 +256,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
>>   
>>   		/* try section mapping first */
>>   		if (((addr | next | phys) & ~PMD_MASK) == 0 &&
>> -		    (flags & NO_BLOCK_MAPPINGS) == 0) {
>> +		    (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
> 
> Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS
> as it will now contain NO_PMD_BLOCK_MAPPINGS.

Hmm... do you want me to include these in comments? The code appears to 
be clear as it is.

> 
>>   			pmd_set_huge(pmdp, phys, prot);
>>   
>>   			/*
>> @@ -356,10 +358,11 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
>>   
>>   		/*
>>   		 * For 4K granule only, attempt to put down a 1GB block
>> +		 * Hotplug case: do not attempt 1GB block
>>   		 */
>>   		if (pud_sect_supported() &&
>>   		   ((addr | next | phys) & ~PUD_MASK) == 0 &&
>> -		    (flags & NO_BLOCK_MAPPINGS) == 0) {
>> +		   (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
> 
> Behavior will remain unchanged for all existing users of NO_BLOCK_MAPPINGS
> as it will now contain NO_PUD_BLOCK_MAPPINGS.
> 

Ditto.

>>   			pud_set_huge(pudp, phys, prot);
>>   
>>   			/*
>> @@ -1175,9 +1178,16 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
>>   int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>   		struct vmem_altmap *altmap)
>>   {
>> +	unsigned long start_pfn;
>> +	struct mem_section *ms;
>> +
>>   	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>   
>> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> +	start_pfn = page_to_pfn((struct page *)start);
>> +	ms = __pfn_to_section(start_pfn);
>> +
>> +	/* Hotplugged section not support hugepages */
> 
> Please update the comment as
> 
> 	/*
> 	 * Hotplugged section does not support hugepages as
> 	 * PMD_SIZE (hence PUD_SIZE) section mapping covers
> 	 * struct page range that exceeds a SUBSECTION_SIZE
> 	 * i.e 2MB - for all available base page sizes.
> 	 */
> 
>> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms))
>>   		return vmemmap_populate_basepages(start, end, node, altmap);
>>   	else
>>   		return vmemmap_populate_hugepages(start, end, node, altmap);
>> @@ -1339,9 +1349,24 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>   		    struct mhp_params *params)
>>   {
>>   	int ret, flags = NO_EXEC_MAPPINGS;
>> +	unsigned long start_pfn = page_to_pfn((struct page *)start);
>> +	struct mem_section *ms = __pfn_to_section(start_pfn);
>>   
>>   	VM_BUG_ON(!mhp_range_allowed(start, size, true));
>>   
>> +	/* Should not be invoked by early section */
>> +	WARN_ON(early_section(ms));
>> +
>> +	if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> +	/*
>> +	 * As per subsection granule is 2M, allow PMD block mapping in
>> +	 * case 4K PAGES.
>> +	 * Other cases forbid section mapping.
>> +	 */
> 
> IIUC subsection size is 2M regardless of the page size. But with 4K pages
> on arm64, PMD_SIZE happen to be 2M. Hence there is no problem in creating
> linear mappings at PMD block level, which will not be the case with other
> page sizes i.e 16K and 64K.

Yes.

> 
> include/linux/mmzone.h
> 
> #define SUBSECTION_SHIFT 21
> #define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
> 
> Please update the comment with following changes but above IS_ENABLED()
> statement as it talks about all page size configs.
> 
> 	/*
> 	 * 4K base page's PMD_SIZE matches SUBSECTION_SIZE i.e 2MB. Hence
> 	 * PMD section mapping can be allowed, but only for 4K base pages.
> 	 * Where as PMD_SIZE (hence PUD_SIZE) for other page sizes exceed
> 	 * SUBSECTION_SIZE.
> 	 */

Nice comments! Thanks.

>> +		flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>> +	else
>> +		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>> +
>>   	if (can_set_direct_map())
>>   		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>>   
>
diff mbox series

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e2739b69e11b..2b4d23f01d85 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -42,9 +42,11 @@ 
 #include <asm/pgalloc.h>
 #include <asm/kfence.h>
 
-#define NO_BLOCK_MAPPINGS	BIT(0)
+#define NO_PMD_BLOCK_MAPPINGS	BIT(0)
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
+#define NO_PUD_BLOCK_MAPPINGS	BIT(3)  /* Hotplug case: do not want block mapping for PUD */
+#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
 
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
@@ -254,7 +256,7 @@  static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 		/* try section mapping first */
 		if (((addr | next | phys) & ~PMD_MASK) == 0 &&
-		    (flags & NO_BLOCK_MAPPINGS) == 0) {
+		    (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
 			pmd_set_huge(pmdp, phys, prot);
 
 			/*
@@ -356,10 +358,11 @@  static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
+		 * Hotplug case: do not attempt 1GB block
 		 */
 		if (pud_sect_supported() &&
 		   ((addr | next | phys) & ~PUD_MASK) == 0 &&
-		    (flags & NO_BLOCK_MAPPINGS) == 0) {
+		   (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
 			pud_set_huge(pudp, phys, prot);
 
 			/*
@@ -1175,9 +1178,16 @@  int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
+	unsigned long start_pfn;
+	struct mem_section *ms;
+
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+	start_pfn = page_to_pfn((struct page *)start);
+	ms = __pfn_to_section(start_pfn);
+
+	/* Hotplugged section not support hugepages */
+	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || !early_section(ms))
 		return vmemmap_populate_basepages(start, end, node, altmap);
 	else
 		return vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1339,9 +1349,24 @@  int arch_add_memory(int nid, u64 start, u64 size,
 		    struct mhp_params *params)
 {
 	int ret, flags = NO_EXEC_MAPPINGS;
+	unsigned long start_pfn = page_to_pfn((struct page *)start);
+	struct mem_section *ms = __pfn_to_section(start_pfn);
 
 	VM_BUG_ON(!mhp_range_allowed(start, size, true));
 
+	/* Should not be invoked by early section */
+	WARN_ON(early_section(ms));
+
+	if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+	/*
+	 * As per subsection granule is 2M, allow PMD block mapping in
+	 * case 4K PAGES.
+	 * Other cases forbid section mapping.
+	 */
+		flags |= NO_PUD_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+	else
+		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
+
 	if (can_set_direct_map())
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;