
[v2,1/2] arm64: mm: vmemmap populate to page level if not section aligned

Message ID 20241209094227.1529977-2-quic_zhenhuah@quicinc.com (mailing list archive)
State New
Series Fix subsection vmemmap_populate logic

Commit Message

Zhenhua Huang Dec. 9, 2024, 9:42 a.m. UTC
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap
to populate at the PMD section level. However, if start or end is not
aligned to a section boundary, such as when a subsection is hot-added,
populating the entire section is wasteful. For instance, if only one
subsection is hot-added, struct page metadata for the entire section is
still populated. In such cases, it is more effective to populate at
page granularity.

This change also addresses mismatch issues during vmemmap_free(): when
pmd_sect() is true, the entire PMD section is cleared, even if other
subsections are still in use. For example, suppose pagemap1 and pagemap2
share a single PMD entry and are hot-added sequentially. If pagemap1 is
then removed, vmemmap_free() clears the entire PMD entry, freeing the
struct page metadata for the whole section, even though pagemap2 is
still active.

Fixes: c1cc1552616d ("arm64: MMU initialisation")
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
---
 arch/arm64/mm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Catalin Marinas Dec. 20, 2024, 6:30 p.m. UTC | #1
On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap
> to populate at the PMD section level. However, if start or end is not
> aligned to a section boundary, such as when a subsection is hot-added,
> populating the entire section is wasteful. For instance, if only one
> subsection is hot-added, struct page metadata for the entire section is
> still populated. In such cases, it is more effective to populate at
> page granularity.

OK, so from the vmemmap perspective, we waste up to 2MB memory that has
been allocated even if a 2MB hot-plugged subsection required only 32KB
of struct page. I don't mind this much really. I hope all those
subsections are not scattered around to amplify this waste.
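(For reference, assuming 4K base pages and a 64-byte struct page: a 2MB
subsection covers 2MB / 4KB = 512 pages, so its vmemmap needs 512 * 64 =
32KB, while populating at the PMD level allocates a full 2MB block.)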

> This change also addresses mismatch issues during vmemmap_free(): when
> pmd_sect() is true, the entire PMD section is cleared, even if other
> subsections are still in use. For example, suppose pagemap1 and pagemap2
> share a single PMD entry and are hot-added sequentially. If pagemap1 is
> then removed, vmemmap_free() clears the entire PMD entry, freeing the
> struct page metadata for the whole section, even though pagemap2 is
> still active.

I think that's the bigger issue. We can't unplug a subsection only.
Looking at unmap_hotplug_pmd_range(), it frees a 2MB vmemmap section but
that may hold struct page for the equivalent of 128MB of memory. So any
struct page accesses for the other subsections will fault.
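(Same assumptions as above: a 2MB vmemmap block holds 2MB / 64 = 32768
struct pages, which describe 32768 * 4KB = 128MB of memory, i.e. one
full section.)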

> Fixes: c1cc1552616d ("arm64: MMU initialisation")

I wouldn't add a fix for the first commit adding arm64 support, we did
not even have memory hotplug at the time (added later in 5.7 by commit
bbd6ec605c0f ("arm64/mm: Enable memory hot remove")). IIUC, this hasn't
been a problem until commit ba72b4c8cf60 ("mm/sparsemem: support
sub-section hotplug"). That commit broke some arm64 assumptions.

> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> ---
>  arch/arm64/mm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e2739b69e11b..fd59ee44960e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  {
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>  
> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
> +	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
> +	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>  		return vmemmap_populate_basepages(start, end, node, altmap);
>  	else
>  		return vmemmap_populate_hugepages(start, end, node, altmap);

An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
nuking the whole vmemmap pmd section if it's not empty. Not sure how
easy that is, whether we have the necessary information (I haven't
looked in detail).
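(For context, unmap_hotplug_pmd_range() currently treats a pmd_sect()
entry as a single unit: it clears the PMD, flushes the TLB and, when
free_mapped is true, frees the whole PMD_SIZE backing via
free_hotplug_page_range(), with no per-subsection bookkeeping.)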

A potential issue - can we hotplug 128MB of RAM and only unplug 2MB? If
that's possible, the problem isn't solved by this patch.
Zhenhua Huang Dec. 24, 2024, 9:32 a.m. UTC | #2
Thanks Catalin for review!
Merry Christmas.

On 2024/12/21 2:30, Catalin Marinas wrote:
> On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap
>> to populate at the PMD section level. However, if start or end is not
>> aligned to a section boundary, such as when a subsection is hot-added,
>> populating the entire section is wasteful. For instance, if only one
>> subsection is hot-added, struct page metadata for the entire section is
>> still populated. In such cases, it is more effective to populate at
>> page granularity.
> 
> OK, so from the vmemmap perspective, we waste up to 2MB memory that has
> been allocated even if a 2MB hot-plugged subsection required only 32KB
> of struct page. I don't mind this much really. I hope all those
> subsections are not scattered around to amplify this waste.
> 
>> This change also addresses mismatch issues during vmemmap_free(): when
>> pmd_sect() is true, the entire PMD section is cleared, even if other
>> subsections are still in use. For example, suppose pagemap1 and pagemap2
>> share a single PMD entry and are hot-added sequentially. If pagemap1 is
>> then removed, vmemmap_free() clears the entire PMD entry, freeing the
>> struct page metadata for the whole section, even though pagemap2 is
>> still active.
> 
> I think that's the bigger issue. We can't unplug a subsection only.
> Looking at unmap_hotplug_pmd_range(), it frees a 2MB vmemmap section but
> that may hold struct page for the equivalent of 128MB of memory. So any
> struct page accesses for the other subsections will fault.

Exactly! That's what the patch aims to address.

> 
>> Fixes: c1cc1552616d ("arm64: MMU initialisation")
> 
> I wouldn't add a fix for the first commit adding arm64 support, we did
> not even have memory hotplug at the time (added later in 5.7 by commit
> bbd6ec605c0f ("arm64/mm: Enable memory hot remove")). IIUC, this hasn't
> been a problem until commit ba72b4c8cf60 ("mm/sparsemem: support
> sub-section hotplug"). That commit broke some arm64 assumptions.

Shall we use ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
as the Fixes: tag instead, since it broke the arm64 assumptions?

> 
>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>> ---
>>   arch/arm64/mm/mmu.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index e2739b69e11b..fd59ee44960e 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>   {
>>   	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>   
>> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>> +	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
>> +	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>>   		return vmemmap_populate_basepages(start, end, node, altmap);
>>   	else
>>   		return vmemmap_populate_hugepages(start, end, node, altmap);
> 
> An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
> nuking the whole vmemmap pmd section if it's not empty. Not sure how
> easy that is, whether we have the necessary information (I haven't
> looked in detail).
> 
> A potential issue - can we hotplug 128MB of RAM and only unplug 2MB? If
> that's possible, the problem isn't solved by this patch.

Indeed, it seems there is no guarantee that the unplug size will equal
the plug size...

I have two ideas:
1. Completely disable this PMD mapping optimization, since there is no
guarantee that hotplugged memory is 128M aligned.

2. If we want to keep this optimization:
I propose adding one argument to vmemmap_free() to indicate whether the
entire section is freed (based on the subsection map). vmemmap_free() is
a common function and might affect other architectures... The process
would be:
vmemmap_free
	unmap_hotplug_range // in unmap_hotplug_pmd_range(), as you mentioned:
if the whole section is freed, proceed as usual; otherwise *just clear
out the struct page content but do not free*
	free_empty_tables // called only if the entire section is freed

On the populate side (a rough sketch follows below):
else if (vmemmap_check_pmd(pmd, node, addr, next)) // implement this hook
	continue;	// the vmemmap block still exists, just skip it

Could you please comment further on whether #2 is feasible?
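
A rough sketch of that populate-side hook (only an illustration; it
assumes the generic vmemmap_populate_hugepages() caller, and the real
arm64 implementation is still to be written):

int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
				unsigned long addr, unsigned long next)
{
	/*
	 * A section mapping here means the 2MB vmemmap block is still
	 * present from an earlier subsection hot-add; report it so the
	 * caller skips this PMD instead of allocating it again.
	 */
	return pmd_sect(READ_ONCE(*pmdp));
}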

>
Catalin Marinas Dec. 24, 2024, 2:09 p.m. UTC | #3
On Tue, Dec 24, 2024 at 05:32:06PM +0800, Zhenhua Huang wrote:
> Thanks Catalin for review!
> Merry Christmas.

Merry Christmas to you too!

> On 2024/12/21 2:30, Catalin Marinas wrote:
> > On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
> > > Fixes: c1cc1552616d ("arm64: MMU initialisation")
> > 
> > I wouldn't add a fix for the first commit adding arm64 support, we did
> > not even have memory hotplug at the time (added later in 5.7 by commit
> > bbd6ec605c0f ("arm64/mm: Enable memory hot remove")). IIUC, this hasn't
> > been a problem until commit ba72b4c8cf60 ("mm/sparsemem: support
> > sub-section hotplug"). That commit broke some arm64 assumptions.
> 
> Shall we use ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> as the Fixes: tag instead, since it broke the arm64 assumptions?

Yes, I think that would be better. And a cc stable to 5.4 (the above
commit appeared in 5.3).

> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index e2739b69e11b..fd59ee44960e 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> > >   {
> > >   	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
> > > -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> > > +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
> > > +	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
> > > +	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
> > >   		return vmemmap_populate_basepages(start, end, node, altmap);
> > >   	else
> > >   		return vmemmap_populate_hugepages(start, end, node, altmap);
> > 
> > An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
> > nuking the whole vmemmap pmd section if it's not empty. Not sure how
> > easy that is, whether we have the necessary information (I haven't
> > looked in detail).
> > 
> > A potential issue - can we hotplug 128MB of RAM and only unplug 2MB? If
> > that's possible, the problem isn't solved by this patch.
> 
> Indeed, it seems there is no guarantee that the unplug size will equal
> the plug size...
> 
> I have two ideas:
> 1. Completely disable this PMD mapping optimization, since there is no
> guarantee that hotplugged memory is 128M aligned.

I'd be in favour of this, at least if CONFIG_MEMORY_HOTPLUG is enabled.
I think the only advantage here is that we don't allocate a full 2MB
block for vmemmap when only plugging in a sub-section.

> 2. If we want to keep this optimization:
> I propose adding one argument to vmemmap_free() to indicate whether the
> entire section is freed (based on the subsection map). vmemmap_free() is a
> common function and might affect other architectures... The process would be:
> vmemmap_free
> 	unmap_hotplug_range // in unmap_hotplug_pmd_range(), as you mentioned: if
> the whole section is freed, proceed as usual; otherwise *just clear out the
> struct page content but do not free*
> 	free_empty_tables // called only if the entire section is freed
> 
> On the populate side,
> else if (vmemmap_check_pmd(pmd, node, addr, next)) // implement this hook
> 	continue;	// the vmemmap block still exists, just skip it
> 
> Could you please comment further on whether #2 is feasible?

vmemmap_free() already gets start/end, so it could at least check the
alignment and avoid freeing if it's not unplugging a full section. It
does leave a 2MB vmemmap block in place when freeing the last subsection
but it's safer than freeing valid struct page entries. In addition, it
could query the memory hotplug state with something like
find_memory_block() and figure out whether the section is empty.
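
Something along these lines perhaps (untested sketch only, reusing the
existing unmap_hotplug_range()/free_empty_tables() helpers; it leaves
the last 2MB vmemmap block allocated rather than risk freeing live
struct pages):

void vmemmap_free(unsigned long start, unsigned long end,
		  struct vmem_altmap *altmap)
{
	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));

	/* not a whole number of sections: keep the vmemmap around */
	if (!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
	    !IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
		return;

	unmap_hotplug_range(start, end, true, altmap);
	free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
}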

Anyway, I'll be off until the new year, maybe I get other ideas by then.
Zhenhua Huang Dec. 25, 2024, 9:59 a.m. UTC | #4
On 2024/12/24 22:09, Catalin Marinas wrote:
> On Tue, Dec 24, 2024 at 05:32:06PM +0800, Zhenhua Huang wrote:
>> Thanks Catalin for review!
>> Merry Christmas.
> 
> Merry Christmas to you too!
> 
>> On 2024/12/21 2:30, Catalin Marinas wrote:
>>> On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
>>>> Fixes: c1cc1552616d ("arm64: MMU initialisation")
>>>
>>> I wouldn't add a fix for the first commit adding arm64 support, we did
>>> not even have memory hotplug at the time (added later in 5.7 by commit
>>> bbd6ec605c0f ("arm64/mm: Enable memory hot remove")). IIUC, this hasn't
>>> been a problem until commit ba72b4c8cf60 ("mm/sparsemem: support
>>> sub-section hotplug"). That commit broke some arm64 assumptions.
>>
>> Shall we use ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> as the Fixes: tag instead, since it broke the arm64 assumptions?
> 
> Yes, I think that would be better. And a cc stable to 5.4 (the above
> commit appeared in 5.3).

Got it, thanks.

> 
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index e2739b69e11b..fd59ee44960e 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>    {
>>>>    	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>>>> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>>>> +	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
>>>> +	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>>>>    		return vmemmap_populate_basepages(start, end, node, altmap);
>>>>    	else
>>>>    		return vmemmap_populate_hugepages(start, end, node, altmap);
>>>
>>> An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
>>> nuking the whole vmemmap pmd section if it's not empty. Not sure how
>>> easy that is, whether we have the necessary information (I haven't
>>> looked in detail).
>>>
>>> A potential issue - can we hotplug 128MB of RAM and only unplug 2MB? If
>>> that's possible, the problem isn't solved by this patch.
>>
>> Indeed, it seems there is no guarantee that the unplug size will equal
>> the plug size...
>>
>> I have two ideas:
>> 1. Completely disable this PMD mapping optimization, since there is no
>> guarantee that hotplugged memory is 128M aligned.
> 
> I'd be in favour of this, at least if CONFIG_MEMORY_HOTPLUG is enabled.
> I think the only advantage here is that we don't allocate a full 2MB
> block for vmemmap when only plugging in a sub-section.

Yeah..
In my opinion, without subsection hotplug support this optimization is
definitely beneficial. With subsection hotplug support, however, it can
lead to memory overhead and requires special-case logic, since we always
use a full 2MB block.

> 
>> 2. If we want to keep this optimization:
>> I propose adding one argument to vmemmap_free() to indicate whether the
>> entire section is freed (based on the subsection map). vmemmap_free() is a
>> common function and might affect other architectures... The process would be:
>> vmemmap_free
>> 	unmap_hotplug_range // in unmap_hotplug_pmd_range(), as you mentioned: if
>> the whole section is freed, proceed as usual; otherwise *just clear out the
>> struct page content but do not free*
>> 	free_empty_tables // called only if the entire section is freed
>>
>> On the populate side,
>> else if (vmemmap_check_pmd(pmd, node, addr, next)) // implement this hook
>> 	continue;	// the vmemmap block still exists, just skip it
>>
>> Could you please comment further on whether #2 is feasible?
> 
> vmemmap_free() already gets start/end, so it could at least check the
> alignment and avoid freeing if it's not unplugging a full section. It
> does leave a 2MB vmemmap block in place when freeing the last subsection
> but it's safer than freeing valid struct page entries. In addition, it
> could query the memory hotplug state with something like
> find_memory_block() and figure out whether the section is empty.
> 

Do you mean that we need not clear the struct page entries of a
subsection until the entire section is fully unplugged? That seems
feasible.

BTW, you're right. I went through the code again, and exporting
is_subsection_map_empty() for the query is another option:
	page_to_pfn() to get the pfn
	__nr_to_section() to get the mem_section
	finally, is_subsection_map_empty() tells us the subsection hotplug
status for that section
With this approach, we need no changes to vmemmap_free() itself.
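
Something like (just a sketch; it assumes is_subsection_map_empty()
were made available outside mm/sparse.c, and the wrapper name is made
up):

/* true once no subsection of the section backing this vmemmap is plugged */
static bool vmemmap_section_is_empty(unsigned long vmemmap_start)
{
	unsigned long pfn = page_to_pfn((struct page *)vmemmap_start);
	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));

	return is_subsection_map_empty(ms);
}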

> Anyway, I'll be off until the new year, maybe I get other ideas by then.
> 

Sure, happy holidays! I will prepare both patches and wait for your
further comments.

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e2739b69e11b..fd59ee44960e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,7 +1177,9 @@  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
+	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
 		return vmemmap_populate_basepages(start, end, node, altmap);
 	else
 		return vmemmap_populate_hugepages(start, end, node, altmap);
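
Note on the check: start and end here are vmemmap virtual addresses, so
casting them to struct page * and applying page_to_pfn() recovers the
pfn range of the memory being hot-added; if either boundary pfn is not
aligned to PAGES_PER_SECTION (32768 pages, i.e. a 128MB section, with
4K pages), the operation does not cover whole sections and vmemmap
population falls back to base pages.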