
mm, memory_hotplug: update pcp lists everytime onlining a memory block

Message ID 1596372896-15336-1-git-send-email-charante@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Charan Teja Kalla Aug. 2, 2020, 12:54 p.m. UTC
When the first memory block in a zone is onlined, the pcp lists are not
updated, so the pcp struct keeps its default settings of ->high = 0 and
->batch = 1. Until a second memory block in the zone (if it has one) is
onlined, the pcp lists of this zone will not hold any pages: pcp's
->count is always greater than ->high, so free_pcppages_bulk() is called
to free batch-size (= 1) pages every time the system wants to add a page
to the pcp list through free_unref_page(). In short, the system gets no
benefit from the pcp lists when there is a single onlineable memory
block in a zone. Correct this by always updating the pcp lists when a
memory block is onlined.

Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
---
 mm/memory_hotplug.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

David Hildenbrand Aug. 3, 2020, 8:05 a.m. UTC | #1
On 02.08.20 14:54, Charan Teja Reddy wrote:
> When onlining a first memory block in a zone, pcp lists are not updated
> thus pcp struct will have the default setting of ->high = 0,->batch = 1.
> This means till the second memory block in a zone(if it have) is onlined
> the pcp lists of this zone will not contain any pages because pcp's
> ->count is always greater than ->high thus free_pcppages_bulk() is
> called to free batch size(=1) pages every time system wants to add a
> page to the pcp list through free_unref_page(). To put this in a word,
> system is not using benefits offered by the pcp lists when there is a
> single onlineable memory block in a zone. Correct this by always
> updating the pcp lists when memory block is onlined.

I guess such setups are rare ... but I can imagine it being the case
with virtio-mem in the future ... not 100% sure if this performance
optimization is really relevant in practice ... how did you identify this?

> 
> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
> ---
>  mm/memory_hotplug.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dcdf327..7f62d69 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -854,8 +854,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  	node_states_set_node(nid, &arg);
>  	if (need_zonelists_rebuild)
>  		build_all_zonelists(NULL);
> -	else
> -		zone_pcp_update(zone);
> +	zone_pcp_update(zone);
>  
>  	init_per_zone_wmark_min();
>  
> 

Does, in general, look sane to me.

Reviewed-by: David Hildenbrand <david@redhat.com>
Charan Teja Kalla Aug. 3, 2020, 1:28 p.m. UTC | #2
Thanks David for the comments.

On 8/3/2020 1:35 PM, David Hildenbrand wrote:
> On 02.08.20 14:54, Charan Teja Reddy wrote:
>> When onlining a first memory block in a zone, pcp lists are not updated
>> thus pcp struct will have the default setting of ->high = 0,->batch = 1.
>> This means till the second memory block in a zone(if it have) is onlined
>> the pcp lists of this zone will not contain any pages because pcp's
>> ->count is always greater than ->high thus free_pcppages_bulk() is
>> called to free batch size(=1) pages every time system wants to add a
>> page to the pcp list through free_unref_page(). To put this in a word,
>> system is not using benefits offered by the pcp lists when there is a
>> single onlineable memory block in a zone. Correct this by always
>> updating the pcp lists when memory block is onlined.
> 
> I guess such setups are rare ... but I can imagine it being the case
> with virtio-mem in the future ... not 100% if this performance
> optimization is really relevant in practice ... how did you identify this?

Even the Snapdragon hardware that I tested on contains multiple
onlineable memory blocks. But we have a use case in which we online a
single memory block and, once it is filled, online the next block. In
the step where a single block is onlined, we observed the pageset
params below.
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
Only once the second block is onlined do we see sane values, as below.
    cpu: 0
              count: 32
              high:  378
              batch: 63

In the above case, no page is held in the pcp list until the second
block is onlined. So updating the pcp params every time a memory block
is onlined is required, for example in the use case that I mentioned.

> 
>>
>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
>> ---
>>  mm/memory_hotplug.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index dcdf327..7f62d69 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -854,8 +854,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>>  	node_states_set_node(nid, &arg);
>>  	if (need_zonelists_rebuild)
>>  		build_all_zonelists(NULL);
>> -	else
>> -		zone_pcp_update(zone);
>> +	zone_pcp_update(zone);
>>  
>>  	init_per_zone_wmark_min();
>>  
>>
> 
> Does, in general, look sane to me.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks for the ACK.

>
Vlastimil Babka Aug. 3, 2020, 1:55 p.m. UTC | #3
On 8/2/20 2:54 PM, Charan Teja Reddy wrote:
> When onlining a first memory block in a zone, pcp lists are not updated
> thus pcp struct will have the default setting of ->high = 0,->batch = 1.
> This means till the second memory block in a zone(if it have) is onlined
> the pcp lists of this zone will not contain any pages because pcp's
> ->count is always greater than ->high thus free_pcppages_bulk() is
> called to free batch size(=1) pages every time system wants to add a
> page to the pcp list through free_unref_page(). To put this in a word,
> system is not using benefits offered by the pcp lists when there is a
> single onlineable memory block in a zone. Correct this by always
> updating the pcp lists when memory block is onlined.
> 
> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>

Makes sense to me.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/memory_hotplug.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dcdf327..7f62d69 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -854,8 +854,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  	node_states_set_node(nid, &arg);
>  	if (need_zonelists_rebuild)
>  		build_all_zonelists(NULL);
> -	else
> -		zone_pcp_update(zone);
> +	zone_pcp_update(zone);
>  
>  	init_per_zone_wmark_min();
>  
>
David Hildenbrand Aug. 3, 2020, 2 p.m. UTC | #4
On 03.08.20 15:28, Charan Teja Kalla wrote:
> Thanks David for the comments.
> 
> On 8/3/2020 1:35 PM, David Hildenbrand wrote:
>> On 02.08.20 14:54, Charan Teja Reddy wrote:
>>> When onlining a first memory block in a zone, pcp lists are not updated
>>> thus pcp struct will have the default setting of ->high = 0,->batch = 1.
>>> This means till the second memory block in a zone(if it have) is onlined
>>> the pcp lists of this zone will not contain any pages because pcp's
>>> ->count is always greater than ->high thus free_pcppages_bulk() is
>>> called to free batch size(=1) pages every time system wants to add a
>>> page to the pcp list through free_unref_page(). To put this in a word,
>>> system is not using benefits offered by the pcp lists when there is a
>>> single onlineable memory block in a zone. Correct this by always
>>> updating the pcp lists when memory block is onlined.
>>
>> I guess such setups are rare ... but I can imagine it being the case
>> with virtio-mem in the future ... not 100% if this performance
>> optimization is really relevant in practice ... how did you identify this?
> 
> Even the Snapdragon hardware that I had tested on contain multiple
> onlineable memory blocks. But we have the use case in which we online
> single memory block and once it is filled then online the next block. In
> the step where single block is onlined, we observed the below pageset

Out of interest, why? Is it to optimize energy consumption?
Michal Hocko Aug. 3, 2020, 3:46 p.m. UTC | #5
On Sun 02-08-20 18:24:56, Charan Teja Reddy wrote:
> When onlining a first memory block in a zone, pcp lists are not updated
> thus pcp struct will have the default setting of ->high = 0,->batch = 1.
> This means till the second memory block in a zone(if it have) is onlined
> the pcp lists of this zone will not contain any pages because pcp's
> ->count is always greater than ->high thus free_pcppages_bulk() is
> called to free batch size(=1) pages every time system wants to add a
> page to the pcp list through free_unref_page(). To put this in a word,
> system is not using benefits offered by the pcp lists when there is a
> single onlineable memory block in a zone. Correct this by always
> updating the pcp lists when memory block is onlined.

Yes this seems like an ancient bug
Fixes: 1f522509c77a ("mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset")

Just nobody has noticed because a single block memory zone is really
rare.
 
> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks

> ---
>  mm/memory_hotplug.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dcdf327..7f62d69 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -854,8 +854,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  	node_states_set_node(nid, &arg);
>  	if (need_zonelists_rebuild)
>  		build_all_zonelists(NULL);
> -	else
> -		zone_pcp_update(zone);
> +	zone_pcp_update(zone);
>  
>  	init_per_zone_wmark_min();
>  
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
>

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index dcdf327..7f62d69 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -854,8 +854,7 @@  int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	node_states_set_node(nid, &arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
-	else
-		zone_pcp_update(zone);
+	zone_pcp_update(zone);
 
 	init_per_zone_wmark_min();