[RFC,1/5] mm, page_alloc: clean up pageset high and batch update

Message ID 20200907163628.26495-2-vbabka@suse.cz (mailing list archive)
State New, archived
Series disable pcplists during page isolation

Commit Message

Vlastimil Babka Sept. 7, 2020, 4:36 p.m. UTC
The updates to pcplists' high and batch values are handled by multiple
functions, which makes the calculations hard to follow. Consolidate everything
into pageset_set_high_and_batch() and remove the pageset_set_batch() and
pageset_set_high() wrappers.

The only special case using one of the removed wrappers was:
build_all_zonelists_init()
  setup_pageset()
    pageset_set_batch()
which hardcoded batch as 0, so we can just open-code a call to
pageset_update() with constant parameters instead.

No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 51 +++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 31 deletions(-)

Comments

Oscar Salvador Sept. 10, 2020, 8:31 a.m. UTC | #1
On Mon, Sep 07, 2020 at 06:36:24PM +0200, Vlastimil Babka wrote:
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

>  	for_each_possible_cpu(cpu)
> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> +		setup_pageset(&per_cpu(boot_pageset, cpu));

This is not really anything important, but I realized we have like 7 functions
messing with pcp lists, and every time I try to follow them my head spins.

Since setup_pageset is only called here, could we replace it with calls to
pageset_init and pageset_update?

(As I said, not important and probably a matter of taste. I just think that
having so many mini functions around is not always cool,
e.g. setup_zone_pageset->zone_pageset_init)

> -/*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> - */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> -{
> -	unsigned long batch = max(1UL, high / 4);
> -	if ((high / 4) > (PAGE_SHIFT * 8))
> -		batch = PAGE_SHIFT * 8;
> -
> -	pageset_update(&p->pcp, high, batch);
> +	pageset_update(&p->pcp, 0, 1);
>  }

Could we restore the comment we had in pageset_set_high, and maybe
update it to match this new function? I think it would be useful.
>  
>  static void pageset_set_high_and_batch(struct zone *zone,
> -				       struct per_cpu_pageset *pcp)
> +				       struct per_cpu_pageset *p)
>  {
> -	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> -			(zone_managed_pages(zone) /
> -				percpu_pagelist_fraction));
> -	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> +	unsigned long new_high;
> +	unsigned long new_batch;
> +	int fraction = READ_ONCE(percpu_pagelist_fraction);

Why the READ_ONCE? In case there is a parallel update, so things don't get
messed up?

as I said, I'd appreciate a comment in pageset_set_high_and_batch to be
restored and updated, otherwise:

Reviewed-by: Oscar Salvador <osalvador@suse.de> 

Thanks
Oscar Salvador Sept. 10, 2020, 8:34 a.m. UTC | #2
On Thu, Sep 10, 2020 at 10:31:20AM +0200, Oscar Salvador wrote:
> On Mon, Sep 07, 2020 at 06:36:24PM +0200, Vlastimil Babka wrote:
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> >  	for_each_possible_cpu(cpu)
> > -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> > +		setup_pageset(&per_cpu(boot_pageset, cpu));
> 
> This is not really anything important, but I realized we have like 7 functions
> messing with pcp lists, and every time I try to follow them my head spins.
> 
> Since setup_pageset is only called here, could we replace it with calls to
> pageset_init and pageset_update?
> 
> (As I said, not important and probably a matter of taste. I just think that
> having so many mini functions around is not always cool,
> e.g. setup_zone_pageset->zone_pageset_init)

Sorry, I did not see that you just did that in Patch#3, bleh.
David Hildenbrand Sept. 10, 2020, 10:47 a.m. UTC | #3
On 10.09.20 10:31, Oscar Salvador wrote:
> On Mon, Sep 07, 2020 at 06:36:24PM +0200, Vlastimil Babka wrote:
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
>>  	for_each_possible_cpu(cpu)
>> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
>> +		setup_pageset(&per_cpu(boot_pageset, cpu));
> 
> This is not really anything important, but I realized we have like 7 functions
> messing with pcp lists, and every time I try to follow them my head spins.
> 
> Since setup_pageset is only called here, could we replace it with calls to
> pageset_init and pageset_update?

Had the same thought, so +1.

>> -/*
>> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
>> - * to the value high for the pageset p.
>> - */
>> -static void pageset_set_high(struct per_cpu_pageset *p,
>> -				unsigned long high)
>> -{
>> -	unsigned long batch = max(1UL, high / 4);
>> -	if ((high / 4) > (PAGE_SHIFT * 8))
>> -		batch = PAGE_SHIFT * 8;
>> -
>> -	pageset_update(&p->pcp, high, batch);
>> +	pageset_update(&p->pcp, 0, 1);
>>  }
> 
> Could we restore the comment we had in pageset_set_high, and maybe
> update it to match this new function? I think it would be useful.

At least I didn't really understand what "pageset_set_high() sets the
high water mark for hot per_cpu_pagelist to the value high for the
pageset p." was trying to tell me.

I think the only valuable information is the "hot", meaning it is in use
and we have to be careful when updating, right?

>>  
>>  static void pageset_set_high_and_batch(struct zone *zone,
>> -				       struct per_cpu_pageset *pcp)
>> +				       struct per_cpu_pageset *p)
>>  {
>> -	if (percpu_pagelist_fraction)
>> -		pageset_set_high(pcp,
>> -			(zone_managed_pages(zone) /
>> -				percpu_pagelist_fraction));
>> -	else
>> -		pageset_set_batch(pcp, zone_batchsize(zone));
>> +	unsigned long new_high;
>> +	unsigned long new_batch;
>> +	int fraction = READ_ONCE(percpu_pagelist_fraction);
> 
> Why the READ_ONCE? In case there is a parallel update, so things don't get
> messed up?

Agreed, this is an actual change in the code. If this is a fix, separate
patch?

Apart from that, looks much better to me!
Vlastimil Babka Sept. 17, 2020, 4:05 p.m. UTC | #4
On 9/10/20 10:31 AM, Oscar Salvador wrote:
> On Mon, Sep 07, 2020 at 06:36:24PM +0200, Vlastimil Babka wrote:
> 
>> -/*
>> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
>> - * to the value high for the pageset p.
>> - */
>> -static void pageset_set_high(struct per_cpu_pageset *p,
>> -				unsigned long high)
>> -{
>> -	unsigned long batch = max(1UL, high / 4);
>> -	if ((high / 4) > (PAGE_SHIFT * 8))
>> -		batch = PAGE_SHIFT * 8;
>> -
>> -	pageset_update(&p->pcp, high, batch);
>> +	pageset_update(&p->pcp, 0, 1);
>>  }
> 
> Could we restore the comment we had in pageset_set_high, and maybe
> update it to match this new function? I think it would be useful.

Same as David, I didn't find the comment useful at all. But I can try writing a
better one instead.

>>  
>>  static void pageset_set_high_and_batch(struct zone *zone,
>> -				       struct per_cpu_pageset *pcp)
>> +				       struct per_cpu_pageset *p)
>>  {
>> -	if (percpu_pagelist_fraction)
>> -		pageset_set_high(pcp,
>> -			(zone_managed_pages(zone) /
>> -				percpu_pagelist_fraction));
>> -	else
>> -		pageset_set_batch(pcp, zone_batchsize(zone));
>> +	unsigned long new_high;
>> +	unsigned long new_batch;
>> +	int fraction = READ_ONCE(percpu_pagelist_fraction);
> 
> Why the READ_ONCE? In case there is a parallel update, so things don't get
> messed up?

Yeah, I think onlining of a new zone can race with a sysctl update to
percpu_pagelist_fraction, because onlining doesn't take pcp_batch_high_lock...
until patch 5. But you're right, this should be separate.

> as I said, I'd appreciate a comment in pageset_set_high_and_batch to be
> restored and updated, otherwise:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de> 

Thanks!

> Thanks
>
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..0b516208afda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5834,7 +5834,7 @@  static void build_zonelists(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -5902,7 +5902,7 @@  build_all_zonelists_init(void)
 	 * (a chicken-egg dilemma).
 	 */
 	for_each_possible_cpu(cpu)
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+		setup_pageset(&per_cpu(boot_pageset, cpu));
 
 	mminit_verify_zonelist();
 	cpuset_init_current_mems_allowed();
@@ -6218,12 +6218,6 @@  static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 	pcp->batch = batch;
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
-{
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
-}
-
 static void pageset_init(struct per_cpu_pageset *p)
 {
 	struct per_cpu_pages *pcp;
@@ -6236,35 +6230,30 @@  static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void setup_pageset(struct per_cpu_pageset *p)
 {
 	pageset_init(p);
-	pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
+	pageset_update(&p->pcp, 0, 1);
 }
 
 static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
+				       struct per_cpu_pageset *p)
 {
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone_managed_pages(zone) /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
+	unsigned long new_high;
+	unsigned long new_batch;
+	int fraction = READ_ONCE(percpu_pagelist_fraction);
+
+	if (fraction) {
+		new_high = zone_managed_pages(zone) / fraction;
+		new_batch = max(1UL, new_high / 4);
+		if ((new_high / 4) > (PAGE_SHIFT * 8))
+			new_batch = PAGE_SHIFT * 8;
+	} else {
+		new_batch = zone_batchsize(zone);
+		new_high = 6 * new_batch;
+		new_batch = max(1UL, 1 * new_batch);
+	}
+	pageset_update(&p->pcp, new_high, new_batch);
 }
 
 static void __meminit zone_pageset_init(struct zone *zone, int cpu)