
[1/2] mm/page_alloc: add same penalty is enough to get round-robin order

Message ID 20220123013537.20491-1-richard.weiyang@gmail.com
State New
Series [1/2] mm/page_alloc: add same penalty is enough to get round-robin order

Commit Message

Wei Yang Jan. 23, 2022, 1:35 a.m. UTC
To make the node order round-robin within the same distance group, we
add a penalty to the first node we get in each round.

To get a round-robin order within the same distance group, we don't
need to decrease the penalty, since:

  * find_next_best_node() always iterates over nodes in the same order
  * distance matters more than penalty in find_next_best_node()
  * among nodes at the same distance, the first one is picked

So it is fine to add the same penalty each time we get the first node
of a distance group.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/page_alloc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
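A minimal sketch of the argument in the commit message, reduced to a single distance group. The names GROUP_SIZE, first_least_loaded() and simulate_rounds() are hypothetical, and the "lowest load, first index wins" scan is a simplification of find_next_best_node(), not kernel code. It illustrates that adding the same constant penalty to whichever node is picked first in each round already rotates the leader through the group, so the penalty does not have to decrease.

```c
#include <assert.h>

#define GROUP_SIZE 3	/* hypothetical: three nodes at the same distance */

/*
 * Return the first node with the smallest load. The scan always runs in
 * the same order and uses a strict '<', so ties go to the earliest node,
 * mirroring how find_next_best_node() breaks ties.
 */
static int first_least_loaded(const int load[GROUP_SIZE])
{
	int best = 0;

	for (int n = 1; n < GROUP_SIZE; n++)
		if (load[n] < load[best])
			best = n;
	return best;
}

/*
 * Run 'rounds' selections within one distance group. In each round the
 * least-loaded node is picked first and receives the same constant
 * penalty; leaders[r] records which node led round r.
 */
static void simulate_rounds(int penalty, int rounds, int leaders[])
{
	int load[GROUP_SIZE] = {0};

	assert(penalty > 0);	/* a zero penalty would never rotate */
	for (int r = 0; r < rounds; r++) {
		int n = first_least_loaded(load);

		leaders[r] = n;
		load[n] += penalty;
	}
}
```

With GROUP_SIZE 3, simulate_rounds(1, 6, leaders) records the leaders 0 1 2 0 1 2. Any other positive constant gives the same sequence, because only the relative order of the loads matters.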

Comments

Andrew Morton March 21, 2022, 9:17 p.m. UTC | #1
Could someone please help review this series?

Thanks.
Wei Yang March 26, 2022, 3:48 p.m. UTC | #2
On Mon, Mar 21, 2022 at 02:17:23PM -0700, Andrew Morton wrote:
>Could someone please help review this series?
>

Andrew,

Sorry, my mistake: I forgot one element in my emulation code for Patch 2.
The current kernel already handles the situation described in that commit
log properly.

So Patch 2 is not necessary. Sorry for the disturbance.

>Thanks.
Vlastimil Babka April 5, 2022, 5:11 p.m. UTC | #3
On 1/23/22 02:35, Wei Yang wrote:
> To make the node order round-robin within the same distance group, we
> add a penalty to the first node we get in each round.
> 
> To get a round-robin order within the same distance group, we don't
> need to decrease the penalty, since:
> 
>   * find_next_best_node() always iterates over nodes in the same order
>   * distance matters more than penalty in find_next_best_node()
>   * among nodes at the same distance, the first one is picked
> 
> So it is fine to add the same penalty each time we get the first node
> of a distance group.

With that logic I'm not even sure whether we need nr_online_nodes as the
penalty or whether it could be just 1. Would you know?

> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/page_alloc.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..f27afd517652 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
>  static void build_zonelists(pg_data_t *pgdat)
>  {
>  	static int node_order[MAX_NUMNODES];
> -	int node, load, nr_nodes = 0;
> +	int node, nr_nodes = 0;
>  	nodemask_t used_mask = NODE_MASK_NONE;
>  	int local_node, prev_node;
>  
>  	/* NUMA-aware ordering of nodes */
>  	local_node = pgdat->node_id;
> -	load = nr_online_nodes;
>  	prev_node = local_node;
>  
>  	memset(node_order, 0, sizeof(node_order));
> @@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
>  		 */
>  		if (node_distance(local_node, node) !=
>  		    node_distance(local_node, prev_node))
> -			node_load[node] += load;
> +			node_load[node] += nr_online_nodes;
>  
>  		node_order[nr_nodes++] = node;
>  		prev_node = node;
> -		load--;
>  	}
>  
>  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
Wei Yang April 6, 2022, 11:47 p.m. UTC | #4
On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>On 1/23/22 02:35, Wei Yang wrote:
>> To make the node order round-robin within the same distance group, we
>> add a penalty to the first node we get in each round.
>> 
>> To get a round-robin order within the same distance group, we don't
>> need to decrease the penalty, since:
>> 
>>   * find_next_best_node() always iterates over nodes in the same order
>>   * distance matters more than penalty in find_next_best_node()
>>   * among nodes at the same distance, the first one is picked
>> 
>> So it is fine to add the same penalty each time we get the first node
>> of a distance group.
>
>With that logic I'm not even sure whether we need nr_online_nodes as the
>penalty or whether it could be just 1. Would you know?

Yes, it has the same effect.

[    0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
[    0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
[    0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
[    0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
[    0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
[    0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
[    0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
[    0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2

Do you prefer to set it to 1?
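The log above can be approximated with a userspace model. This is a sketch under stated assumptions, not Wei Yang's emulation code (which is not shown in this thread): the 8-node distance table (10 local, 20 within nodes 0-3 or 4-7, 30 across the two groups), SCALE, and the helpers build_order(), build_all() and print_orders() are hypothetical, and the preference for CPU-less nodes in find_next_best_node() is dropped by assuming every node has CPUs. With that table, a constant penalty of 1 and a constant penalty of 8 (nr_online_nodes here) both yield the fallback order printed above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define NR_NODES 8
/* Hypothetical multiplier standing in for the kernel's MAX_NODE_LOAD-based
 * scale; it only has to exceed every accumulated node_load value. */
#define SCALE 1000

static int node_load[NR_NODES];

/* Assumed topology: two groups of four nodes (0-3 and 4-7). */
static int node_distance(int a, int b)
{
	if (a == b)
		return 10;
	return (a / 4 == b / 4) ? 20 : 30;
}

/* Simplified find_next_best_node(): the local node first, then the unused
 * node with the lowest (distance * SCALE + load). */
static int find_next_best_node(int node, int used[NR_NODES])
{
	int best = -1, min_val = INT_MAX;

	if (!used[node]) {
		used[node] = 1;
		return node;
	}
	for (int n = 0; n < NR_NODES; n++) {
		if (used[n])
			continue;
		int val = node_distance(node, n);
		val += (n < node);	/* penalize nodes below us, as the kernel does */
		assert(node_load[n] < SCALE);	/* distance must stay dominant */
		val = val * SCALE + node_load[n];
		if (val < min_val) {	/* strict '<': first node visited wins ties */
			min_val = val;
			best = n;
		}
	}
	if (best >= 0)
		used[best] = 1;
	return best;
}

/* Ordering loop of build_zonelists() with a constant penalty added to the
 * first node of each new distance group. */
static void build_order(int local, int penalty, int order[NR_NODES])
{
	int used[NR_NODES] = {0};
	int prev = local, nr = 0, node;

	while ((node = find_next_best_node(local, used)) >= 0) {
		if (node_distance(local, node) != node_distance(local, prev))
			node_load[node] += penalty;
		order[nr++] = node;
		prev = node;
	}
}

/* Reset the loads, then build every node's order in ascending node order. */
static void build_all(int penalty, int orders[NR_NODES][NR_NODES])
{
	memset(node_load, 0, sizeof(node_load));
	for (int nid = 0; nid < NR_NODES; nid++)
		build_order(nid, penalty, orders[nid]);
}

static void print_orders(int orders[NR_NODES][NR_NODES])
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		printf("Fallback order for Node %d:", nid);
		for (int i = 0; i < NR_NODES; i++)
			printf(" %d", orders[nid][i]);
		printf("\n");
	}
}
```

Calling build_all(1, orders) and then print_orders(orders) from a main() prints the eight "Fallback order" lines. Multiplying the penalty by a constant multiplies every node_load[] entry by the same constant, so as long as the loads stay below SCALE every comparison, and therefore every order, is unchanged. That is why 1 and nr_online_nodes behave identically.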
Vlastimil Babka April 7, 2022, 9:53 a.m. UTC | #5
On 4/7/22 01:47, Wei Yang wrote:
> On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>>On 1/23/22 02:35, Wei Yang wrote:
>>> To make the node order round-robin within the same distance group, we
>>> add a penalty to the first node we get in each round.
>>> 
>>> To get a round-robin order within the same distance group, we don't
>>> need to decrease the penalty, since:
>>> 
>>>   * find_next_best_node() always iterates over nodes in the same order
>>>   * distance matters more than penalty in find_next_best_node()
>>>   * among nodes at the same distance, the first one is picked
>>> 
>>> So it is fine to add the same penalty each time we get the first node
>>> of a distance group.
>>
>>With that logic I'm not even sure whether we need nr_online_nodes as the
>>penalty or whether it could be just 1. Would you know?
> 
> Yes, it has the same effect.

Good.

> [    0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
> [    0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
> [    0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
> [    0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
> [    0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
> [    0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
> [    0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
> [    0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2
> 
> Do you prefer to set it to 1?

Yeah, I think it's worth simplifying as much as feasible, so the code is
more obvious. I think we can then also remove the MAX_NODE_LOAD #define
and its usage.
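A short worked bound suggests why MAX_NODE_LOAD may become unnecessary once the penalty is 1. It rests on two assumptions taken from a reading of mm/page_alloc.c rather than from this thread: that find_next_best_node() scores each candidate as v_n * S + node_load[n], where v_n is the integer distance-based score and S is the multiplier currently built from MAX_NODE_LOAD, and that node_load[] is cleared before all zonelists are rebuilt.

```latex
% With penalty 1, node n is penalized at most once per zonelist build of
% another node (n appears once in that node's order) and never in its own
% build, since the local node is never the first of a new distance group:
\mathrm{node\_load}[n] \le (\text{number of builds}) - 1 \le \mathrm{MAX\_NUMNODES} - 1
% For candidates a, b with integer scores v_a < v_b, i.e. v_a \le v_b - 1:
v_a S + \mathrm{node\_load}[a] \le (v_b - 1) S + \mathrm{node\_load}[a]
  < v_b S + \mathrm{node\_load}[b]
% The last step only needs node_load[a] < S, so S = MAX_NUMNODES would
% already keep distance dominant without MAX_NODE_LOAD.
```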

Also, please Cc at least Oscar and David (added to Cc now) on v2, as they
have been active in the memory hotplug area recently.

Thanks,
Vlastimil

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..f27afd517652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
 static void build_zonelists(pg_data_t *pgdat)
 {
 	static int node_order[MAX_NUMNODES];
-	int node, load, nr_nodes = 0;
+	int node, nr_nodes = 0;
 	nodemask_t used_mask = NODE_MASK_NONE;
 	int local_node, prev_node;
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
-	load = nr_online_nodes;
 	prev_node = local_node;
 
 	memset(node_order, 0, sizeof(node_order));
@@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
 		 */
 		if (node_distance(local_node, node) !=
 		    node_distance(local_node, prev_node))
-			node_load[node] += load;
+			node_load[node] += nr_online_nodes;
 
 		node_order[nr_nodes++] = node;
 		prev_node = node;
-		load--;
 	}
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);