
[RESEND,RFC,2/2] gfp: use the best near online node if the target node is offline

Message ID 1429869513-2906-2-git-send-email-guz.fnst@cn.fujitsu.com (mailing list archive)
State RFC, archived

Commit Message

Gu Zheng April 24, 2015, 9:58 a.m. UTC
Since the change to the cpu <--> node mapping (all possible cpus are
mapped to their physical nodes at boot), a cpu's node may not be
online, so fall back to the best near online node in the low level
allocation APIs when the target node is offline.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
 include/linux/gfp.h |   28 +++++++++++++++++++++++++++-
 1 files changed, 27 insertions(+), 1 deletions(-)

Comments

Andrew Morton April 24, 2015, 8:01 p.m. UTC | #1
On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:

> Since the change to the cpu <--> node mapping (all possible cpus are
> mapped to their physical nodes at boot), a cpu's node may not be
> online, so fall back to the best near online node in the low level
> allocation APIs when the target node is offline.
> 
> ...
>
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -298,9 +298,31 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>  	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>  }
>  
> +static int find_near_online_node(int node)
> +{
> +	int n, val;
> +	int min_val = INT_MAX;
> +	int best_node = -1;
> +
> +	for_each_online_node(n) {
> +		val = node_distance(node, n);
> +
> +		if (val < min_val) {
> +			min_val = val;
> +			best_node = n;
> +		}
> +	}
> +
> +	return best_node;
> +}

This should be `inline' if it's in a header file.

But it is far too large to be inlined anyway - please move it to a .c file.

And please document it.  A critical thing to describe is how we
determine whether a node is "near".  There are presumably multiple ways
in which we could decide that a node is "near" (number of hops, minimum
latency, ...).  Which one did you choose, and why?

>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> +	/* Offline node, use the best near online node */
> +	if (!node_online(nid))
> +		nid = find_near_online_node(nid);
> +
>  	/* Unknown node is current node */
>  	if (nid < 0)
>  		nid = numa_node_id();
> @@ -311,7 +333,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> +	/* Offline node, use the best near online node */
> +	if (!node_online(nid))
> +		nid = find_near_online_node(nid);
> +
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>  
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }

Ouch.  These functions are called very frequently, and adding overhead
to them is a big deal.  And the patch even adds overhead to non-x86
architectures which don't benefit from it!

Is there no way this problem can be fixed somewhere else?  Preferably
by fixing things up at hotplug time.

KAMEZAWA Hiroyuki April 27, 2015, 9:44 a.m. UTC | #2
On 2015/04/25 5:01, Andrew Morton wrote:
> On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>
>> Since the change to the cpu <--> node mapping (all possible cpus are
>> mapped to their physical nodes at boot), a cpu's node may not be
>> online, so fall back to the best near online node in the low level
>> allocation APIs when the target node is offline.
>>
>> ...
>>
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -298,9 +298,31 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>>   	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>>   }
>>
>> +static int find_near_online_node(int node)
>> +{
>> +	int n, val;
>> +	int min_val = INT_MAX;
>> +	int best_node = -1;
>> +
>> +	for_each_online_node(n) {
>> +		val = node_distance(node, n);
>> +
>> +		if (val < min_val) {
>> +			min_val = val;
>> +			best_node = n;
>> +		}
>> +	}
>> +
>> +	return best_node;
>> +}
>
> This should be `inline' if it's in a header file.
>
> But it is far too large to be inlined anyway - please move it to a .c file.
>
> And please document it.  A critical thing to describe is how we
> determine whether a node is "near".  There are presumably multiple ways
> in which we could decide that a node is "near" (number of hops, minimum
> latency, ...).  Which one did you choose, and why?
>
>>   static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>   						unsigned int order)
>>   {
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);
>> +
>>   	/* Unknown node is current node */
>>   	if (nid < 0)
>>   		nid = numa_node_id();
>> @@ -311,7 +333,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>   static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>   						unsigned int order)
>>   {
>> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);

In the above VM_BUG_ON(), !node_online(nid) is the bug condition.

>> +
>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>
>>   	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>   }
>
> Ouch.  These functions are called very frequently, and adding overhead
> to them is a big deal.  And the patch even adds overhead to non-x86
> architectures which don't benefit from it!
>
> Is there no way this problem can be fixed somewhere else?  Preferably
> by fixing things up at hotplug time.

I agree. The results should be cached; if necessary, per-cpu.


Thanks,
-Kame



Gu Zheng April 27, 2015, 10:19 a.m. UTC | #3
Hi Andrew,

On 04/25/2015 04:01 AM, Andrew Morton wrote:

> On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> 
>> Since the change to the cpu <--> node mapping (all possible cpus are
>> mapped to their physical nodes at boot), a cpu's node may not be
>> online, so fall back to the best near online node in the low level
>> allocation APIs when the target node is offline.
>>
>> ...
>>
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -298,9 +298,31 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>>  	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>>  }
>>  
>> +static int find_near_online_node(int node)
>> +{
>> +	int n, val;
>> +	int min_val = INT_MAX;
>> +	int best_node = -1;
>> +
>> +	for_each_online_node(n) {
>> +		val = node_distance(node, n);
>> +
>> +		if (val < min_val) {
>> +			min_val = val;
>> +			best_node = n;
>> +		}
>> +	}
>> +
>> +	return best_node;
>> +}
> 
> This should be `inline' if it's in a header file.
> 
> But it is far too large to be inlined anyway - please move it to a .c file.

Agree.

> 
> And please document it.  A critical thing to describe is how we
> determine whether a node is "near".  There are presumably multiple ways
> in which we could decide that a node is "near" (number of hops, minimum
> latency, ...).  Which one did you choose, and why?

It just reuses the code dropped in PATCH 1/2, which is based on the
node_distance table. That table is arch-specific, and its data mostly
comes from firmware info, e.g. the ACPI SLIT table.
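
For illustration, a documented out-of-line version of the helper could
look roughly like the sketch below (the kernel-doc wording and a home
such as mm/page_alloc.c are only suggestions, not part of this patch):

/**
 * find_near_online_node - find the closest online node to @node
 * @node: a node id that may currently be offline
 *
 * "Near" is measured with node_distance(), i.e. the arch-defined
 * distance table, which on ACPI systems is normally filled from the
 * firmware SLIT.  The online node with the smallest distance wins;
 * ties go to the lowest-numbered online node.
 *
 * Returns an online node id, or -1 if no node is online.
 */
int find_near_online_node(int node)
{
	int n, best_node = -1, min_dist = INT_MAX;

	for_each_online_node(n) {
		int dist = node_distance(node, n);

		if (dist < min_dist) {
			min_dist = dist;
			best_node = n;
		}
	}

	return best_node;
}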

> 
>>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  						unsigned int order)
>>  {
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);
>> +
>>  	/* Unknown node is current node */
>>  	if (nid < 0)
>>  		nid = numa_node_id();
>> @@ -311,7 +333,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>  						unsigned int order)
>>  {
>> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>> +	/* Offline node, use the best near online node */
>> +	if (!node_online(nid))
>> +		nid = find_near_online_node(nid);
>> +
>> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>  
>>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>  }
> 
> Ouch.  These functions are called very frequently, and adding overhead
> to them is a big deal.  And the patch even adds overhead to non-x86
> architectures which don't benefit from it!
> 
> Is there no way this problem can be fixed somewhere else?  Preferably
> by fixing things up at hotplug time.

As Kame suggested, maintaining a per-cpu cache of the alternative node,
only for the x86 arch, seems like a good choice.
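
A rough sketch of that direction, assuming the fallback is precomputed
for every possible cpu and refreshed from a memory hotplug notifier
(all names below, e.g. numa_fallback_node and
update_numa_fallback_nodes(), are illustrative only, not existing
kernel symbols):

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/memory.h>
#include <linux/nodemask.h>
#include <linux/notifier.h>
#include <linux/numa.h>
#include <linux/percpu.h>
#include <linux/topology.h>

/* Cached nearest online node for each possible cpu. */
static DEFINE_PER_CPU(int, numa_fallback_node) = NUMA_NO_NODE;

/* Recompute the fallback node for every possible cpu. */
static void update_numa_fallback_nodes(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		int nid = cpu_to_node(cpu);
		int n, best = nid, min_dist = INT_MAX;

		if (nid < 0 || node_online(nid)) {
			/* Nothing to fix up for this cpu. */
			per_cpu(numa_fallback_node, cpu) = nid;
			continue;
		}

		/* Same SLIT-based metric as find_near_online_node(). */
		for_each_online_node(n) {
			int dist = node_distance(nid, n);

			if (dist < min_dist) {
				min_dist = dist;
				best = n;
			}
		}
		per_cpu(numa_fallback_node, cpu) = best;
	}
}

/* Refresh the cache whenever memory (and hence a node) goes on/offline. */
static int numa_fallback_mem_callback(struct notifier_block *nb,
				      unsigned long action, void *arg)
{
	if (action == MEM_ONLINE || action == MEM_OFFLINE)
		update_numa_fallback_nodes();
	return NOTIFY_OK;
}

static int __init numa_fallback_init(void)
{
	update_numa_fallback_nodes();
	hotplug_memory_notifier(numa_fallback_mem_callback, 0);
	return 0;
}
subsys_initcall(numa_fallback_init);

With something like this, the hot allocation path would only need a
per-cpu read, e.g. nid = this_cpu_read(numa_fallback_node), instead of
scanning node_distance() on every call.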

Regards,
Gu

> .
> 



Gu Zheng April 27, 2015, 10:26 a.m. UTC | #4
Hi Kame-san,

On 04/27/2015 05:44 PM, Kamezawa Hiroyuki wrote:

> On 2015/04/25 5:01, Andrew Morton wrote:
>> On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>
>>> Since the change to the cpu <--> node mapping (all possible cpus are
>>> mapped to their physical nodes at boot), a cpu's node may not be
>>> online, so fall back to the best near online node in the low level
>>> allocation APIs when the target node is offline.
>>>
>>> ...
>>>
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -298,9 +298,31 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
>>>       return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
>>>   }
>>>
>>> +static int find_near_online_node(int node)
>>> +{
>>> +    int n, val;
>>> +    int min_val = INT_MAX;
>>> +    int best_node = -1;
>>> +
>>> +    for_each_online_node(n) {
>>> +        val = node_distance(node, n);
>>> +
>>> +        if (val < min_val) {
>>> +            min_val = val;
>>> +            best_node = n;
>>> +        }
>>> +    }
>>> +
>>> +    return best_node;
>>> +}
>>
>> This should be `inline' if it's in a header file.
>>
>> But it is far too large to be inlined anyway - please move it to a .c file.
>>
>> And please document it.  A critical thing to describe is how we
>> determine whether a node is "near".  There are presumably multiple ways
>> in which we could decide that a node is "near" (number of hops, minimum
>> latency, ...).  Which one did you choose, and why?
>>
>>>   static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>>                           unsigned int order)
>>>   {
>>> +    /* Offline node, use the best near online node */
>>> +    if (!node_online(nid))
>>> +        nid = find_near_online_node(nid);
>>> +
>>>       /* Unknown node is current node */
>>>       if (nid < 0)
>>>           nid = numa_node_id();
>>> @@ -311,7 +333,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>>>   static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>>                           unsigned int order)
>>>   {
>>> -    VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>> +    /* Offline node, use the best near online node */
>>> +    if (!node_online(nid))
>>> +        nid = find_near_online_node(nid);
> 
> In the above VM_BUG_ON(), !node_online(nid) is the bug condition.


But with the change in PATCH 1/2, it becomes possible to get here with an offline node.

> 
>>> +
>>> +    VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>>
>>>       return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>>   }
>>
>> Ouch.  These functions are called very frequently, and adding overhead
>> to them is a big deal.  And the patch even adds overhead to non-x86
>> architectures which don't benefit from it!
>>
>> Is there no way this problem can be fixed somewhere else?  Preferably
>> by fixing things up at hotplug time.
> 
> I agree. The results should be cached; if necessary, per-cpu.

Sounds great, will try this way.

Regards,
Gu

> 
> 
> Thanks,
> -Kame
> 
> 
> .
> 



Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 97a9373..19684a8 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,9 +298,31 @@  __alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }
 
+static int find_near_online_node(int node)
+{
+	int n, val;
+	int min_val = INT_MAX;
+	int best_node = -1;
+
+	for_each_online_node(n) {
+		val = node_distance(node, n);
+
+		if (val < min_val) {
+			min_val = val;
+			best_node = n;
+		}
+	}
+
+	return best_node;
+}
+
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	/* Offline node, use the best near online node */
+	if (!node_online(nid))
+		nid = find_near_online_node(nid);
+
 	/* Unknown node is current node */
 	if (nid < 0)
 		nid = numa_node_id();
@@ -311,7 +333,11 @@  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
-	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+	/* Offline node, use the best near online node */
+	if (!node_online(nid))
+		nid = find_near_online_node(nid);
+
+	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
 
 	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }