
mm/slub: skip node in case there is no slab to acquire

Message ID 20181108011204.9491-1-richard.weiyang@gmail.com (mailing list archive)
State New, archived
Series mm/slub: skip node in case there is no slab to acquire

Commit Message

Wei Yang Nov. 8, 2018, 1:12 a.m. UTC
for_each_zone_zonelist() iterates the zonelist zone by zone, which means
it may visit several zones that belong to the same node, while
get_partial_node() checks for available slabs per node rather than per
zone.

This patch skips a node once get_partial_node() has failed to acquire a
slab on that node.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/slub.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
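
For illustration only (not part of the posted patch): a zonelist enumerates zones, not nodes, so the fallback loop in get_any_partial() can visit the same node several times. A minimal sketch of that behaviour, shown as a fragment of the existing loop with declarations omitted and assuming a hypothetical two-node layout:

	/*
	 * Hypothetical layout: node 0 has ZONE_DMA32 and ZONE_NORMAL,
	 * node 1 has only ZONE_NORMAL, so the zonelist contains three
	 * zones, two of which map to node 0.
	 */
	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
		/*
		 * Without a nodemask, node 0's kmem_cache_node is looked up
		 * and examined once per zone it contributes, even after
		 * get_partial_node() has already failed on it.
		 */
		struct kmem_cache_node *n = get_node(s, zone_to_nid(zone));

		if (n && n->nr_partial > s->min_partial)
			object = get_partial_node(s, n, c, flags);
	}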

Comments

Andrew Morton Nov. 9, 2018, 8:48 p.m. UTC | #1
On Thu,  8 Nov 2018 09:12:04 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:

> for_each_zone_zonelist() iterates the zonelist zone by zone, which means
> it may visit several zones that belong to the same node, while
> get_partial_node() checks for available slabs per node rather than per
> zone.
> 
> This patch skips a node once get_partial_node() has failed to acquire a
> slab on that node.

This is rather hard to follow.

I *think* the patch is a performance optimization: prevent
get_any_partial() from checking a node which get_partial_node() has
already looked at?

Could we please have a more complete changelog?

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1873,7 +1873,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>   * Get a page from somewhere. Search in increasing NUMA distances.
>   */
>  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
> -		struct kmem_cache_cpu *c)
> +		struct kmem_cache_cpu *c, int except)
>  {
>  #ifdef CONFIG_NUMA
>  	struct zonelist *zonelist;
> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>  	enum zone_type high_zoneidx = gfp_zone(flags);
>  	void *object;
>  	unsigned int cpuset_mems_cookie;
> +	nodemask_t nmask = node_states[N_MEMORY];
> +
> +	node_clear(except, nmask);

And please add a comment describing what's happening here and why it is
done.  Adding a sentence to the block comment over get_any_partial()
would be suitable.
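
A minimal sketch of what such an addition could look like (hypothetical wording, not taken from the posted patch):

	/*
	 * Get a page from somewhere. Search in increasing NUMA distances.
	 * @except and the local nodemask record nodes on which
	 * get_partial_node() has already failed, so that the zone loop does
	 * not revisit them via other zones of the same node.
	 */
	static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
			struct kmem_cache_cpu *c, int except)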
Wei Yang Nov. 9, 2018, 11:47 p.m. UTC | #2
On Fri, Nov 09, 2018 at 12:48:06PM -0800, Andrew Morton wrote:
>On Thu,  8 Nov 2018 09:12:04 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> for_each_zone_zonelist() iterates the zonelist zone by zone, which means
>> it may visit several zones that belong to the same node, while
>> get_partial_node() checks for available slabs per node rather than per
>> zone.
>> 
>> This patch skips a node once get_partial_node() has failed to acquire a
>> slab on that node.
>
>This is rather hard to follow.
>
>I *think* the patch is a performance optimization: prevent
>get_any_partial() from checking a node which get_partial_node() has
>already looked at?

You are right :-)

>
>Could we please have a more complete changelog?

Hmm... I would like to.

But I am not sure which part is hard to follow. If you could point out
the pain point, I would be glad to think about how to make it clearer.

>
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1873,7 +1873,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>>   * Get a page from somewhere. Search in increasing NUMA distances.
>>   */
>>  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>> -		struct kmem_cache_cpu *c)
>> +		struct kmem_cache_cpu *c, int except)
>>  {
>>  #ifdef CONFIG_NUMA
>>  	struct zonelist *zonelist;
>> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>>  	enum zone_type high_zoneidx = gfp_zone(flags);
>>  	void *object;
>>  	unsigned int cpuset_mems_cookie;
>> +	nodemask_t nmask = node_states[N_MEMORY];
>> +
>> +	node_clear(except, nmask);
>
>And please add a comment describing what's happening here and why it is
>done.  Adding a sentence to the block comment over get_any_partial()
>would be suitable.
>

Sure, I will address this in the next spin.
Michal Hocko Nov. 13, 2018, 1:17 p.m. UTC | #3
On Thu 08-11-18 09:12:04, Wei Yang wrote:
> for_each_zone_zonelist() iterates the zonelist zone by zone, which means
> it may visit several zones that belong to the same node, while
> get_partial_node() checks for available slabs per node rather than per
> zone.
> 
> This patch skips a node once get_partial_node() has failed to acquire a
> slab on that node.

If this is an optimization then it should be accompanied by some
numbers.

> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>  	enum zone_type high_zoneidx = gfp_zone(flags);
>  	void *object;
>  	unsigned int cpuset_mems_cookie;
> +	nodemask_t nmask = node_states[N_MEMORY];

This will allocate a large bitmask on the stack, and that is a no-go for
something that might already be called from a potentially deep call
stack. Also, are you sure that the micro-optimization offsets the
copying overhead?
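
For context on the size concern, nodemask_t is a fixed-size bitmap whose width is set by CONFIG_NODES_SHIFT, roughly as below; with NODES_SHIFT=10, MAX_NUMNODES is 1024, so the on-stack copy is 128 bytes:

	/* simplified, from include/linux/nodemask.h */
	typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
	/* MAX_NUMNODES == 1 << CONFIG_NODES_SHIFT */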
Wei Yang Nov. 13, 2018, 1:26 p.m. UTC | #4
On Tue, Nov 13, 2018 at 02:17:51PM +0100, Michal Hocko wrote:
>On Thu 08-11-18 09:12:04, Wei Yang wrote:
>> for_each_zone_zonelist() iterates the zonelist zone by zone, which means
>> it may visit several zones that belong to the same node, while
>> get_partial_node() checks for available slabs per node rather than per
>> zone.
>> 
>> This patch skips a node once get_partial_node() has failed to acquire a
>> slab on that node.
>
>If this is an optimization then it should be accompanied by some
>numbers.

Let me try to get some test results.

Do you have any suggestions on the test suite? Is a kernel build a proper
test?

>
>> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>>  	enum zone_type high_zoneidx = gfp_zone(flags);
>>  	void *object;
>>  	unsigned int cpuset_mems_cookie;
>> +	nodemask_t nmask = node_states[N_MEMORY];
>
>This will allocate a large bitmask on the stack, and that is a no-go for
>something that might already be called from a potentially deep call
>stack. Also, are you sure that the micro-optimization offsets the
>copying overhead?
>

You are right. I didn't pay attention to this.

I have another idea to achieve the same effect, similar to what
get_page_from_freelist() does.

In get_page_from_freelist(), last_pgdat_dirty_limit tracks the last node
found to be over its dirty limit. I would like to borrow this idea in
get_any_partial() to skip a node.

Well, let me do some tests to see whether the effect is visible.
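
A minimal sketch of that alternative, assuming it is enough to remember a single recently failed node (the variable name last_failed_nid is hypothetical and modeled on last_pgdat_dirty_limit, not taken from the posted patch; declarations and cpuset retry handling are elided):

	static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
			struct kmem_cache_cpu *c, int except)
	{
	#ifdef CONFIG_NUMA
		int last_failed_nid = except;	/* no nodemask copy on the stack */
		...
		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
			struct kmem_cache_node *n;
			int nid = zone_to_nid(zone);

			if (nid == last_failed_nid)
				continue;	/* just failed on this node, skip it */

			n = get_node(s, nid);
			if (n && cpuset_zone_allowed(zone, flags) &&
					n->nr_partial > s->min_partial) {
				object = get_partial_node(s, n, c, flags);
				if (object)
					return object;
				/* remember the miss so later zones of this node are skipped */
				last_failed_nid = nid;
			}
		}
		...
	#endif
	}

Since a node's zones are typically adjacent in the zonelist, remembering only the most recent failure should already avoid most of the repeated checks while keeping the stack footprint to a single int.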

>-- 
>Michal Hocko
>SUSE Labs
Michal Hocko Nov. 13, 2018, 1:34 p.m. UTC | #5
On Tue 13-11-18 13:26:24, Wei Yang wrote:
> On Tue, Nov 13, 2018 at 02:17:51PM +0100, Michal Hocko wrote:
> >On Thu 08-11-18 09:12:04, Wei Yang wrote:
> >> for_each_zone_zonelist() iterates the zonelist zone by zone, which means
> >> it may visit several zones that belong to the same node, while
> >> get_partial_node() checks for available slabs per node rather than per
> >> zone.
> >> 
> >> This patch skips a node once get_partial_node() has failed to acquire a
> >> slab on that node.
> >
> >If this is an optimization then it should be accompanied by some
> >numbers.
> 
> Let me try to get some test results.
> 
> Do you have any suggestions on the test suite? Is a kernel build a proper
> test?

Make sure that the workload hits this particular code path hard enough
that it matters. I am not aware of any such workload, but others might
know better.

In any case, if you set out to optimize something, you should evaluate
what kind of workload might benefit from it. If there is no such
workload, it is likely not worth bothering. Some changes might look like
obvious improvements, but they might add a maintenance burden or even be
wrong for other reasons. Recent patches you have posted show both
issues.

I would encourage you to look at practical issues instead. Throwing out
random patches from reading the code, without having the larger picture,
is usually not the best way to go.

[...]

> In get_page_from_freelist(), last_pgdat_dirty_limit tracks the last node
> found to be over its dirty limit. I would like to borrow this idea in
> get_any_partial() to skip a node.
> 
> Well, let me do some tests to see whether the effect is visible.

See the above. Each and every change has its cost, and patches make
sense only when both the future maintenance cost and the review cost are
paid off.

Patch

diff --git a/mm/slub.c b/mm/slub.c
index e3629cd7aff1..97a480b5dfb9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1873,7 +1873,7 @@  static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
  * Get a page from somewhere. Search in increasing NUMA distances.
  */
 static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
-		struct kmem_cache_cpu *c)
+		struct kmem_cache_cpu *c, int except)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -1882,6 +1882,9 @@  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *object;
 	unsigned int cpuset_mems_cookie;
+	nodemask_t nmask = node_states[N_MEMORY];
+
+	node_clear(except, nmask);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -1908,7 +1911,8 @@  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 	do {
 		cpuset_mems_cookie = read_mems_allowed_begin();
 		zonelist = node_zonelist(mempolicy_slab_node(), flags);
-		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+		for_each_zone_zonelist_nodemask(zone, z, zonelist,
+						high_zoneidx, &nmask) {
 			struct kmem_cache_node *n;
 
 			n = get_node(s, zone_to_nid(zone));
@@ -1926,6 +1930,7 @@  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 					 */
 					return object;
 				}
+				node_clear(zone_to_nid(zone), nmask);
 			}
 		}
 	} while (read_mems_allowed_retry(cpuset_mems_cookie));
@@ -1951,7 +1956,7 @@  static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-	return get_any_partial(s, flags, c);
+	return get_any_partial(s, flags, c, searchnode);
 }
 
 #ifdef CONFIG_PREEMPT