
[v2,2/6] slub: Use alloc_pages_node() in alloc_slab_page()

Message ID: 20231228085748.1083901-3-willy@infradead.org
State: New
Series: Remove some lruvec page accounting functions

Commit Message

Matthew Wilcox Dec. 28, 2023, 8:57 a.m. UTC
For no apparent reason, we were open-coding alloc_pages_node() in
this function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/slub.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
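
For reference, a simplified sketch of alloc_pages_node() as defined in
include/linux/gfp.h around this series (details vary by kernel version); note
that it maps NUMA_NO_NODE to the local memory node rather than going through
alloc_pages():

static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
                                            unsigned int order)
{
        /* No node requested: fall back to the nearest memory node. */
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

        return __alloc_pages_node(nid, gfp_mask, order);
}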

Comments

David Rientjes Dec. 28, 2023, 8:37 p.m. UTC | #1
On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:

> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: David Rientjes <rientjes@google.com>
Vlastimil Babka Jan. 2, 2024, 11:06 a.m. UTC | #2
On 12/28/23 09:57, Matthew Wilcox (Oracle) wrote:
> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 35aa706dc318..342545775df6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>  	struct slab *slab;
>  	unsigned int order = oo_order(oo);
>  
> -	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> -	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> -
> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
>  	if (!folio)
>  		return NULL;
>
Christoph Lameter (Ampere) July 9, 2024, 5:12 p.m. UTC | #3
On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:

> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.

The reason is that alloc_pages() follows memory policies, cgroup restrictions
etc. and alloc_pages_node() does not.

With this patch, cgroup restrictions, memory policies etc. no longer work
in the slab allocator.

Please revert this patch.

> diff --git a/mm/slub.c b/mm/slub.c
> index 35aa706dc318..342545775df6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
> 	struct slab *slab;
> 	unsigned int order = oo_order(oo);
>
> -	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> -	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> -
> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
> 	if (!folio)
> 		return NULL;
>
> -- 
> 2.43.0
>
>
Vlastimil Babka July 10, 2024, 10:35 a.m. UTC | #4
On 7/9/24 7:12 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:
> 
>> For no apparent reason, we were open-coding alloc_pages_node() in
>> this function.
> 
> The reason is that alloc_pages() follows memory policies, cgroup restrictions
> etc. and alloc_pages_node() does not.
> 
> With this patch, cgroup restrictions, memory policies etc. no longer work
> in the slab allocator.

The only difference is memory policy from get_task_policy(), and the rest is
the same, right?
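
For context, a rough sketch of the difference being discussed (not literal
kernel source; names and layout vary across versions):

/*
 *   alloc_pages(gfp, order)
 *     - consults the task mempolicy via get_task_policy(current)
 *       (unless in_interrupt() or __GFP_THISNODE)
 *     - derives the preferred node / nodemask from that policy
 *     - calls __alloc_pages(...)
 *
 *   alloc_pages_node(NUMA_NO_NODE, gfp, order)
 *     - maps NUMA_NO_NODE to numa_mem_id(), the local memory node
 *     - calls __alloc_pages_node(nid, gfp, order) -> __alloc_pages(...)
 *       without any mempolicy lookup
 */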

> Please revert this patch.

But this only affects new slab page allocation, while getting objects from
existing slabs isn't subject to memory policies, so now it's at least
consistent? Do you have some use case where it matters?

>> diff --git a/mm/slub.c b/mm/slub.c
>> index 35aa706dc318..342545775df6 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>> 	struct slab *slab;
>> 	unsigned int order = oo_order(oo);
>>
>> -	if (node == NUMA_NO_NODE)
>> -		folio = (struct folio *)alloc_pages(flags, order);
>> -	else
>> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
>> -
>> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
>> 	if (!folio)
>> 		return NULL;
>>
>> -- 
>> 2.43.0
>>
>>
Christoph Lameter (Ampere) July 10, 2024, 4:43 p.m. UTC | #5
On Wed, 10 Jul 2024, Vlastimil Babka wrote:

>> With this patch, cgroup restrictions, memory policies etc. no longer work
>> in the slab allocator.
>
> The only difference is memory policy from get_task_policy(), and the rest is
> the same, right?

There are also the cpuset/cgroup restrictions via the zonelists that are 
bypassed by removing alloc_pages()

> But this only affects new slab page allocation, while getting objects from
> existing slabs isn't subject to memory policies, so now it's at least
> consistent? Do you have some use case where it matters?

Yes, this means you cannot redirect kmalloc-based kernel metadata
allocation when creating e.g. cgroups for another NUMA node. This affects
all kernel metadata allocation during syscalls that used to be
controllable via NUMA methods.

SLAB implemented memory allocation policies per object. SLUB moved that
to implement these policies only when allocating a page frame. If this
patch is left in, there won't be any support for memory allocation
policies left in the slab allocators.

We have some internal patches here that implement memory policies on a
per-object basis for SLUB.

This is a 10-15% regression on various benchmarks when objects like the 
scheduler statistics structures are misplaced.
Vlastimil Babka July 11, 2024, 7:54 a.m. UTC | #6
On 7/10/24 6:43 PM, Christoph Lameter (Ampere) wrote:
> On Wed, 10 Jul 2024, Vlastimil Babka wrote:
> 
>>> With this patch, cgroup restrictions, memory policies etc. no longer work
>>> in the slab allocator.
>>
>> The only difference is memory policy from get_task_policy(), and the rest is
>> the same, right?
> 
> There are also the cpuset/cgroup restrictions via the zonelists that are 
> bypassed by removing alloc_pages()

AFAICS cpusets are handled on a level that's reached by both paths, i.e.
prepare_alloc_pages(), and I see nothing that would make switching to
alloc_pages_node() bypass it. Am I missing something?
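
A rough call-path sketch of that point (not literal kernel source; assuming
the usual page allocator layout of this period):

/*
 *   alloc_pages(...)             alloc_pages_node(nid, ...)
 *         |                              |
 *         v                              v
 *      __alloc_pages(gfp, order, nid, nodemask)
 *         |
 *         v
 *      prepare_alloc_pages()   // applies cpuset mems_allowed restrictions
 *
 * The cpuset/cgroup check is reached on either path; only the mempolicy
 * lookup above __alloc_pages() differs.
 */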

>> But this only affects new slab page allocation, while getting objects from
>> existing slabs isn't subject to memory policies, so now it's at least
>> consistent? Do you have some use case where it matters?
> 
> Yes, this means you cannot redirect kmalloc-based kernel metadata
> allocation when creating e.g. cgroups for another NUMA node. This affects
> all kernel metadata allocation during syscalls that used to be
> controllable via NUMA methods.
> 
> SLAB implemented memory allocation policies per object. SLUB moved that
> to implement these policies only when allocating a page frame. If this
> patch is left in, there won't be any support for memory allocation
> policies left in the slab allocators.
> 
> We have some internal patches here that implement memory policies on a
> per-object basis for SLUB.
> 
> This is a 10-15% regression on various benchmarks when objects like the 
> scheduler statistics structures are misplaced.

I believe it would be best if you submitted a patch with all that
reasoning. Thanks!
Christoph Lameter (Ampere) July 11, 2024, 6:04 p.m. UTC | #7
On Thu, 11 Jul 2024, Vlastimil Babka wrote:

>> There are also the cpuset/cgroup restrictions via the zonelists that are
>> bypassed by removing alloc_pages()
>
> AFAICS cpusets are handled on a level that's reached by both paths, i.e.
> prepare_alloc_pages(), and I see nothing that would make switching to
> alloc_pages_node() bypass it. Am I missing something?

You are correct. cpuset/cgroup restrictions also apply to 
alloc_pages_node().

>> We have some internal patches here that implement memory policies on a
>> per-object basis for SLUB.
>>
>> This is a 10-15% regression on various benchmarks when objects like the
>> scheduler statistics structures are misplaced.
>
> I believe it would be best if you submitted a patch with all that
> reasoning. Thanks!

It turns out those performance issues are related to the fact that NUMA
locality is only considered at the folio level for slab allocation.
Individual object allocations are not subject to it.

The performance issue comes about in the following way:

Two kernel threads run on the same CPU using the same slab cache. One of
them keeps allocating from a different node via kmalloc_node() and the
other is using kmalloc(). The kmalloc_node() thread will then always
ensure that the per-cpu slab is from the other node.

The other thread uses kmalloc(), which does not check which node the
per-cpu slab is from. Therefore the kmalloc() thread can continually be
served objects that are not local. That is not good and causes
misplacement of objects.
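
A minimal sketch of that interleaving (hypothetical module code, not from
this thread; it assumes both threads have been bound to the same CPU, e.g.
via kthread_bind(), and that remote_nid is a node other than that CPU's
local node):

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/slab.h>

static int remote_nid;  /* assumed: some node != the CPU's local node */

static int remote_alloc_fn(void *unused)
{
        while (!kthread_should_stop()) {
                /* Keeps replacing this CPU's per-cpu slab with one from remote_nid. */
                void *obj = kmalloc_node(64, GFP_KERNEL, remote_nid);

                kfree(obj);
                cond_resched();
        }
        return 0;
}

static int local_alloc_fn(void *unused)
{
        while (!kthread_should_stop()) {
                /*
                 * kmalloc() does not check which node the current per-cpu
                 * slab came from, so it keeps being handed remote objects.
                 */
                void *obj = kmalloc(64, GFP_KERNEL);

                kfree(obj);
                cond_resched();
        }
        return 0;
}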

But that issue is separate from this commit, and we see the same
regression before it.

This patch still needs to be reverted since the rationale for the patch
is not right and it disables memory policy support. It results in the
strange situation that memory policies are used in get_any_partial() in
SLUB but not during allocation anymore.
Vlastimil Babka July 12, 2024, 7:47 a.m. UTC | #8
On 7/11/24 8:04 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 11 Jul 2024, Vlastimil Babka wrote:
> 
>>> There are also the cpuset/cgroup restrictions via the zonelists that are
>>> bypassed by removing alloc_pages()
>>
>> AFAICS cpusets are handled on a level that's reached by both paths, i.e.
>> prepare_alloc_pages(), and I see nothing that would make switching to
>> alloc_pages_node() bypass it. Am I missing something?
> 
> You are correct. cpuset/cgroup restrictions also apply to 
> alloc_pages_node().
> 
>>> We have some internal patches here that implement memory policies on a
>>> per-object basis for SLUB.
>>>
>>> This is a 10-15% regression on various benchmarks when objects like the
>>> scheduler statistics structures are misplaced.
>>
>> I believe it would be best if you submitted a patch with all that
>> reasoning. Thanks!

I still believe that :)

> It turns out those performance issues are related to the fact that NUMA
> locality is only considered at the folio level for slab allocation.
> Individual object allocations are not subject to it.
> 
> The performance issue comes about in the following way:
> 
> Two kernel threads run on the same CPU using the same slab cache. One of
> them keeps allocating from a different node via kmalloc_node() and the
> other is using kmalloc(). The kmalloc_node() thread will then always
> ensure that the per-cpu slab is from the other node.
> 
> The other thread uses kmalloc(), which does not check which node the
> per-cpu slab is from. Therefore the kmalloc() thread can continually be
> served objects that are not local. That is not good and causes
> misplacement of objects.
> 
> But that issue is separate from this commit, and we see the same
> regression before it.
> 
> This patch still needs to be reverted since the rationale for the patch
> is not right and it disables memory policy support. It results in the
> strange situation that memory policies are used in get_any_partial() in
> SLUB but not during allocation anymore.
>

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 35aa706dc318..342545775df6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2187,11 +2187,7 @@  static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	struct slab *slab;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
-		folio = (struct folio *)alloc_pages(flags, order);
-	else
-		folio = (struct folio *)__alloc_pages_node(node, flags, order);
-
+	folio = (struct folio *)alloc_pages_node(node, flags, order);
 	if (!folio)
 		return NULL;