Message ID: 20231228085748.1083901-3-willy@infradead.org (mailing list archive)
State: New
Series: Remove some lruvec page accounting functions
On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:

> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: David Rientjes <rientjes@google.com>
On 12/28/23 09:57, Matthew Wilcox (Oracle) wrote:
> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 35aa706dc318..342545775df6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>  	struct slab *slab;
>  	unsigned int order = oo_order(oo);
>
> -	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> -	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> -
> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
>  	if (!folio)
>  		return NULL;
>
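For context on why the open-coded branch was redundant for node selection: alloc_pages_node() already maps NUMA_NO_NODE to the local memory node before handing off to __alloc_pages_node(). A minimal sketch of that helper, paraphrased from include/linux/gfp.h rather than copied verbatim from any particular kernel version:

	/* Paraphrase of the gfp.h helper the patch switches to. */
	static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
						    unsigned int order)
	{
		if (nid == NUMA_NO_NODE)
			nid = numa_mem_id();	/* fall back to the local memory node */

		return __alloc_pages_node(nid, gfp_mask, order);
	}

What the rest of the thread turns on is that this helper never consults the calling task's memory policy, whereas alloc_pages() does.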
On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:

> For no apparent reason, we were open-coding alloc_pages_node() in
> this function.

The reason is that alloc_pages() follows memory policies, cgroup
restrictions, etc., and alloc_pages_node() does not.

With this patch, cgroup restrictions, memory policies, etc. no longer
work in the slab allocator.

Please revert this patch.

> diff --git a/mm/slub.c b/mm/slub.c
> index 35aa706dc318..342545775df6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>  	struct slab *slab;
>  	unsigned int order = oo_order(oo);
>
> -	if (node == NUMA_NO_NODE)
> -		folio = (struct folio *)alloc_pages(flags, order);
> -	else
> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
> -
> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
>  	if (!folio)
>  		return NULL;
>
> --
> 2.43.0
On 7/9/24 7:12 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 28 Dec 2023, Matthew Wilcox (Oracle) wrote:
>
>> For no apparent reason, we were open-coding alloc_pages_node() in
>> this function.
>
> The reason is that alloc_pages() follows memory policies, cgroup
> restrictions, etc., and alloc_pages_node() does not.
>
> With this patch, cgroup restrictions, memory policies, etc. no longer
> work in the slab allocator.

The only difference is memory policy from get_task_policy(), and the
rest is the same, right?

> Please revert this patch.

But this only affects new slab page allocation, while getting objects
from existing slabs isn't subject to memory policies, so now it's at
least consistent? Do you have some use case where it matters?

>> diff --git a/mm/slub.c b/mm/slub.c
>> index 35aa706dc318..342545775df6 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>>  	struct slab *slab;
>>  	unsigned int order = oo_order(oo);
>>
>> -	if (node == NUMA_NO_NODE)
>> -		folio = (struct folio *)alloc_pages(flags, order);
>> -	else
>> -		folio = (struct folio *)__alloc_pages_node(node, flags, order);
>> -
>> +	folio = (struct folio *)alloc_pages_node(node, flags, order);
>>  	if (!folio)
>>  		return NULL;
>>
>> --
>> 2.43.0
>>
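To make the get_task_policy() difference concrete, here is a heavily simplified sketch of how the two entry points treat a request without an explicit node. This is pseudocode, not the real implementations: alloc_pages_by_policy() is a made-up placeholder for the mempolicy plumbing in mm/mempolicy.c, and the real signatures and details vary by kernel version.

	/* Old path: alloc_pages() consults the calling task's memory policy. */
	struct page *alloc_pages_sketch(gfp_t gfp, unsigned int order)
	{
		struct mempolicy *pol = get_task_policy(current);

		/* MPOL_BIND, MPOL_INTERLEAVE, ... steer the node choice here */
		return alloc_pages_by_policy(gfp, order, pol);	/* hypothetical helper */
	}

	/* New path: alloc_pages_node(NUMA_NO_NODE, ...) simply goes local. */
	struct page *alloc_pages_node_sketch(gfp_t gfp, unsigned int order)
	{
		return __alloc_pages_node(numa_mem_id(), gfp, order);
	}

Whether anything besides the task mempolicy differs between the two paths is the subject of the following messages.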
On Wed, 10 Jul 2024, Vlastimil Babka wrote:

>> With this patch, cgroup restrictions, memory policies, etc. no longer
>> work in the slab allocator.
>
> The only difference is memory policy from get_task_policy(), and the
> rest is the same, right?

There are also the cpuset/cgroup restrictions via the zonelists that
are bypassed by removing alloc_pages().

> But this only affects new slab page allocation, while getting objects
> from existing slabs isn't subject to memory policies, so now it's at
> least consistent? Do you have some use case where it matters?

Yes, this means you cannot redirect kmalloc-based kernel metadata
allocation when creating, for example, cgroups for another NUMA node.
This affects all kernel metadata allocation during syscalls that used
to be controllable via NUMA methods.

SLAB implemented memory allocation policies per object. SLUB moved
that to implementing these policies only when allocating a page frame.
If this patch is left in, then there won't be any support for memory
allocation policies left in the slab allocators.

We have some internal patches now that implement memory policies on a
per-object basis for SLUB here.

This is a 10-15% regression on various benchmarks when objects like
the scheduler statistics structures are misplaced.
On 7/10/24 6:43 PM, Christoph Lameter (Ampere) wrote:
> On Wed, 10 Jul 2024, Vlastimil Babka wrote:
>
>>> With this patch, cgroup restrictions, memory policies, etc. no longer
>>> work in the slab allocator.
>>
>> The only difference is memory policy from get_task_policy(), and the
>> rest is the same, right?
>
> There are also the cpuset/cgroup restrictions via the zonelists that
> are bypassed by removing alloc_pages().

AFAICS cpusets are handled on a level that's reached by both paths,
i.e. prepare_alloc_pages(), and I see nothing that would make switching
to alloc_pages_node() bypass it. Am I missing something?

>> But this only affects new slab page allocation, while getting objects
>> from existing slabs isn't subject to memory policies, so now it's at
>> least consistent? Do you have some use case where it matters?
>
> Yes, this means you cannot redirect kmalloc-based kernel metadata
> allocation when creating, for example, cgroups for another NUMA node.
> This affects all kernel metadata allocation during syscalls that used
> to be controllable via NUMA methods.
>
> SLAB implemented memory allocation policies per object. SLUB moved
> that to implementing these policies only when allocating a page frame.
> If this patch is left in, then there won't be any support for memory
> allocation policies left in the slab allocators.
>
> We have some internal patches now that implement memory policies on a
> per-object basis for SLUB here.
>
> This is a 10-15% regression on various benchmarks when objects like
> the scheduler statistics structures are misplaced.

I believe it would be best if you submitted a patch with all that
reasoning. Thanks!
On Thu, 11 Jul 2024, Vlastimil Babka wrote:

>> There are also the cpuset/cgroup restrictions via the zonelists that
>> are bypassed by removing alloc_pages().
>
> AFAICS cpusets are handled on a level that's reached by both paths,
> i.e. prepare_alloc_pages(), and I see nothing that would make switching
> to alloc_pages_node() bypass it. Am I missing something?

You are correct. cpuset/cgroup restrictions also apply to
alloc_pages_node().

>> We have some internal patches now that implement memory policies on a
>> per-object basis for SLUB here.
>>
>> This is a 10-15% regression on various benchmarks when objects like
>> the scheduler statistics structures are misplaced.
>
> I believe it would be best if you submitted a patch with all that
> reasoning. Thanks!

Turns out those performance issues are related to the fact that NUMA
locality is only considered at the folio level for slab allocation.
Individual object allocations are not subject to it.

The performance issue comes about in the following way:

Two kernel threads run on the same CPU using the same slab cache. One
of them keeps on allocating from a different node via kmalloc_node()
and the other is using kmalloc(). The kmalloc_node() thread will then
always ensure that the per-cpu slab is from the other node.

The other thread uses kmalloc(), which does not check which node the
per-cpu slab is from. Therefore the kmalloc() thread can continually be
served objects that are not local. That is not good and causes
misplacement of objects.

But that issue is something separate from this commit, and we see the
same regression before this commit.

This patch still needs to be reverted since the rationale for the patch
is not right and it disables memory policy support. It results in the
strange situation that memory policies are used in get_any_partial() in
SLUB but not during allocation anymore.
On 7/11/24 8:04 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 11 Jul 2024, Vlastimil Babka wrote:
>
>>> There are also the cpuset/cgroup restrictions via the zonelists that
>>> are bypassed by removing alloc_pages().
>>
>> AFAICS cpusets are handled on a level that's reached by both paths,
>> i.e. prepare_alloc_pages(), and I see nothing that would make switching
>> to alloc_pages_node() bypass it. Am I missing something?
>
> You are correct. cpuset/cgroup restrictions also apply to
> alloc_pages_node().
>
>>> We have some internal patches now that implement memory policies on a
>>> per-object basis for SLUB here.
>>>
>>> This is a 10-15% regression on various benchmarks when objects like
>>> the scheduler statistics structures are misplaced.
>>
>> I believe it would be best if you submitted a patch with all that
>> reasoning. Thanks!

I still believe that :)

> Turns out those performance issues are related to the fact that NUMA
> locality is only considered at the folio level for slab allocation.
> Individual object allocations are not subject to it.
>
> The performance issue comes about in the following way:
>
> Two kernel threads run on the same CPU using the same slab cache. One
> of them keeps on allocating from a different node via kmalloc_node()
> and the other is using kmalloc(). The kmalloc_node() thread will then
> always ensure that the per-cpu slab is from the other node.
>
> The other thread uses kmalloc(), which does not check which node the
> per-cpu slab is from. Therefore the kmalloc() thread can continually be
> served objects that are not local. That is not good and causes
> misplacement of objects.
>
> But that issue is something separate from this commit, and we see the
> same regression before this commit.
>
> This patch still needs to be reverted since the rationale for the patch
> is not right and it disables memory policy support. It results in the
> strange situation that memory policies are used in get_any_partial() in
> SLUB but not during allocation anymore.
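An illustrative sketch of the two-thread interaction described in the messages above. The worker functions and the remote-node constant are made up for the example; only kmalloc(), kmalloc_node() and kfree() are real APIs, and the point is the shared per-cpu slab, not the exact call pattern.

	#include <linux/slab.h>

	#define REMOTE_NODE 1	/* hypothetical: some node other than the local one */

	/*
	 * Hypothetical worker A: keeps refilling the shared per-cpu slab
	 * with slabs allocated from REMOTE_NODE.
	 */
	static void worker_a(void)
	{
		void *p = kmalloc_node(64, GFP_KERNEL, REMOTE_NODE);

		kfree(p);
	}

	/*
	 * Hypothetical worker B on the same CPU, using the same slab cache:
	 * kmalloc() does not check which node the current per-cpu slab came
	 * from, so it can keep being handed the remote objects that worker
	 * A's allocations pulled in.
	 */
	static void worker_b(void)
	{
		void *p = kmalloc(64, GFP_KERNEL);

		kfree(p);
	}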
diff --git a/mm/slub.c b/mm/slub.c
index 35aa706dc318..342545775df6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2187,11 +2187,7 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 	struct slab *slab;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
-		folio = (struct folio *)alloc_pages(flags, order);
-	else
-		folio = (struct folio *)__alloc_pages_node(node, flags, order);
-
+	folio = (struct folio *)alloc_pages_node(node, flags, order);
 	if (!folio)
 		return NULL;
For no apparent reason, we were open-coding alloc_pages_node() in this
function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/slub.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)