| Message ID | 20240331021926.2732572-4-xiongwei.song@windriver.com |
|---|---|
| State | New |
| Series | SLUB: improve filling cpu partial a bit in get_partial_node() |
On 3/31/24 4:19 AM, xiongwei.song@windriver.com wrote:
> From: Xiongwei Song <xiongwei.song@windriver.com>
>
> The break conditions can be more readable and simple.
>
> We can check if we need to fill cpu partial after getting the first
> partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> cpu partial from next iteration, or break up the loop.
>
> Then we can remove the preprocessor condition of
> CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> compiler silent.
>
> Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> ---
>  mm/slub.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..ec91c7435d4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  		if (!partial) {
>  			partial = slab;
>  			stat(s, ALLOC_FROM_PARTIAL);
> -		} else {
> -			put_cpu_partial(s, slab, 0);
> -			stat(s, CPU_PARTIAL_NODE);
> -			partial_slabs++;
> +
> +			/* Fill cpu partial if needed from next iteration, or break */
> +			if (kmem_cache_has_cpu_partial(s))

That kinda puts back the check removed in patch 1, although only in the
first iteration. Still not ideal.

> +				continue;
> +			else
> +				break;
>  		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif

I'd suggest instead of the changes done in this patch, only change this
part above to:

	if ((slub_get_cpu_partial(s) == 0) ||
	    (partial_slabs > slub_get_cpu_partial(s) / 2))
		break;

That gets rid of the #ifdef and also fixes a weird corner case that if we
set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

It could be tempting to use >= instead of > to achieve the same effect but
that would have unintended performance effects that would best be evaluated
separately.

>
> +		put_cpu_partial(s, slab, 0);
> +		stat(s, CPU_PARTIAL_NODE);
> +		partial_slabs++;
> +
> +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
> +			break;
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;
>
> On 3/31/24 4:19 AM, xiongwei.song@windriver.com wrote:
> > From: Xiongwei Song <xiongwei.song@windriver.com>
> >
> > The break conditions can be more readable and simple.
> >
> > We can check if we need to fill cpu partial after getting the first
> > partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> > cpu partial from next iteration, or break up the loop.
> >
> > Then we can remove the preprocessor condition of
> > CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> > compiler silent.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> > ---
> >  mm/slub.c | 22 ++++++++++++----------
> >  1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..ec91c7435d4e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> >  		if (!partial) {
> >  			partial = slab;
> >  			stat(s, ALLOC_FROM_PARTIAL);
> > -		} else {
> > -			put_cpu_partial(s, slab, 0);
> > -			stat(s, CPU_PARTIAL_NODE);
> > -			partial_slabs++;
> > +
> > +			/* Fill cpu partial if needed from next iteration, or break */
> > +			if (kmem_cache_has_cpu_partial(s))
>
> That kinda puts back the check removed in patch 1, although only in the
> first iteration. Still not ideal.
>
> > +				continue;
> > +			else
> > +				break;
> >  		}
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > -		if (partial_slabs > s->cpu_partial_slabs / 2)
> > -			break;
> > -#else
> > -		break;
> > -#endif
>
> I'd suggest instead of the changes done in this patch, only change this
> part above to:
>
> 	if ((slub_get_cpu_partial(s) == 0) ||
> 	    (partial_slabs > slub_get_cpu_partial(s) / 2))
> 		break;
>
> That gets rid of the #ifdef and also fixes a weird corner case that if we
> set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

Oh, yes. Will update.

>
> It could be tempting to use >= instead of > to achieve the same effect but
> that would have unintended performance effects that would best be evaluated
> separately.

I can run a test to measure Amean changes. But in terms of x86 assembly,
there should not be extra instructions with ">=".

Did a simple test, for ">=" it uses "jle" instruction, while "jl"
instruction is used for ">". No more instructions involved. So there should
not be performance effects on x86.

Thanks,
Xiongwei

>
> >
> > +		put_cpu_partial(s, slab, 0);
> > +		stat(s, CPU_PARTIAL_NODE);
> > +		partial_slabs++;
> > +
> > +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
> > +			break;
> >  	}
> >  	spin_unlock_irqrestore(&n->list_lock, flags);
> >  	return partial;
On 4/3/24 2:37 AM, Song, Xiongwei wrote:
>>
>> It could be tempting to use >= instead of > to achieve the same effect but
>> that would have unintended performance effects that would best be evaluated
>> separately.
>
> I can run a test to measure Amean changes. But in terms of x86 assembly,
> there should not be extra instructions with ">=".
>
> Did a simple test, for ">=" it uses "jle" instruction, while "jl"
> instruction is used for ">". No more instructions involved. So there should
> not be performance effects on x86.

Right, I didn't mean the code of the test, but how the difference of the
comparison affects how many cpu partial slabs would be put on the cpu
partial list here.

> Thanks,
> Xiongwei
>
>>
>> >
>> > +		put_cpu_partial(s, slab, 0);
>> > +		stat(s, CPU_PARTIAL_NODE);
>> > +		partial_slabs++;
>> > +
>> > +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
>> > +			break;
>> >  	}
>> >  	spin_unlock_irqrestore(&n->list_lock, flags);
>> >  	return partial;
>
> On 4/3/24 2:37 AM, Song, Xiongwei wrote:
> >>
> >> It could be tempting to use >= instead of > to achieve the same effect but
> >> that would have unintended performance effects that would best be evaluated
> >> separately.
> >
> > I can run a test to measure Amean changes. But in terms of x86 assembly,
> > there should not be extra instructions with ">=".
> >
> > Did a simple test, for ">=" it uses "jle" instruction, while "jl"
> > instruction is used for ">". No more instructions involved. So there
> > should not be performance effects on x86.
>
> Right, I didn't mean the code of the test, but how the difference of the
> comparison affects how many cpu partial slabs would be put on the cpu
> partial list here.

Got it. Will do measurement for it.

Thanks,
Xiongwei
diff --git a/mm/slub.c b/mm/slub.c
index 590cc953895d..ec91c7435d4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 		if (!partial) {
 			partial = slab;
 			stat(s, ALLOC_FROM_PARTIAL);
-		} else {
-			put_cpu_partial(s, slab, 0);
-			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
+
+			/* Fill cpu partial if needed from next iteration, or break */
+			if (kmem_cache_has_cpu_partial(s))
+				continue;
+			else
+				break;
 		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif
+		put_cpu_partial(s, slab, 0);
+		stat(s, CPU_PARTIAL_NODE);
+		partial_slabs++;
+
+		if (partial_slabs > slub_get_cpu_partial(s) / 2)
+			break;
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;