| Message ID | 20240331021926.2732572-4-xiongwei.song@windriver.com |
|---|---|
| State | New |
| Series | SLUB: improve filling cpu partial a bit in get_partial_node() |
On 3/31/24 4:19 AM, xiongwei.song@windriver.com wrote:
> From: Xiongwei Song <xiongwei.song@windriver.com>
>
> The break conditions can be more readable and simple.
>
> We can check if we need to fill cpu partial after getting the first
> partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> cpu partial from next iteration, or break up the loop.
>
> Then we can remove the preprocessor condition of
> CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> compiler silent.
>
> Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> ---
>  mm/slub.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..ec91c7435d4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  		if (!partial) {
>  			partial = slab;
>  			stat(s, ALLOC_FROM_PARTIAL);
> -		} else {
> -			put_cpu_partial(s, slab, 0);
> -			stat(s, CPU_PARTIAL_NODE);
> -			partial_slabs++;
> +
> +			/* Fill cpu partial if needed from next iteration, or break */
> +			if (kmem_cache_has_cpu_partial(s))

That kinda puts back the check removed in patch 1, although only in the
first iteration. Still not ideal.

> +				continue;
> +			else
> +				break;
>  		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif

I'd suggest instead of the changes done in this patch, only change this
part above to:

	if ((slub_get_cpu_partial(s) == 0) ||
	    (partial_slabs > slub_get_cpu_partial(s) / 2))
		break;

That gets rid of the #ifdef and also fixes a weird corner case that if we
set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

It could be tempting to use >= instead of > to achieve the same effect but
that would have unintended performance effects that would best be evaluated
separately.

>
> +		put_cpu_partial(s, slab, 0);
> +		stat(s, CPU_PARTIAL_NODE);
> +		partial_slabs++;
> +
> +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
> +			break;
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;
>
> On 3/31/24 4:19 AM, xiongwei.song@windriver.com wrote:
> > From: Xiongwei Song <xiongwei.song@windriver.com>
> >
> > The break conditions can be more readable and simple.
> >
> > We can check if we need to fill cpu partial after getting the first
> > partial slab. If kmem_cache_has_cpu_partial() returns true, we fill
> > cpu partial from next iteration, or break up the loop.
> >
> > Then we can remove the preprocessor condition of
> > CONFIG_SLUB_CPU_PARTIAL. Use dummy slub_get_cpu_partial() to make
> > compiler silent.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> > ---
> >  mm/slub.c | 22 ++++++++++++----------
> >  1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..ec91c7435d4e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> >  		if (!partial) {
> >  			partial = slab;
> >  			stat(s, ALLOC_FROM_PARTIAL);
> > -		} else {
> > -			put_cpu_partial(s, slab, 0);
> > -			stat(s, CPU_PARTIAL_NODE);
> > -			partial_slabs++;
> > +
> > +			/* Fill cpu partial if needed from next iteration, or break */
> > +			if (kmem_cache_has_cpu_partial(s))
>
> That kinda puts back the check removed in patch 1, although only in the
> first iteration. Still not ideal.
>
> > +				continue;
> > +			else
> > +				break;
> >  		}
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > -		if (partial_slabs > s->cpu_partial_slabs / 2)
> > -			break;
> > -#else
> > -		break;
> > -#endif
>
> I'd suggest instead of the changes done in this patch, only change this
> part above to:
>
> 	if ((slub_get_cpu_partial(s) == 0) ||
> 	    (partial_slabs > slub_get_cpu_partial(s) / 2))
> 		break;
>
> That gets rid of the #ifdef and also fixes a weird corner case that if we
> set cpu_partial_slabs to 0 from sysfs, we still allocate at least one here.

Oh, yes. Will update.

>
> It could be tempting to use >= instead of > to achieve the same effect but
> that would have unintended performance effects that would best be evaluated
> separately.

I can run a test to measure Amean changes. But in terms of x86 assembly,
there should not be extra instructions with ">=".

Did a simple test, for ">=" it uses "jle" instruction, while "jl"
instruction is used for ">". No more instructions involved. So there should
not be performance effects on x86.

Thanks,
Xiongwei

>
> >
> > +		put_cpu_partial(s, slab, 0);
> > +		stat(s, CPU_PARTIAL_NODE);
> > +		partial_slabs++;
> > +
> > +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
> > +			break;
> >  	}
> >  	spin_unlock_irqrestore(&n->list_lock, flags);
> >  	return partial;
On 4/3/24 2:37 AM, Song, Xiongwei wrote:
>>
>> It could be tempting to use >= instead of > to achieve the same effect but
>> that would have unintended performance effects that would best be evaluated
>> separately.
>
> I can run a test to measure Amean changes. But in terms of x86 assembly,
> there should not be extra instructions with ">=".
>
> Did a simple test, for ">=" it uses "jle" instruction, while "jl"
> instruction is used for ">". No more instructions involved. So there should
> not be performance effects on x86.

Right, I didn't mean the code of the test, but how the difference of the
comparison affects how many cpu partial slabs would be put on the cpu
partial list here.

> Thanks,
> Xiongwei
>
>>
>> >
>> > +		put_cpu_partial(s, slab, 0);
>> > +		stat(s, CPU_PARTIAL_NODE);
>> > +		partial_slabs++;
>> > +
>> > +		if (partial_slabs > slub_get_cpu_partial(s) / 2)
>> > +			break;
>> >  	}
>> >  	spin_unlock_irqrestore(&n->list_lock, flags);
>> >  	return partial;
>
> On 4/3/24 2:37 AM, Song, Xiongwei wrote:
> >>
> >> It could be tempting to use >= instead of > to achieve the same effect but
> >> that would have unintended performance effects that would best be evaluated
> >> separately.
> >
> > I can run a test to measure Amean changes. But in terms of x86 assembly,
> > there should not be extra instructions with ">=".
> >
> > Did a simple test, for ">=" it uses "jle" instruction, while "jl"
> > instruction is used for ">". No more instructions involved. So there
> > should not be performance effects on x86.
>
> Right, I didn't mean the code of the test, but how the difference of the
> comparison affects how many cpu partial slabs would be put on the cpu
> partial list here.

Got it. Will do measurement for it.

Thanks,
Xiongwei
diff --git a/mm/slub.c b/mm/slub.c
index 590cc953895d..ec91c7435d4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,20 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 		if (!partial) {
 			partial = slab;
 			stat(s, ALLOC_FROM_PARTIAL);
-		} else {
-			put_cpu_partial(s, slab, 0);
-			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
+
+			/* Fill cpu partial if needed from next iteration, or break */
+			if (kmem_cache_has_cpu_partial(s))
+				continue;
+			else
+				break;
 		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif
+		put_cpu_partial(s, slab, 0);
+		stat(s, CPU_PARTIAL_NODE);
+		partial_slabs++;
+
+		if (partial_slabs > slub_get_cpu_partial(s) / 2)
+			break;
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;