diff mbox series

[v2,3/3] mm/slub: simplify get_partial_node()

Message ID 20240404055826.1469415-4-xiongwei.song@windriver.com (mailing list archive)
State New
Headers show
Series SLUB: improve filling cpu partial a bit in get_partial_node() | expand

Commit Message

Song, Xiongwei April 4, 2024, 5:58 a.m. UTC
From: Xiongwei Song <xiongwei.song@windriver.com>

The break conditions for filling the cpu partial list can be made simpler
and more readable.

If slub_get_cpu_partial() returns 0, we don't need to fill the cpu partial
list at all, so we should break from the loop immediately. Otherwise, we
should break from the loop once we have added enough cpu partial slabs.

Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
corner case: if cpu_partial_slabs is set to 0 from sysfs, we would still
put at least one slab on the cpu partial list here.

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
---

The measurement below compares the performance effect of the following two
conditions for breaking out of the cpu partial filling loop:

Condition 1:
When the count of added cpu slabs is greater than cpu_partial_slabs/2:
(partial_slabs > slub_get_cpu_partial(s) / 2)

Condition 2:
When the count of added cpu slabs is greater than or equal to
cpu_partial_slabs/2:
(partial_slabs >= slub_get_cpu_partial(s) / 2)

The choice of break condition can affect how many cpu partial slabs are
put on the cpu partial list.
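For example, assuming cpu_partial_slabs is 6, the threshold is 6 / 2 = 3:
with ">" the loop keeps going until partial_slabs reaches 4, so up to 4
slabs are put on the cpu partial list, while with ">=" it breaks once
partial_slabs reaches 3, i.e. one slab fewer.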

The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
16 cores. The OS is Ubuntu 22.04.
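(In the tables below, the first column should be read as the number of
hackbench groups and the values as average runtimes in seconds, so a
negative percentage means the ">=" variant is slower.)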

hackbench-process-pipes
                  6.9-rc2(with ">")      6.9.0-rc2(with ">=")
Amean     1       0.0373 (   0.00%)      0.0356 *   4.60%*
Amean     4       0.0984 (   0.00%)      0.1014 *  -3.05%*
Amean     7       0.1803 (   0.00%)      0.1851 *  -2.69%*
Amean     12      0.2947 (   0.00%)      0.3141 *  -6.59%*
Amean     21      0.4577 (   0.00%)      0.4927 *  -7.65%*
Amean     30      0.6326 (   0.00%)      0.6649 *  -5.10%*
Amean     48      0.9396 (   0.00%)      0.9884 *  -5.20%*
Amean     64      1.2321 (   0.00%)      1.3004 *  -5.54%*

hackbench-process-sockets
                  6.9-rc2(with ">")      6.9.0-rc2(with ">=")
Amean     1       0.0609 (   0.00%)      0.0623 *  -2.35%*
Amean     4       0.2107 (   0.00%)      0.2140 *  -1.56%*
Amean     7       0.3754 (   0.00%)      0.3966 *  -5.63%*
Amean     12      0.6456 (   0.00%)      0.6734 *  -4.32%*
Amean     21      1.1440 (   0.00%)      1.1769 *  -2.87%*
Amean     30      1.6629 (   0.00%)      1.7031 *  -2.42%*
Amean     48      2.7321 (   0.00%)      2.7897 *  -2.11%*
Amean     64      3.7397 (   0.00%)      3.7640 *  -0.65%*

It seems there is a slight performance penalty when using ">=" to break out
of the loop. Hence, we should still use ">" here.
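
As a side note on the #ifdef removal mentioned above: the
slub_get_cpu_partial() helper introduced earlier in this series behaves
roughly as sketched below (a sketch, not the exact code); it returning 0 on
!CONFIG_SLUB_CPU_PARTIAL builds is what lets the break condition stay
unconditional:

#ifdef CONFIG_SLUB_CPU_PARTIAL
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
	return s->cpu_partial_slabs;
}
#else
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
	/* No cpu partial lists, so get_partial_node() stops after one slab. */
	return 0;
}
#endif
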
---
 mm/slub.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Vlastimil Babka April 4, 2024, 9:26 a.m. UTC | #1
On 4/4/24 7:58 AM, xiongwei.song@windriver.com wrote:
> From: Xiongwei Song <xiongwei.song@windriver.com>
> 
> The break conditions for filling the cpu partial list can be made simpler
> and more readable.
> 
> If slub_get_cpu_partial() returns 0, we don't need to fill the cpu partial
> list at all, so we should break from the loop immediately. Otherwise, we
> should break from the loop once we have added enough cpu partial slabs.
> 
> Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
> corner case: if cpu_partial_slabs is set to 0 from sysfs, we would still
> put at least one slab on the cpu partial list here.
> 
> Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> ---
> 
> The measurement below compares the performance effect of the following two
> conditions for breaking out of the cpu partial filling loop:
> 
> Condition 1:
> When the count of added cpu slabs is greater than cpu_partial_slabs/2:
> (partial_slabs > slub_get_cpu_partial(s) / 2)
> 
> Condition 2:
> When the count of added cpu slabs is greater than or equal to
> cpu_partial_slabs/2:
> (partial_slabs >= slub_get_cpu_partial(s) / 2)
> 
> The choice of break condition can affect how many cpu partial slabs are
> put on the cpu partial list.
> 
> The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
> 16 cores. The OS is Ubuntu 22.04.
> 
> hackbench-process-pipes
>                   6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> Amean     1       0.0373 (   0.00%)      0.0356 *   4.60%*
> Amean     4       0.0984 (   0.00%)      0.1014 *  -3.05%*
> Amean     7       0.1803 (   0.00%)      0.1851 *  -2.69%*
> Amean     12      0.2947 (   0.00%)      0.3141 *  -6.59%*
> Amean     21      0.4577 (   0.00%)      0.4927 *  -7.65%*
> Amean     30      0.6326 (   0.00%)      0.6649 *  -5.10%*
> Amean     48      0.9396 (   0.00%)      0.9884 *  -5.20%*
> Amean     64      1.2321 (   0.00%)      1.3004 *  -5.54%*
> 
> hackbench-process-sockets
>                   6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> Amean     1       0.0609 (   0.00%)      0.0623 *  -2.35%*
> Amean     4       0.2107 (   0.00%)      0.2140 *  -1.56%*
> Amean     7       0.3754 (   0.00%)      0.3966 *  -5.63%*
> Amean     12      0.6456 (   0.00%)      0.6734 *  -4.32%*
> Amean     21      1.1440 (   0.00%)      1.1769 *  -2.87%*
> Amean     30      1.6629 (   0.00%)      1.7031 *  -2.42%*
> Amean     48      2.7321 (   0.00%)      2.7897 *  -2.11%*
> Amean     64      3.7397 (   0.00%)      3.7640 *  -0.65%*
> 
> It seems there is a slight performance penalty when using ">=" to break out
> of the loop. Hence, we should still use ">" here.

Thanks for evaluating that. I suspected that would be the case, so we should
not change that performance aspect as part of a cleanup.

> ---
>  mm/slub.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..6beff3b1e22c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  			stat(s, CPU_PARTIAL_NODE);
>  			partial_slabs++;
>  		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif
>  
> +		if ((slub_get_cpu_partial(s) == 0) ||
> +		    (partial_slabs > slub_get_cpu_partial(s) / 2))
> +			break;
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;

After looking at the result and your v1 again, I arrived at this
modification that incorporates the core v1 idea without reintroducing
kmem_cache_has_cpu_partial(). The modified patch is below. Is it OK with
you? I've pushed the whole series with this modification to slab/for-next
for now.

----8<-----
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
                if (!partial) {
                        partial = slab;
                        stat(s, ALLOC_FROM_PARTIAL);
+                       if ((slub_get_cpu_partial(s) == 0)) {
+                               break;
+                       }
                } else {
                        put_cpu_partial(s, slab, 0);
                        stat(s, CPU_PARTIAL_NODE);
-                       partial_slabs++;
-               }
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-               if (partial_slabs > s->cpu_partial_slabs / 2)
-                       break;
-#else
-               break;
-#endif
 
+                       if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
+                               break;
+                       }
+               }
        }
        spin_unlock_irqrestore(&n->list_lock, flags);
        return partial;
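
With that applied, the end of the loop body in get_partial_node() would look
roughly like this (a sketch only, with the redundant parentheses and braces
from the diff above dropped):

		if (!partial) {
			partial = slab;
			stat(s, ALLOC_FROM_PARTIAL);
			/* No cpu partial list configured: the first slab is enough. */
			if (slub_get_cpu_partial(s) == 0)
				break;
		} else {
			put_cpu_partial(s, slab, 0);
			stat(s, CPU_PARTIAL_NODE);
			/* Stop once more than half of the cpu partial limit is queued. */
			if (++partial_slabs > slub_get_cpu_partial(s) / 2)
				break;
		}
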
Song, Xiongwei April 7, 2024, 1:47 a.m. UTC | #2
> On 4/4/24 7:58 AM, xiongwei.song@windriver.com wrote:
> > From: Xiongwei Song <xiongwei.song@windriver.com>
> >
> > The break conditions for filling the cpu partial list can be made simpler
> > and more readable.
> >
> > If slub_get_cpu_partial() returns 0, we don't need to fill the cpu partial
> > list at all, so we should break from the loop immediately. Otherwise, we
> > should break from the loop once we have added enough cpu partial slabs.
> >
> > Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
> > corner case: if cpu_partial_slabs is set to 0 from sysfs, we would still
> > put at least one slab on the cpu partial list here.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> > ---
> >
> > The measurement below compares the performance effect of the following two
> > conditions for breaking out of the cpu partial filling loop:
> >
> > Condition 1:
> > When the count of added cpu slabs is greater than cpu_partial_slabs/2:
> > (partial_slabs > slub_get_cpu_partial(s) / 2)
> >
> > Condition 2:
> > When the count of added cpu slabs is greater than or equal to
> > cpu_partial_slabs/2:
> > (partial_slabs >= slub_get_cpu_partial(s) / 2)
> >
> > The choice of break condition can affect how many cpu partial slabs are
> > put on the cpu partial list.
> >
> > The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
> > 16 cores. The OS is Ubuntu 22.04.
> >
> > hackbench-process-pipes
> >                   6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> > Amean     1       0.0373 (   0.00%)      0.0356 *   4.60%*
> > Amean     4       0.0984 (   0.00%)      0.1014 *  -3.05%*
> > Amean     7       0.1803 (   0.00%)      0.1851 *  -2.69%*
> > Amean     12      0.2947 (   0.00%)      0.3141 *  -6.59%*
> > Amean     21      0.4577 (   0.00%)      0.4927 *  -7.65%*
> > Amean     30      0.6326 (   0.00%)      0.6649 *  -5.10%*
> > Amean     48      0.9396 (   0.00%)      0.9884 *  -5.20%*
> > Amean     64      1.2321 (   0.00%)      1.3004 *  -5.54%*
> >
> > hackbench-process-sockets
> >                   6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> > Amean     1       0.0609 (   0.00%)      0.0623 *  -2.35%*
> > Amean     4       0.2107 (   0.00%)      0.2140 *  -1.56%*
> > Amean     7       0.3754 (   0.00%)      0.3966 *  -5.63%*
> > Amean     12      0.6456 (   0.00%)      0.6734 *  -4.32%*
> > Amean     21      1.1440 (   0.00%)      1.1769 *  -2.87%*
> > Amean     30      1.6629 (   0.00%)      1.7031 *  -2.42%*
> > Amean     48      2.7321 (   0.00%)      2.7897 *  -2.11%*
> > Amean     64      3.7397 (   0.00%)      3.7640 *  -0.65%*
> >
> > It seems there is a slight performance penalty when using ">=" to break out
> > of the loop. Hence, we should still use ">" here.
> 
> Thanks for evaluating that. I suspected that would be the case, so we should
> not change that performance aspect as part of a cleanup.
> 
> > ---
> >  mm/slub.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..6beff3b1e22c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> >                       stat(s, CPU_PARTIAL_NODE);
> >                       partial_slabs++;
> >               }
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > -             if (partial_slabs > s->cpu_partial_slabs / 2)
> > -                     break;
> > -#else
> > -             break;
> > -#endif
> >
> > +             if ((slub_get_cpu_partial(s) == 0) ||
> > +                 (partial_slabs > slub_get_cpu_partial(s) / 2))
> > +                     break;
> >       }
> >       spin_unlock_irqrestore(&n->list_lock, flags);
> >       return partial;
> 
> After looking at the result and your v1 again, I arrived at this
> modification that incorporates the core v1 idea without reintroducing
> kmem_cache_has_cpu_partial(). The modified patch is below. Is it OK with
> you? I've pushed the whole series with this modification to slab/for-next
> for now.

Sorry for the late response, I was on vacation.

I'm ok with the patch below.

Thanks,
Xiongwei

> 
> ----8<-----
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>                 if (!partial) {
>                         partial = slab;
>                         stat(s, ALLOC_FROM_PARTIAL);
> +                       if ((slub_get_cpu_partial(s) == 0)) {
> +                               break;
> +                       }
>                 } else {
>                         put_cpu_partial(s, slab, 0);
>                         stat(s, CPU_PARTIAL_NODE);
> -                       partial_slabs++;
> -               }
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -               if (partial_slabs > s->cpu_partial_slabs / 2)
> -                       break;
> -#else
> -               break;
> -#endif
> 
> +                       if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
> +                               break;
> +                       }
> +               }
>         }
>         spin_unlock_irqrestore(&n->list_lock, flags);
>         return partial;
diff mbox series

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 590cc953895d..6beff3b1e22c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2619,13 +2619,10 @@  static struct slab *get_partial_node(struct kmem_cache *s,
 			stat(s, CPU_PARTIAL_NODE);
 			partial_slabs++;
 		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif
 
+		if ((slub_get_cpu_partial(s) == 0) ||
+		    (partial_slabs > slub_get_cpu_partial(s) / 2))
+			break;
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;