
[3/4] sched/fair: Do not replace recent_used_cpu with the new target

Message ID 20201208153501.1467-4-mgorman@techsingularity.net (mailing list archive)
State New, archived
Series Reduce scanning of runqueues in select_idle_sibling

Commit Message

Mel Gorman Dec. 8, 2020, 3:35 p.m. UTC
After select_idle_sibling, p->recent_used_cpu is set to the
new target. However on the next wakeup, prev will be the same as
recent_used_cpu unless the load balancer has moved the task since the last
wakeup. It still works, but it is less efficient than it could be given all
the changes that have gone in since then (reduced unnecessary migrations,
load balancer changes and so on). This patch preserves recent_used_cpu for
longer.
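
To illustrate the above, here is a small standalone toy model (not kernel
code; it ignores the waker/wakee distinction, migrations and all the idle
checks) of how prev and recent_used_cpu evolve across a wakeup under the
old and new schemes described in this changelog:

/*
 * Toy model only: "old" overwrites recent_used_cpu with the CPU that was
 * just selected, "new" remembers the task's previous CPU instead.
 */
#include <stdio.h>

struct task { int prev, recent_used_cpu; };

static void wake_old(struct task *p, int target)
{
	p->recent_used_cpu = target;	/* old: track the new target */
	p->prev = target;		/* the task then runs and sleeps on target */
}

static void wake_new(struct task *p, int target)
{
	p->recent_used_cpu = p->prev;	/* new: preserve the previous CPU */
	p->prev = target;
}

int main(void)
{
	struct task before = { .prev = 4, .recent_used_cpu = 2 };
	struct task after  = before;

	wake_old(&before, 6);
	wake_new(&after, 6);

	/* old: prev == recent_used_cpu == 6, so the recent_used_cpu check on
	 * the next wakeup duplicates the prev check.
	 * new: prev == 6 and recent_used_cpu == 4, a second distinct candidate.
	 */
	printf("old: prev=%d recent=%d\n", before.prev, before.recent_used_cpu);
	printf("new: prev=%d recent=%d\n", after.prev, after.recent_used_cpu);
	return 0;
}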

With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled

                          5.10.0-rc6             5.10.0-rc6
                 	 baseline-v2           altrecent-v2
Hmean     1        508.39 (   0.00%)      502.05 *  -1.25%*
Hmean     2        986.70 (   0.00%)      983.65 *  -0.31%*
Hmean     4       1914.55 (   0.00%)     1920.24 *   0.30%*
Hmean     8       3702.37 (   0.00%)     3663.96 *  -1.04%*
Hmean     16      6573.11 (   0.00%)     6545.58 *  -0.42%*
Hmean     32     10142.57 (   0.00%)    10253.73 *   1.10%*
Hmean     64     14348.40 (   0.00%)    12506.31 * -12.84%*
Hmean     128    21842.59 (   0.00%)    21967.13 *   0.57%*
Hmean     256    20813.75 (   0.00%)    21534.52 *   3.46%*
Hmean     320    20684.33 (   0.00%)    21070.14 *   1.87%*

The difference was marginal except for 64 threads, where the baseline
result was very unstable whereas the patch was much more stable. This is
somewhat machine specific, as a separate 80-cpu Broadwell machine reported
the following for the same test:

                          5.10.0-rc6             5.10.0-rc6
                 	 baseline-v2           altrecent-v2
Hmean     1        310.36 (   0.00%)      291.81 *  -5.98%*
Hmean     2        340.86 (   0.00%)      547.22 *  60.54%*
Hmean     4        912.29 (   0.00%)     1063.21 *  16.54%*
Hmean     8       2116.40 (   0.00%)     2103.60 *  -0.60%*
Hmean     16      4232.90 (   0.00%)     4362.92 *   3.07%*
Hmean     32      8442.03 (   0.00%)     8642.10 *   2.37%*
Hmean     64     11733.91 (   0.00%)    11473.66 *  -2.22%*
Hmean     128    17727.24 (   0.00%)    16784.23 *  -5.32%*
Hmean     256    16089.23 (   0.00%)    16110.79 *   0.13%*
Hmean     320    15992.60 (   0.00%)    16071.64 *   0.49%*

schedstats were not used in this series but, from an earlier debugging
effort, the schedstats after the test run were as follows:

Ops SIS Search               5653107942.00  5726545742.00
Ops SIS Domain Search        3365067916.00  3319768543.00
Ops SIS Scanned            112173512543.00 99194352541.00
Ops SIS Domain Scanned     109885472517.00 96787575342.00
Ops SIS Failures             2923185114.00  2950166441.00
Ops SIS Recent Used Hit           56547.00   118064916.00
Ops SIS Recent Used Miss     1590899250.00   354942791.00
Ops SIS Recent Attempts      1590955797.00   473007707.00
Ops SIS Search Efficiency             5.04           5.77
Ops SIS Domain Search Eff             3.06           3.43
Ops SIS Fast Success Rate            40.47          42.03
Ops SIS Success Rate                 48.29          48.48
Ops SIS Recent Success Rate           0.00          24.96
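
For reference, the derived percentages in that table appear to be simple
ratios of the raw "Ops" counters above them. The following standalone
sketch (an assumption about how the reporting script computes them; the
script itself is not part of this series) reproduces the baseline column:

#include <stdio.h>

int main(void)
{
	/* Raw baseline counters copied from the table above. */
	double search   =   5653107942.0;	/* SIS Search           */
	double domain   =   3365067916.0;	/* SIS Domain Search    */
	double scanned  = 112173512543.0;	/* SIS Scanned          */
	double failures =   2923185114.0;	/* SIS Failures         */
	double rhit     =        56547.0;	/* SIS Recent Used Hit  */
	double rmiss    =   1590899250.0;	/* SIS Recent Used Miss */

	printf("Search Efficiency    %5.2f\n", 100.0 * search / scanned);             /*  5.04 */
	printf("Fast Success Rate    %5.2f\n", 100.0 * (search - domain) / search);   /* 40.47 */
	printf("Success Rate         %5.2f\n", 100.0 * (search - failures) / search); /* 48.29 */
	printf("Recent Success Rate  %5.2f\n", 100.0 * rhit / (rhit + rmiss));        /*  0.00 */
	return 0;
}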

The first interesting point is the ridiculous number of times runqueues
are scanned: almost 97 billion times over the course of 40 minutes.

With the patch, "Recent Used Hit" is over 2000 times more likely to
succeed. The failure rate also increases by quite a lot but the cost is
marginal even if the "Fast Success Rate" only increases by 2% overall. What
cannot be observed from these stats is where the biggest impact is, as these
stats cover everything from low utilisation to over-saturation.

If graphed over time, the graphs show that the sched domain is only
scanned at negligible rates until the machine is fully busy. With
low utilisation, the "Fast Success Rate" is almost 100% until the
machine is fully busy. For 320 clients, the success rate is close to
0% which is unsurprising.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Vincent Guittot Dec. 8, 2020, 4:14 p.m. UTC | #1
On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> After select_idle_sibling, p->recent_used_cpu is set to the
> new target. However on the next wakeup, prev will be the same as
> recent_used_cpu unless the load balancer has moved the task since the last
> wakeup. It still works, but it is less efficient than it could be given all
> the changes that have gone in since then (reduced unnecessary migrations,
> load balancer changes and so on). This patch preserves recent_used_cpu for
> longer.
>
> With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled
>
>                           5.10.0-rc6             5.10.0-rc6
>                          baseline-v2           altrecent-v2
> Hmean     1        508.39 (   0.00%)      502.05 *  -1.25%*
> Hmean     2        986.70 (   0.00%)      983.65 *  -0.31%*
> Hmean     4       1914.55 (   0.00%)     1920.24 *   0.30%*
> Hmean     8       3702.37 (   0.00%)     3663.96 *  -1.04%*
> Hmean     16      6573.11 (   0.00%)     6545.58 *  -0.42%*
> Hmean     32     10142.57 (   0.00%)    10253.73 *   1.10%*
> Hmean     64     14348.40 (   0.00%)    12506.31 * -12.84%*
> Hmean     128    21842.59 (   0.00%)    21967.13 *   0.57%*
> Hmean     256    20813.75 (   0.00%)    21534.52 *   3.46%*
> Hmean     320    20684.33 (   0.00%)    21070.14 *   1.87%*
>
> The difference was marginal except for 64 threads, where the baseline
> result was very unstable whereas the patch was much more stable. This is
> somewhat machine specific, as a separate 80-cpu Broadwell machine reported
> the following for the same test:
>
>                           5.10.0-rc6             5.10.0-rc6
>                          baseline-v2           altrecent-v2
> Hmean     1        310.36 (   0.00%)      291.81 *  -5.98%*
> Hmean     2        340.86 (   0.00%)      547.22 *  60.54%*
> Hmean     4        912.29 (   0.00%)     1063.21 *  16.54%*
> Hmean     8       2116.40 (   0.00%)     2103.60 *  -0.60%*
> Hmean     16      4232.90 (   0.00%)     4362.92 *   3.07%*
> Hmean     32      8442.03 (   0.00%)     8642.10 *   2.37%*
> Hmean     64     11733.91 (   0.00%)    11473.66 *  -2.22%*
> Hmean     128    17727.24 (   0.00%)    16784.23 *  -5.32%*
> Hmean     256    16089.23 (   0.00%)    16110.79 *   0.13%*
> Hmean     320    15992.60 (   0.00%)    16071.64 *   0.49%*
>
> schedstats were not used in this series but, from an earlier debugging
> effort, the schedstats after the test run were as follows:
>
> Ops SIS Search               5653107942.00  5726545742.00
> Ops SIS Domain Search        3365067916.00  3319768543.00
> Ops SIS Scanned            112173512543.00 99194352541.00
> Ops SIS Domain Scanned     109885472517.00 96787575342.00
> Ops SIS Failures             2923185114.00  2950166441.00
> Ops SIS Recent Used Hit           56547.00   118064916.00
> Ops SIS Recent Used Miss     1590899250.00   354942791.00
> Ops SIS Recent Attempts      1590955797.00   473007707.00
> Ops SIS Search Efficiency             5.04           5.77
> Ops SIS Domain Search Eff             3.06           3.43
> Ops SIS Fast Success Rate            40.47          42.03
> Ops SIS Success Rate                 48.29          48.48
> Ops SIS Recent Success Rate           0.00          24.96
>
> The first interesting point is the ridiculous number of times runqueues
> are scanned: almost 97 billion times over the course of 40 minutes.
>
> With the patch, "Recent Used Hit" is over 2000 times more likely to
> succeed. The failure rate also increases by quite a lot but the cost is
> marginal even if the "Fast Success Rate" only increases by 2% overall. What
> cannot be observed from these stats is where the biggest impact is, as these
> stats cover everything from low utilisation to over-saturation.
>
> If graphed over time, the graphs show that the sched domain is only
> scanned at negligible rates until the machine is fully busy. With
> low utilisation, the "Fast Success Rate" is almost 100% until the
> machine is fully busy. For 320 clients, the success rate is close to
> 0% which is unsurprising.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5c41875aec23..413d895bbbf8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>
>         /* Check a recently used CPU as a potential idle candidate: */
>         recent_used_cpu = p->recent_used_cpu;
> +       p->recent_used_cpu = prev;
>         if (recent_used_cpu != prev &&
>             recent_used_cpu != target &&
>             cpus_share_cache(recent_used_cpu, target) &&
>             (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
>             cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
>             asym_fits_capacity(task_util, recent_used_cpu)) {
> -               /*
> -                * Replace recent_used_cpu with prev as it is a potential
> -                * candidate for the next wake:
> -                */
> -               p->recent_used_cpu = prev;
>                 return recent_used_cpu;
>         }
>
> @@ -6768,9 +6764,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>         } else if (wake_flags & WF_TTWU) { /* XXX always ? */
>                 /* Fast path */
>                 new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> -
> -               if (want_affine)
> -                       current->recent_used_cpu = cpu;
>         }
>         rcu_read_unlock();
>
> --
> 2.26.2
>
Vincent Guittot Dec. 10, 2020, 9:40 a.m. UTC | #2
On Tue, 8 Dec 2020 at 17:14, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > After select_idle_sibling, p->recent_used_cpu is set to the
> > new target. However on the next wakeup, prev will be the same as
> > recent_used_cpu unless the load balancer has moved the task since the last
> > wakeup. It still works, but it is less efficient than it could be given all
> > the changes that have gone in since then (reduced unnecessary migrations,
> > load balancer changes and so on). This patch preserves recent_used_cpu for
> > longer.
> >
> > With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled
> >
> >                           5.10.0-rc6             5.10.0-rc6
> >                          baseline-v2           altrecent-v2
> > Hmean     1        508.39 (   0.00%)      502.05 *  -1.25%*
> > Hmean     2        986.70 (   0.00%)      983.65 *  -0.31%*
> > Hmean     4       1914.55 (   0.00%)     1920.24 *   0.30%*
> > Hmean     8       3702.37 (   0.00%)     3663.96 *  -1.04%*
> > Hmean     16      6573.11 (   0.00%)     6545.58 *  -0.42%*
> > Hmean     32     10142.57 (   0.00%)    10253.73 *   1.10%*
> > Hmean     64     14348.40 (   0.00%)    12506.31 * -12.84%*
> > Hmean     128    21842.59 (   0.00%)    21967.13 *   0.57%*
> > Hmean     256    20813.75 (   0.00%)    21534.52 *   3.46%*
> > Hmean     320    20684.33 (   0.00%)    21070.14 *   1.87%*
> >
> > The difference was marginal except for 64 threads, where the baseline
> > result was very unstable whereas the patch was much more stable. This is
> > somewhat machine specific, as a separate 80-cpu Broadwell machine reported
> > the following for the same test:
> >
> >                           5.10.0-rc6             5.10.0-rc6
> >                          baseline-v2           altrecent-v2
> > Hmean     1        310.36 (   0.00%)      291.81 *  -5.98%*
> > Hmean     2        340.86 (   0.00%)      547.22 *  60.54%*
> > Hmean     4        912.29 (   0.00%)     1063.21 *  16.54%*
> > Hmean     8       2116.40 (   0.00%)     2103.60 *  -0.60%*
> > Hmean     16      4232.90 (   0.00%)     4362.92 *   3.07%*
> > Hmean     32      8442.03 (   0.00%)     8642.10 *   2.37%*
> > Hmean     64     11733.91 (   0.00%)    11473.66 *  -2.22%*
> > Hmean     128    17727.24 (   0.00%)    16784.23 *  -5.32%*
> > Hmean     256    16089.23 (   0.00%)    16110.79 *   0.13%*
> > Hmean     320    15992.60 (   0.00%)    16071.64 *   0.49%*
> >
> > schedstats were not used in this series but, from an earlier debugging
> > effort, the schedstats after the test run were as follows:
> >
> > Ops SIS Search               5653107942.00  5726545742.00
> > Ops SIS Domain Search        3365067916.00  3319768543.00
> > Ops SIS Scanned            112173512543.00 99194352541.00
> > Ops SIS Domain Scanned     109885472517.00 96787575342.00
> > Ops SIS Failures             2923185114.00  2950166441.00
> > Ops SIS Recent Used Hit           56547.00   118064916.00
> > Ops SIS Recent Used Miss     1590899250.00   354942791.00
> > Ops SIS Recent Attempts      1590955797.00   473007707.00
> > Ops SIS Search Efficiency             5.04           5.77
> > Ops SIS Domain Search Eff             3.06           3.43
> > Ops SIS Fast Success Rate            40.47          42.03
> > Ops SIS Success Rate                 48.29          48.48
> > Ops SIS Recent Success Rate           0.00          24.96
> >
> > The first interesting point is the ridiculous number of times runqueues
> > are scanned: almost 97 billion times over the course of 40 minutes.
> >
> > With the patch, "Recent Used Hit" is over 2000 times more likely to
> > succeed. The failure rate also increases by quite a lot but the cost is
> > marginal even if the "Fast Success Rate" only increases by 2% overall. What
> > cannot be observed from these stats is where the biggest impact is, as these
> > stats cover everything from low utilisation to over-saturation.
> >
> > If graphed over time, the graphs show that the sched domain is only
> > scanned at negligible rates until the machine is fully busy. With
> > low utilisation, the "Fast Success Rate" is almost 100% until the
> > machine is fully busy. For 320 clients, the success rate is close to
> > 0% which is unsurprising.
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

This patch is responsible for a performance regression on my thx2 with
hackbench. So although I reviewed it, it should not be applied as the
change in the behavior is far deeper than expected.

>
> > ---
> >  kernel/sched/fair.c | 9 +--------
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5c41875aec23..413d895bbbf8 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >
> >         /* Check a recently used CPU as a potential idle candidate: */
> >         recent_used_cpu = p->recent_used_cpu;
> > +       p->recent_used_cpu = prev;
> >         if (recent_used_cpu != prev &&
> >             recent_used_cpu != target &&
> >             cpus_share_cache(recent_used_cpu, target) &&
> >             (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> >             cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
> >             asym_fits_capacity(task_util, recent_used_cpu)) {
> > -               /*
> > -                * Replace recent_used_cpu with prev as it is a potential
> > -                * candidate for the next wake:
> > -                */
> > -               p->recent_used_cpu = prev;
> >                 return recent_used_cpu;
> >         }
> >
> > @@ -6768,9 +6764,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >         } else if (wake_flags & WF_TTWU) { /* XXX always ? */
> >                 /* Fast path */
> >                 new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> > -
> > -               if (want_affine)
> > -                       current->recent_used_cpu = cpu;
> >         }
> >         rcu_read_unlock();
> >
> > --
> > 2.26.2
> >
Hillf Danton Dec. 11, 2020, 6:25 a.m. UTC | #3
On Tue,  8 Dec 2020 15:35:00 +0000 Mel Gorman wrote:
> @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  
>  	/* Check a recently used CPU as a potential idle candidate: */
>  	recent_used_cpu = p->recent_used_cpu;
> +	p->recent_used_cpu = prev;
>  	if (recent_used_cpu != prev &&
>  	    recent_used_cpu != target &&
>  	    cpus_share_cache(recent_used_cpu, target) &&
>  	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
>  	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&

Typo? Fix it in a respin if so.

>  	    asym_fits_capacity(task_util, recent_used_cpu)) {
> -		/*
> -		 * Replace recent_used_cpu with prev as it is a potential
> -		 * candidate for the next wake:
> -		 */
> -		p->recent_used_cpu = prev;
>  		return recent_used_cpu;

I prefer to update the recent CPU after llc check.

>  	}
>  
> @@ -6768,9 +6764,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>  	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
>  		/* Fast path */
>  		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> -
> -		if (want_affine)
> -			current->recent_used_cpu = cpu;
>  	}
>  	rcu_read_unlock();
>  
> -- 
> 2.26.2
Mel Gorman Dec. 11, 2020, 9:02 a.m. UTC | #4
On Fri, Dec 11, 2020 at 02:25:42PM +0800, Hillf Danton wrote:
> On Tue,  8 Dec 2020 15:35:00 +0000 Mel Gorman wrote:
> > @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >  
> >  	/* Check a recently used CPU as a potential idle candidate: */
> >  	recent_used_cpu = p->recent_used_cpu;
> > +	p->recent_used_cpu = prev;
> >  	if (recent_used_cpu != prev &&
> >  	    recent_used_cpu != target &&
> >  	    cpus_share_cache(recent_used_cpu, target) &&
> >  	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> >  	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
> 
> Typo? Fix it in a respin if so.
> 

What typo?

> >  	    asym_fits_capacity(task_util, recent_used_cpu)) {
> > -		/*
> > -		 * Replace recent_used_cpu with prev as it is a potential
> > -		 * candidate for the next wake:
> > -		 */
> > -		p->recent_used_cpu = prev;
> >  		return recent_used_cpu;
> 
> I prefer to update the recent CPU after llc check.
> 

That would prevent recent_used_cpu leaving the LLC the task first
started on.
Hillf Danton Dec. 11, 2020, 9:34 a.m. UTC | #5
On Fri, 11 Dec 2020 09:02:28 +0000 Mel Gorman wrote:
>On Fri, Dec 11, 2020 at 02:25:42PM +0800, Hillf Danton wrote:
>> On Tue,  8 Dec 2020 15:35:00 +0000 Mel Gorman wrote:
>> > @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>> >  
>> >  	/* Check a recently used CPU as a potential idle candidate: */
>> >  	recent_used_cpu = p->recent_used_cpu;
>> > +	p->recent_used_cpu = prev;
>> >  	if (recent_used_cpu != prev &&
>> >  	    recent_used_cpu != target &&
>> >  	    cpus_share_cache(recent_used_cpu, target) &&
>> >  	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
>> >  	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
>> 
>> Typo? Fix it in a respin if so.
>> 
>
>What typo?

After your change it is prev that we check against p->cpus_ptr instead of
the recent CPU. I wonder what the point is of doing such a check when
returning the recent one.
Mel Gorman Dec. 11, 2020, 9:45 a.m. UTC | #6
On Fri, Dec 11, 2020 at 05:34:43PM +0800, Hillf Danton wrote:
> On Fri, 11 Dec 2020 09:02:28 +0000 Mel Gorman wrote:
> >On Fri, Dec 11, 2020 at 02:25:42PM +0800, Hillf Danton wrote:
> >> On Tue,  8 Dec 2020 15:35:00 +0000 Mel Gorman wrote:
> >> > @@ -6277,17 +6277,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >> >  
> >> >  	/* Check a recently used CPU as a potential idle candidate: */
> >> >  	recent_used_cpu = p->recent_used_cpu;
> >> > +	p->recent_used_cpu = prev;
> >> >  	if (recent_used_cpu != prev &&
> >> >  	    recent_used_cpu != target &&
> >> >  	    cpus_share_cache(recent_used_cpu, target) &&
> >> >  	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
> >> >  	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
> >> 
> >> Typo? Fix it in a respin if so.
> >> 
> >
> >What typo?
> 
> After your change it is prev that we check against p->cpus_ptr instead of
> the recent CPU. I wonder what the point is of doing such a check when
> returning the recent one.

Ah... yes, this is indeed wrong. It wouldn't affect Vincent's case
that showed a problem with a hackbench configuration (which I'm still
disappointed about as it's a trade-off depending on machine and workload)
but it allows a task to run on the wrong cpu if sched_setscheduler()
was called between wakeup events.
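
To make the problem concrete, the condition should presumably test
recent_used_cpu itself rather than p->recent_used_cpu, which at this point
already holds prev. A sketch of such a correction follows (illustrative
only, not necessarily the exact fix that was eventually applied):

 	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
-	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
+	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
 	    asym_fits_capacity(task_util, recent_used_cpu)) {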

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5c41875aec23..413d895bbbf8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6277,17 +6277,13 @@  static int select_idle_sibling(struct task_struct *p, int prev, int target)
 
 	/* Check a recently used CPU as a potential idle candidate: */
 	recent_used_cpu = p->recent_used_cpu;
+	p->recent_used_cpu = prev;
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
 	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr) &&
 	    asym_fits_capacity(task_util, recent_used_cpu)) {
-		/*
-		 * Replace recent_used_cpu with prev as it is a potential
-		 * candidate for the next wake:
-		 */
-		p->recent_used_cpu = prev;
 		return recent_used_cpu;
 	}
 
@@ -6768,9 +6764,6 @@  select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
 		/* Fast path */
 		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
-
-		if (want_affine)
-			current->recent_used_cpu = cpu;
 	}
 	rcu_read_unlock();