sched/fair: Prefer small idle cores for forkees

Message ID 20220112143902.13239-1-quic_ctheegal@quicinc.com (mailing list archive)
State Superseded
Series: sched/fair: Prefer small idle cores for forkees

Commit Message

Chitti Babu Theegala Jan. 12, 2022, 2:39 p.m. UTC
Newly forked threads don't have any useful utilization data yet and
it's not possible to forecast their impact on energy consumption.
These forkees (though very small most of the time) end up waking big
cores from deep sleep for those very short durations.

Bias all forkees to small cores to prevent waking big cores from deep
sleep to save power.

Signed-off-by: Chitti Babu Theegala <quic_ctheegal@quicinc.com>
---
 kernel/sched/fair.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
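
For reference, the fork-time placement path that reaches the code changed by
this patch is roughly the following (simplified; exact signatures vary between
kernel versions):

  wake_up_new_task()
    -> select_task_rq(p, task_cpu(p), WF_FORK)
      -> select_task_rq_fair()                  /* sd_flag becomes SD_BALANCE_FORK */
        -> find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag)
          -> find_idlest_group(sd, p, cpu)      /* gains an sd_flag argument below */
            -> update_pick_idlest(...)          /* the new bias is applied here */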

Comments

Vincent Donnefort Jan. 13, 2022, 4:35 p.m. UTC | #1
On Wed, Jan 12, 2022 at 08:09:02PM +0530, Chitti Babu Theegala wrote:
> Newly forked threads don't have any useful utilization data yet and
> it's not possible to forecast their impact on energy consumption.
> These forkees (though very small most of the time) end up waking big
> cores from deep sleep for those very short durations.
> 
> Bias all forkees to small cores to prevent waking big cores from deep
> sleep to save power.

This bias might be interesting for some workloads, but what about the
others? (see find_energy_efficient_cpu() comment, which discusses forkees).

> 
> Signed-off-by: Chitti Babu Theegala <quic_ctheegal@quicinc.com>
> ---
>  kernel/sched/fair.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6..d407bbc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5976,7 +5976,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  }
>  
>  static struct sched_group *
> -find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag);
>  
>  /*
>   * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
> @@ -6063,7 +6063,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>  			continue;
>  		}
>  
> -		group = find_idlest_group(sd, p, cpu);
> +		group = find_idlest_group(sd, p, cpu, sd_flag);
>  		if (!group) {
>  			sd = sd->child;
>  			continue;
> @@ -8997,7 +8997,8 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
>  static bool update_pick_idlest(struct sched_group *idlest,
>  			       struct sg_lb_stats *idlest_sgs,
>  			       struct sched_group *group,
> -			       struct sg_lb_stats *sgs)
> +			       struct sg_lb_stats *sgs,
> +			       int sd_flag)
>  {
>  	if (sgs->group_type < idlest_sgs->group_type)
>  		return true;
> @@ -9034,6 +9035,11 @@ static bool update_pick_idlest(struct sched_group *idlest,
>  		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
>  			return false;
>  
> +		/* Select smaller cpu group for newly woken up forkees */
> +		if ((sd_flag & SD_BALANCE_FORK) && (idlest_sgs->idle_cpus &&
> +			!capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity)))
> +			return false;
> +

Energy biased placement should probably be applied only when EAS is enabled.

It's especially true here: if all CPUs have the same capacity, capacity_greater
would always be false. So unless I missed something, we wouldn't let the group_util
evaluation happen, would we?

[...]
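
A minimal sketch of what gating the new check on EAS could look like, using the
existing sched_energy_enabled() helper (illustrative only, not the v2 that was
posted later):

		/* Select smaller cpu group for newly woken up forkees, but
		 * only on asymmetric-capacity (EAS) systems; on symmetric
		 * systems fall through to the group_util comparison below.
		 */
		if (sched_energy_enabled() && (sd_flag & SD_BALANCE_FORK) &&
		    idlest_sgs->idle_cpus &&
		    !capacity_greater(idlest->sgc->max_capacity,
				      group->sgc->max_capacity))
			return false;
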
Chitti Babu Theegala Jan. 20, 2022, 4:45 p.m. UTC | #2
On 1/13/2022 10:05 PM, Vincent Donnefort wrote:
> On Wed, Jan 12, 2022 at 08:09:02PM +0530, Chitti Babu Theegala wrote:
>> Newly forked threads don't have any useful utilization data yet and
>> it's not possible to forecast their impact on energy consumption.
>> These forkees (though very small most of the time) end up waking big
>> cores from deep sleep for those very short durations.
>>
>> Bias all forkees to small cores to prevent waking big cores from deep
>> sleep to save power.
> 
> This bias might be interesting for some workloads, but what about the
> others? (see find_energy_efficient_cpu() comment, which discusses forkees).
> 

Yes, I agree with the find_energy_efficient_cpu() comment that we don't
have any useful utilization data yet and hence it's not possible to
forecast. However, I don't see any point in penalizing power by waking up
bigger cores which are in a deep sleep state for very small workloads.

This patch helps lighter workloads during idle conditions from a power
point of view. For active (interactive or heavier) workloads, on most
big.LITTLE systems these foreground tasks get pulled into gold-affined
cpusets where this patch would not play spoilsport. Even on systems where
such cpusets are not defined, heavy workloads might need just another 1 or
2 scheduling windows to ramp up to a better frequency or core.

>>
>> Signed-off-by: Chitti Babu Theegala <quic_ctheegal@quicinc.com>
>> ---
>>   kernel/sched/fair.c | 16 +++++++++++-----
>>   1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6e476f6..d407bbc 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5976,7 +5976,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>   }
>>   
>>   static struct sched_group *
>> -find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
>> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag);
>>   
>>   /*
>>    * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
>> @@ -6063,7 +6063,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>>   			continue;
>>   		}
>>   
>> -		group = find_idlest_group(sd, p, cpu);
>> +		group = find_idlest_group(sd, p, cpu, sd_flag);
>>   		if (!group) {
>>   			sd = sd->child;
>>   			continue;
>> @@ -8997,7 +8997,8 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
>>   static bool update_pick_idlest(struct sched_group *idlest,
>>   			       struct sg_lb_stats *idlest_sgs,
>>   			       struct sched_group *group,
>> -			       struct sg_lb_stats *sgs)
>> +			       struct sg_lb_stats *sgs,
>> +			       int sd_flag)
>>   {
>>   	if (sgs->group_type < idlest_sgs->group_type)
>>   		return true;
>> @@ -9034,6 +9035,11 @@ static bool update_pick_idlest(struct sched_group *idlest,
>>   		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
>>   			return false;
>>   
>> +		/* Select smaller cpu group for newly woken up forkees */
>> +		if ((sd_flag & SD_BALANCE_FORK) && (idlest_sgs->idle_cpus &&
>> +			!capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity)))
>> +			return false;
>> +
> 
> Energy biased placement should probably be applied only when EAS is enabled.
> 
> It's especially true here: if all CPUs have the same capacity, capacity_greater
> would always be false. So unless I missed something, we wouldn't let the group_util
> evaluation happen, would we?

True. I am uploading a new version of the patch with an EAS enablement check in place.

> 
> [...]
Vincent Donnefort Jan. 21, 2022, 10:17 a.m. UTC | #3
On Thu, Jan 20, 2022 at 10:15:07PM +0530, Chitti Babu Theegala wrote:
> 
> 
> On 1/13/2022 10:05 PM, Vincent Donnefort wrote:
> > On Wed, Jan 12, 2022 at 08:09:02PM +0530, Chitti Babu Theegala wrote:
> > > Newly forked threads don't have any useful utilization data yet and
> > > it's not possible to forecast their impact on energy consumption.
> > > These forkees (though very small most of the time) end up waking big
> > > cores from deep sleep for those very short durations.
> > > 
> > > Bias all forkees to small cores to prevent waking big cores from deep
> > > sleep to save power.
> > 
> > This bias might be interesting for some workloads, but what about the
> > others? (see find_energy_efficient_cpu() comment, which discusses forkees).
> > 
> 
> Yes, I agree with the find_energy_efficient_cpu() comment that we don't have
> any useful utilization data yet and hence it's not possible to forecast.
> However, I don't see any point in penalizing power by waking up bigger cores
> which are in a deep sleep state for very small workloads.
> 
> This patch helps lighter workloads during idle conditions from a power point
> of view. For active (interactive or heavier) workloads, on most big.LITTLE
> systems these foreground tasks get pulled into gold-affined cpusets where
> this patch would not play spoilsport. Even on systems where such cpusets are
> not defined, heavy workloads might need just another 1 or 2 scheduling
> windows to ramp up to a better frequency or core.

Scheduling windows? I suppose you do not refer to PELT here, so I'm not sure
this argument applies.

Besides, CFS always biases toward performance (except feec(), which does so to
a lesser extent).

> 
> > > 
> > > Signed-off-by: Chitti Babu Theegala <quic_ctheegal@quicinc.com>
> > > ---
> > >   kernel/sched/fair.c | 16 +++++++++++-----
> > >   1 file changed, 11 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6e476f6..d407bbc 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5976,7 +5976,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > >   }
> > >   static struct sched_group *
> > > -find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
> > > +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag);
> > >   /*
> > >    * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
> > > @@ -6063,7 +6063,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
> > >   			continue;
> > >   		}
> > > -		group = find_idlest_group(sd, p, cpu);
> > > +		group = find_idlest_group(sd, p, cpu, sd_flag);
> > >   		if (!group) {
> > >   			sd = sd->child;
> > >   			continue;
> > > @@ -8997,7 +8997,8 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
> > >   static bool update_pick_idlest(struct sched_group *idlest,
> > >   			       struct sg_lb_stats *idlest_sgs,
> > >   			       struct sched_group *group,
> > > -			       struct sg_lb_stats *sgs)
> > > +			       struct sg_lb_stats *sgs,
> > > +			       int sd_flag)
> > >   {
> > >   	if (sgs->group_type < idlest_sgs->group_type)
> > >   		return true;
> > > @@ -9034,6 +9035,11 @@ static bool update_pick_idlest(struct sched_group *idlest,
> > >   		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
> > >   			return false;
> > > +		/* Select smaller cpu group for newly woken up forkees */
> > > +		if ((sd_flag & SD_BALANCE_FORK) && (idlest_sgs->idle_cpus &&
> > > +			!capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity)))
> > > +			return false;
> > > +
> > 
> > Energy biased placement should probably be applied only when EAS is enabled.
> > 
> > It's especially true here: if all CPUs have the same capacity, capacity_greater
> > would always be false. So unless I missed something, we wouldn't let the group_util
> > evaluation happen, would we?
> 
> True. I am uploading a new version of the patch with an EAS enablement check in place.
> 
> > 
> > [...]
Chitti Babu Theegala Jan. 25, 2022, 7:13 a.m. UTC | #4
On 1/21/2022 3:47 PM, Vincent Donnefort wrote:
> On Thu, Jan 20, 2022 at 10:15:07PM +0530, Chitti Babu Theegala wrote:
>>
>>
>> On 1/13/2022 10:05 PM, Vincent Donnefort wrote:
>>> On Wed, Jan 12, 2022 at 08:09:02PM +0530, Chitti Babu Theegala wrote:
>>>> Newly forked threads don't have any useful utilization data yet and
>>>> it's not possible to forecast their impact on energy consumption.
>>>> These forkees (though very small most of the time) end up waking big
>>>> cores from deep sleep for those very short durations.
>>>>
>>>> Bias all forkees to small cores to prevent waking big cores from deep
>>>> sleep to save power.
>>>
>>> This bias might be interesting for some workloads, but what about the
>>> others? (see find_energy_efficient_cpu() comment, which discusses forkees).
>>>
>>
>> Yes, I agree with the find_energy_efficient_cpu() comment that we don't have
>> any useful utilization data yet and hence it's not possible to forecast.
>> However, I don't see any point in penalizing power by waking up bigger cores
>> which are in a deep sleep state for very small workloads.
>>
>> This patch helps lighter workloads during idle conditions from a power point
>> of view. For active (interactive or heavier) workloads, on most big.LITTLE
>> systems these foreground tasks get pulled into gold-affined cpusets where
>> this patch would not play spoilsport. Even on systems where such cpusets are
>> not defined, heavy workloads might need just another 1 or 2 scheduling
>> windows to ramp up to a better frequency or core.
> 
> Scheduling windows? I suppose you do not refer to PELT here, so I'm not sure
> this argument applies.

Sorry, I didn't mean WALT. I meant that the ramp-up would happen within the
next couple of ms, so such heavy workloads would see only a very small
penalty for the initial ms.

> 
> Besides, CFS always biases toward performance (except feec(), which does so to
> a lesser extent).
> 

Yes, I'm aware that CFS is biased toward performance. Can we at least have
a knob which can turn on such power-friendly features?
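
One way such a knob could look is a sched_feat() bit (purely illustrative; the
feature name below is made up and is not part of this patch or of mainline):

  /* kernel/sched/features.h: hypothetical opt-in bit, default off */
  SCHED_FEAT(FORK_PREFER_SMALL_CORE, false)

  /* kernel/sched/fair.c, update_pick_idlest(): the new condition would then
   * additionally test the knob, leaving the default behaviour unchanged:
   */
  if (sched_feat(FORK_PREFER_SMALL_CORE) && sched_energy_enabled() &&
      (sd_flag & SD_BALANCE_FORK) && idlest_sgs->idle_cpus &&
      !capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity))
	  return false;

The bit could then be toggled at runtime through the scheduler features debugfs
interface (/sys/kernel/debug/sched/features on kernels of this vintage).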

>>
>>>>
>>>> Signed-off-by: Chitti Babu Theegala <quic_ctheegal@quicinc.com>
>>>> ---
>>>>    kernel/sched/fair.c | 16 +++++++++++-----
>>>>    1 file changed, 11 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 6e476f6..d407bbc 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -5976,7 +5976,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>>>    }
>>>>    static struct sched_group *
>>>> -find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
>>>> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag);
>>>>    /*
>>>>     * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
>>>> @@ -6063,7 +6063,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>>>>    			continue;
>>>>    		}
>>>> -		group = find_idlest_group(sd, p, cpu);
>>>> +		group = find_idlest_group(sd, p, cpu, sd_flag);
>>>>    		if (!group) {
>>>>    			sd = sd->child;
>>>>    			continue;
>>>> @@ -8997,7 +8997,8 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
>>>>    static bool update_pick_idlest(struct sched_group *idlest,
>>>>    			       struct sg_lb_stats *idlest_sgs,
>>>>    			       struct sched_group *group,
>>>> -			       struct sg_lb_stats *sgs)
>>>> +			       struct sg_lb_stats *sgs,
>>>> +			       int sd_flag)
>>>>    {
>>>>    	if (sgs->group_type < idlest_sgs->group_type)
>>>>    		return true;
>>>> @@ -9034,6 +9035,11 @@ static bool update_pick_idlest(struct sched_group *idlest,
>>>>    		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
>>>>    			return false;
>>>> +		/* Select smaller cpu group for newly woken up forkees */
>>>> +		if ((sd_flag & SD_BALANCE_FORK) && (idlest_sgs->idle_cpus &&
>>>> +			!capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity)))
>>>> +			return false;
>>>> +
>>>
>>> Energy biased placement should probably be applied only when EAS is enabled.
>>>
>>> It's especially true here: if all CPUs have the same capacity, capacity_greater
>>> would always be false. So unless I missed something, we wouldn't let the group_util
>>> evaluation happen, would we?
>>
>> True. I am uploading a new version of the patch with an EAS enablement check in place.
>>
>>>
>>> [...]

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6..d407bbc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5976,7 +5976,7 @@  static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 }
 
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag);
 
 /*
  * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
@@ -6063,7 +6063,7 @@  static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 			continue;
 		}
 
-		group = find_idlest_group(sd, p, cpu);
+		group = find_idlest_group(sd, p, cpu, sd_flag);
 		if (!group) {
 			sd = sd->child;
 			continue;
@@ -8997,7 +8997,8 @@  static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 static bool update_pick_idlest(struct sched_group *idlest,
 			       struct sg_lb_stats *idlest_sgs,
 			       struct sched_group *group,
-			       struct sg_lb_stats *sgs)
+			       struct sg_lb_stats *sgs,
+			       int sd_flag)
 {
 	if (sgs->group_type < idlest_sgs->group_type)
 		return true;
@@ -9034,6 +9035,11 @@  static bool update_pick_idlest(struct sched_group *idlest,
 		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
 			return false;
 
+		/* Select smaller cpu group for newly woken up forkees */
+		if ((sd_flag & SD_BALANCE_FORK) && (idlest_sgs->idle_cpus &&
+			!capacity_greater(idlest->sgc->max_capacity, group->sgc->max_capacity)))
+			return false;
+
 		/* Select group with lowest group_util */
 		if (idlest_sgs->idle_cpus == sgs->idle_cpus &&
 			idlest_sgs->group_util <= sgs->group_util)
@@ -9062,7 +9068,7 @@  static inline bool allow_numa_imbalance(int dst_running, int dst_weight)
  * Assumes p is allowed on at least one CPU in sd.
  */
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu, int sd_flag)
 {
 	struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
 	struct sg_lb_stats local_sgs, tmp_sgs;
@@ -9097,7 +9103,7 @@  find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 
 		update_sg_wakeup_stats(sd, group, sgs, p);
 
-		if (!local_group && update_pick_idlest(idlest, &idlest_sgs, group, sgs)) {
+		if (!local_group && update_pick_idlest(idlest, &idlest_sgs, group, sgs, sd_flag)) {
 			idlest = group;
 			idlest_sgs = *sgs;
 		}
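
For context on the capacity_greater() check used above: it is an existing
helper in kernel/sched/fair.c (not part of this patch), which compares two
capacities with roughly a 5% margin. In kernels of this era it is defined
approximately as follows, so on a system where all CPUs have the same
capacity it can never be true, which is the point raised in the review:

  /*
   * The margin used when comparing CPU capacities.
   * is 'cap1' noticeably greater than 'cap2'
   *
   * (default: ~5%)
   */
  #define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)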