[RFC,v4,4/6] sched/cpufreq: Introduce sugov_cpu_ramp_boost

Message ID	20200122173538.1142069-5-douglas.raillard@arm.com (mailing list archive)
State	RFC, archived
Headers	show Return-Path: <SRS0=8fnT=3L=vger.kernel.org=linux-pm-owner@kernel.org> From: Douglas RAILLARD <douglas.raillard@arm.com> To: linux-kernel@vger.kernel.org, rjw@rjwysocki.net, viresh.kumar@linaro.org, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org Cc: douglas.raillard@arm.com, dietmar.eggemann@arm.com, qperret@google.com, linux-pm@vger.kernel.org Subject: [RFC PATCH v4 4/6] sched/cpufreq: Introduce sugov_cpu_ramp_boost Date: Wed, 22 Jan 2020 17:35:36 +0000 Message-Id: <20200122173538.1142069-5-douglas.raillard@arm.com> In-Reply-To: <20200122173538.1142069-1-douglas.raillard@arm.com> References: <20200122173538.1142069-1-douglas.raillard@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-pm-owner@vger.kernel.org Precedence: bulk
Series	sched/cpufreq: Make schedutil energy aware \| expand [RFC,v4,0/6] sched/cpufreq: Make schedutil energy aware [RFC,v4,1/6] PM: Introduce em_pd_get_higher_freq() [RFC,v4,2/6] sched/cpufreq: Attach perf domain to sugov policy [RFC,v4,3/6] sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq() [RFC,v4,4/6] sched/cpufreq: Introduce sugov_cpu_ramp_boost [RFC,v4,5/6] sched/cpufreq: Boost schedutil frequency ramp up [RFC,v4,6/6] sched/cpufreq: Add schedutil_em_tp tracepoint

Douglas RAILLARD Jan. 22, 2020, 5:35 p.m. UTC

Use the utilization signals dynamic to detect when the utilization of a
set of tasks starts increasing because of a change in tasks' behavior.
This allows detecting when spending extra power for faster frequency
ramp up response would be beneficial to the reactivity of the system.

This ramp boost is computed as the difference between util_avg and
util_est_enqueued. This number somehow represents a lower bound of how
much extra utilization this tasks is actually using, compared to our
best current stable knowledge of it (which is util_est_enqueued).

When the set of runnable tasks changes, the boost is disabled as the
impact of blocked utilization on util_avg will make the delta with
util_est_enqueued not very informative.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

Rafael J. Wysocki Jan. 23, 2020, 3:55 p.m. UTC | #1

On Wed, Jan 22, 2020 at 6:36 PM Douglas RAILLARD
<douglas.raillard@arm.com> wrote:
>
> Use the utilization signals dynamic to detect when the utilization of a
> set of tasks starts increasing because of a change in tasks' behavior.
> This allows detecting when spending extra power for faster frequency
> ramp up response would be beneficial to the reactivity of the system.
>
> This ramp boost is computed as the difference between util_avg and
> util_est_enqueued. This number somehow represents a lower bound of how
> much extra utilization this tasks is actually using, compared to our
> best current stable knowledge of it (which is util_est_enqueued).
>
> When the set of runnable tasks changes, the boost is disabled as the
> impact of blocked utilization on util_avg will make the delta with
> util_est_enqueued not very informative.
>
> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 43 ++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 608963da4916..25a410a1ff6a 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -61,6 +61,10 @@ struct sugov_cpu {
>         unsigned long           bw_dl;
>         unsigned long           max;
>
> +       unsigned long           ramp_boost;
> +       unsigned long           util_est_enqueued;
> +       unsigned long           util_avg;
> +
>         /* The field below is for single-CPU policies only: */
>  #ifdef CONFIG_NO_HZ_COMMON
>         unsigned long           saved_idle_calls;
> @@ -183,6 +187,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>         }
>  }
>
> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
> +{
> +       return READ_ONCE(sg_cpu->ramp_boost);
> +}

Where exactly is this function used?

> +
> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
> +{
> +       struct rq *rq = cpu_rq(sg_cpu->cpu);
> +       unsigned long util_est_enqueued;
> +       unsigned long util_avg;
> +       unsigned long boost = 0;
> +
> +       util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
> +       util_avg = READ_ONCE(rq->cfs.avg.util_avg);
> +
> +       /*
> +        * Boost when util_avg becomes higher than the previous stable
> +        * knowledge of the enqueued tasks' set util, which is CPU's
> +        * util_est_enqueued.
> +        *
> +        * We try to spot changes in the workload itself, so we want to
> +        * avoid the noise of tasks being enqueued/dequeued. To do that,
> +        * we only trigger boosting when the "amount of work" enqueued
> +        * is stable.
> +        */
> +       if (util_est_enqueued == sg_cpu->util_est_enqueued &&
> +           util_avg >= sg_cpu->util_avg &&
> +           util_avg > util_est_enqueued)
> +               boost = util_avg - util_est_enqueued;
> +
> +       sg_cpu->util_est_enqueued = util_est_enqueued;
> +       sg_cpu->util_avg = util_avg;
> +       WRITE_ONCE(sg_cpu->ramp_boost, boost);
> +       return boost;
> +}
> +
>  /**
>   * get_next_freq - Compute a new frequency for a given cpufreq policy.
>   * @sg_policy: schedutil policy object to compute the new frequency for.
> @@ -514,6 +554,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>         busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
>
>         util = sugov_get_util(sg_cpu);
> +       sugov_cpu_ramp_boost_update(sg_cpu);
>         max = sg_cpu->max;
>         util = sugov_iowait_apply(sg_cpu, time, util, max);
>         next_f = get_next_freq(sg_policy, util, max);
> @@ -554,6 +595,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>                 unsigned long j_util, j_max;
>
>                 j_util = sugov_get_util(j_sg_cpu);
> +               if (j_sg_cpu == sg_cpu)
> +                       sugov_cpu_ramp_boost_update(sg_cpu);
>                 j_max = j_sg_cpu->max;
>                 j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>
> --

Douglas RAILLARD Jan. 23, 2020, 5:21 p.m. UTC | #2

On 1/23/20 3:55 PM, Rafael J. Wysocki wrote:
> On Wed, Jan 22, 2020 at 6:36 PM Douglas RAILLARD
> <douglas.raillard@arm.com> wrote:
>>
>> Use the utilization signals dynamic to detect when the utilization of a
>> set of tasks starts increasing because of a change in tasks' behavior.
>> This allows detecting when spending extra power for faster frequency
>> ramp up response would be beneficial to the reactivity of the system.
>>
>> This ramp boost is computed as the difference between util_avg and
>> util_est_enqueued. This number somehow represents a lower bound of how
>> much extra utilization this tasks is actually using, compared to our
>> best current stable knowledge of it (which is util_est_enqueued).
>>
>> When the set of runnable tasks changes, the boost is disabled as the
>> impact of blocked utilization on util_avg will make the delta with
>> util_est_enqueued not very informative.
>>
>> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
>> ---
>>  kernel/sched/cpufreq_schedutil.c | 43 ++++++++++++++++++++++++++++++++
>>  1 file changed, 43 insertions(+)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 608963da4916..25a410a1ff6a 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -61,6 +61,10 @@ struct sugov_cpu {
>>         unsigned long           bw_dl;
>>         unsigned long           max;
>>
>> +       unsigned long           ramp_boost;
>> +       unsigned long           util_est_enqueued;
>> +       unsigned long           util_avg;
>> +
>>         /* The field below is for single-CPU policies only: */
>>  #ifdef CONFIG_NO_HZ_COMMON
>>         unsigned long           saved_idle_calls;
>> @@ -183,6 +187,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>>         }
>>  }
>>
>> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
>> +{
>> +       return READ_ONCE(sg_cpu->ramp_boost);
>> +}
> 
> Where exactly is this function used?

In the next commit where the boost value is actually used to do
something. The function is introduced here to keep the
WRITE_ONCE/READ_ONCE pair together.

> 
>> +
>> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
>> +{
>> +       struct rq *rq = cpu_rq(sg_cpu->cpu);
>> +       unsigned long util_est_enqueued;
>> +       unsigned long util_avg;
>> +       unsigned long boost = 0;
>> +
>> +       util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
>> +       util_avg = READ_ONCE(rq->cfs.avg.util_avg);
>> +
>> +       /*
>> +        * Boost when util_avg becomes higher than the previous stable
>> +        * knowledge of the enqueued tasks' set util, which is CPU's
>> +        * util_est_enqueued.
>> +        *
>> +        * We try to spot changes in the workload itself, so we want to
>> +        * avoid the noise of tasks being enqueued/dequeued. To do that,
>> +        * we only trigger boosting when the "amount of work" enqueued
>> +        * is stable.
>> +        */
>> +       if (util_est_enqueued == sg_cpu->util_est_enqueued &&
>> +           util_avg >= sg_cpu->util_avg &&
>> +           util_avg > util_est_enqueued)
>> +               boost = util_avg - util_est_enqueued;
>> +
>> +       sg_cpu->util_est_enqueued = util_est_enqueued;
>> +       sg_cpu->util_avg = util_avg;
>> +       WRITE_ONCE(sg_cpu->ramp_boost, boost);
>> +       return boost;
>> +}
>> +
>>  /**
>>   * get_next_freq - Compute a new frequency for a given cpufreq policy.
>>   * @sg_policy: schedutil policy object to compute the new frequency for.
>> @@ -514,6 +554,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>>         busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
>>
>>         util = sugov_get_util(sg_cpu);
>> +       sugov_cpu_ramp_boost_update(sg_cpu);
>>         max = sg_cpu->max;
>>         util = sugov_iowait_apply(sg_cpu, time, util, max);
>>         next_f = get_next_freq(sg_policy, util, max);
>> @@ -554,6 +595,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>>                 unsigned long j_util, j_max;
>>
>>                 j_util = sugov_get_util(j_sg_cpu);
>> +               if (j_sg_cpu == sg_cpu)
>> +                       sugov_cpu_ramp_boost_update(sg_cpu);
>>                 j_max = j_sg_cpu->max;
>>                 j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>>
>> --

Rafael J. Wysocki Jan. 23, 2020, 9:02 p.m. UTC | #3

On Thu, Jan 23, 2020 at 6:21 PM Douglas Raillard
<douglas.raillard@arm.com> wrote:
>
>
>
> On 1/23/20 3:55 PM, Rafael J. Wysocki wrote:
> > On Wed, Jan 22, 2020 at 6:36 PM Douglas RAILLARD
> > <douglas.raillard@arm.com> wrote:
> >>
> >> Use the utilization signals dynamic to detect when the utilization of a
> >> set of tasks starts increasing because of a change in tasks' behavior.
> >> This allows detecting when spending extra power for faster frequency
> >> ramp up response would be beneficial to the reactivity of the system.
> >>
> >> This ramp boost is computed as the difference between util_avg and
> >> util_est_enqueued. This number somehow represents a lower bound of how
> >> much extra utilization this tasks is actually using, compared to our
> >> best current stable knowledge of it (which is util_est_enqueued).
> >>
> >> When the set of runnable tasks changes, the boost is disabled as the
> >> impact of blocked utilization on util_avg will make the delta with
> >> util_est_enqueued not very informative.
> >>
> >> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
> >> ---
> >>  kernel/sched/cpufreq_schedutil.c | 43 ++++++++++++++++++++++++++++++++
> >>  1 file changed, 43 insertions(+)
> >>
> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> >> index 608963da4916..25a410a1ff6a 100644
> >> --- a/kernel/sched/cpufreq_schedutil.c
> >> +++ b/kernel/sched/cpufreq_schedutil.c
> >> @@ -61,6 +61,10 @@ struct sugov_cpu {
> >>         unsigned long           bw_dl;
> >>         unsigned long           max;
> >>
> >> +       unsigned long           ramp_boost;
> >> +       unsigned long           util_est_enqueued;
> >> +       unsigned long           util_avg;
> >> +
> >>         /* The field below is for single-CPU policies only: */
> >>  #ifdef CONFIG_NO_HZ_COMMON
> >>         unsigned long           saved_idle_calls;
> >> @@ -183,6 +187,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> >>         }
> >>  }
> >>
> >> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
> >> +{
> >> +       return READ_ONCE(sg_cpu->ramp_boost);
> >> +}
> >
> > Where exactly is this function used?
>
> In the next commit where the boost value is actually used to do
> something. The function is introduced here to keep the
> WRITE_ONCE/READ_ONCE pair together.

But ramp_boost itself is not really used in this patch too AFAICS.

Douglas RAILLARD Jan. 28, 2020, 3:38 p.m. UTC | #4

On 1/23/20 9:02 PM, Rafael J. Wysocki wrote:
> On Thu, Jan 23, 2020 at 6:21 PM Douglas Raillard
> <douglas.raillard@arm.com> wrote:
>>
>>
>>
>> On 1/23/20 3:55 PM, Rafael J. Wysocki wrote:
>>> On Wed, Jan 22, 2020 at 6:36 PM Douglas RAILLARD
>>> <douglas.raillard@arm.com> wrote:
>>>>
>>>> Use the utilization signals dynamic to detect when the utilization of a
>>>> set of tasks starts increasing because of a change in tasks' behavior.
>>>> This allows detecting when spending extra power for faster frequency
>>>> ramp up response would be beneficial to the reactivity of the system.
>>>>
>>>> This ramp boost is computed as the difference between util_avg and
>>>> util_est_enqueued. This number somehow represents a lower bound of how
>>>> much extra utilization this tasks is actually using, compared to our
>>>> best current stable knowledge of it (which is util_est_enqueued).
>>>>
>>>> When the set of runnable tasks changes, the boost is disabled as the
>>>> impact of blocked utilization on util_avg will make the delta with
>>>> util_est_enqueued not very informative.
>>>>
>>>> Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
>>>> ---
>>>>  kernel/sched/cpufreq_schedutil.c | 43 ++++++++++++++++++++++++++++++++
>>>>  1 file changed, 43 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>>>> index 608963da4916..25a410a1ff6a 100644
>>>> --- a/kernel/sched/cpufreq_schedutil.c
>>>> +++ b/kernel/sched/cpufreq_schedutil.c
>>>> @@ -61,6 +61,10 @@ struct sugov_cpu {
>>>>         unsigned long           bw_dl;
>>>>         unsigned long           max;
>>>>
>>>> +       unsigned long           ramp_boost;
>>>> +       unsigned long           util_est_enqueued;
>>>> +       unsigned long           util_avg;
>>>> +
>>>>         /* The field below is for single-CPU policies only: */
>>>>  #ifdef CONFIG_NO_HZ_COMMON
>>>>         unsigned long           saved_idle_calls;
>>>> @@ -183,6 +187,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>>>>         }
>>>>  }
>>>>
>>>> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
>>>> +{
>>>> +       return READ_ONCE(sg_cpu->ramp_boost);
>>>> +}
>>>
>>> Where exactly is this function used?
>>
>> In the next commit where the boost value is actually used to do
>> something. The function is introduced here to keep the
>> WRITE_ONCE/READ_ONCE pair together.
> 
> But ramp_boost itself is not really used in this patch too AFAICS.

I'll squash that patch with the next one where it's actually used then:
sched/cpufreq: Boost schedutil frequency ramp up

Thanks,
Douglas

Peter Zijlstra Feb. 10, 2020, 1:08 p.m. UTC | #5

On Wed, Jan 22, 2020 at 05:35:36PM +0000, Douglas RAILLARD wrote:

> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
> +{
> +	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long util_est_enqueued;
> +	unsigned long util_avg;
> +	unsigned long boost = 0;
> +

Should we NO-OP this function when !sched_feat(UTIL_EST) ?

> +	util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);

Otherwise you're reading garbage here, no?

> +	util_avg = READ_ONCE(rq->cfs.avg.util_avg);
> +
> +	/*
> +	 * Boost when util_avg becomes higher than the previous stable
> +	 * knowledge of the enqueued tasks' set util, which is CPU's
> +	 * util_est_enqueued.
> +	 *
> +	 * We try to spot changes in the workload itself, so we want to
> +	 * avoid the noise of tasks being enqueued/dequeued. To do that,
> +	 * we only trigger boosting when the "amount of work" enqueued
> +	 * is stable.
> +	 */
> +	if (util_est_enqueued == sg_cpu->util_est_enqueued &&
> +	    util_avg >= sg_cpu->util_avg &&
> +	    util_avg > util_est_enqueued)
> +		boost = util_avg - util_est_enqueued;
> +
> +	sg_cpu->util_est_enqueued = util_est_enqueued;
> +	sg_cpu->util_avg = util_avg;
> +	WRITE_ONCE(sg_cpu->ramp_boost, boost);
> +	return boost;
> +}

Douglas RAILLARD Feb. 13, 2020, 10:49 a.m. UTC | #6

On 2/10/20 1:08 PM, Peter Zijlstra wrote:
> On Wed, Jan 22, 2020 at 05:35:36PM +0000, Douglas RAILLARD wrote:
> 
>> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
>> +{
>> +	struct rq *rq = cpu_rq(sg_cpu->cpu);
>> +	unsigned long util_est_enqueued;
>> +	unsigned long util_avg;
>> +	unsigned long boost = 0;
>> +
> 
> Should we NO-OP this function when !sched_feat(UTIL_EST) ?
> 
>> +	util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
> 
> Otherwise you're reading garbage here, no?

Most likely indeed. The boosting should be disabled in that case.

> 
>> +	util_avg = READ_ONCE(rq->cfs.avg.util_avg);
>> +
>> +	/*
>> +	 * Boost when util_avg becomes higher than the previous stable
>> +	 * knowledge of the enqueued tasks' set util, which is CPU's
>> +	 * util_est_enqueued.
>> +	 *
>> +	 * We try to spot changes in the workload itself, so we want to
>> +	 * avoid the noise of tasks being enqueued/dequeued. To do that,
>> +	 * we only trigger boosting when the "amount of work" enqueued
>> +	 * is stable.
>> +	 */
>> +	if (util_est_enqueued == sg_cpu->util_est_enqueued &&
>> +	    util_avg >= sg_cpu->util_avg &&
>> +	    util_avg > util_est_enqueued)
>> +		boost = util_avg - util_est_enqueued;
>> +
>> +	sg_cpu->util_est_enqueued = util_est_enqueued;
>> +	sg_cpu->util_avg = util_avg;
>> +	WRITE_ONCE(sg_cpu->ramp_boost, boost);
>> +	return boost;
>> +}

[RFC,v4,4/6] sched/cpufreq: Introduce sugov_cpu_ramp_boost

Commit Message

Comments

Patch