[RFC,1/1] cpuidle: teo: Add optional util-awareness

Message ID 20220915164411.2496380-2-kajetan.puchalski@arm.com (mailing list archive)
State Superseded, archived
Series cpuidle: teo: Introduce optional util-awareness

Commit Message

Kajetan Puchalski Sept. 15, 2022, 4:44 p.m. UTC
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely impact
performance due to increased latency. Additionally, if the CPU wakes up
from a deeper idle state before reaching its target residency, as is often
the case, energy is wasted on top of that.

This patch extends the TEO governor with an optional util-awareness
mechanism. When the CPU's utilization is above a certain threshold, the
governor only selects the shallowest idle state; when it is not, the
governor tries to select the deepest possible state using TEO's metrics.
This is now possible because CPU utilization is exported from the scheduler
through the sched_cpu_util() function, which is already used e.g. by the
IPA thermal governor.
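
The switch described above can be sketched in userspace C roughly as follows. This is a simplified model rather than the patch's code: select_state(), pick_shallowest_enabled() and the metrics_choice parameter are hypothetical names, and returning the metrics choice when every state is disabled is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace sketch of the util-aware switch; none of this is kernel API.
 * Both helpers below are hypothetical and exist only for illustration.
 */

/* Return the index of the first (shallowest) state that is not disabled,
 * or -1 if every state is disabled. */
static int pick_shallowest_enabled(const bool *disabled, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!disabled[i])
			return (int)i;
	}
	return -1;
}

/*
 * If util is strictly above the threshold, take the shallowest enabled
 * state. Otherwise keep metrics_choice, which stands in for whatever the
 * regular TEO metrics would have selected.
 */
static int select_state(unsigned long util, unsigned long threshold,
			const bool *disabled, size_t count,
			int metrics_choice)
{
	assert(count > 0);

	if (util > threshold) {
		int idx = pick_shallowest_enabled(disabled, count);

		/* Assumption: keep the metrics choice if nothing is enabled. */
		return idx >= 0 ? idx : metrics_choice;
	}
	return metrics_choice;
}
```

For example, with state 0 disabled, a threshold of 16 and a util of 100, select_state() returns index 1; with a util of 10 it returns whatever metrics_choice was passed in.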

This can drastically decrease latency and improve performance in certain
types of latency-sensitive mobile workloads, such as Geekbench 5.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
---
 drivers/cpuidle/Kconfig         | 12 +++++
 drivers/cpuidle/governors/teo.c | 86 +++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

Comments

Doug Smythies Sept. 17, 2022, 10:58 p.m. UTC | #1
On Thu, Sep 15, 2022 at 9:45 AM Kajetan Puchalski
<kajetan.puchalski@arm.com> wrote:
>
> Modern interactive systems, such as recent Android phones, tend to have
> power efficient shallow idle states. Selecting deeper idle states on a
> device while a latency-sensitive workload is running can adversely impact
> performance due to increased latency. Additionally, if the CPU wakes up
> from a deeper sleep before its target residency as is often the case, it
> results in a waste of energy on top of that.
>
> This patch extends the TEO governor with an optional mechanism adding
> util-awareness, effectively providing a way for the governor to switch
> between only selecting the shallowest idle state when the cpu is being
> utilized over a certain threshold and trying to select the deepest possible
> state using TEO's metrics when the cpu is not being utilized. This is now
> possible since the CPU utilization is exported from the scheduler with the
> sched_cpu_util function and already used e.g. in the thermal governor IPA.
>
> This can provide drastically decreased latency and performance benefits in
> certain types of mobile workloads that are sensitive to latency,
> such as Geekbench 5.
>
> Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
> ---
>  drivers/cpuidle/Kconfig         | 12 +++++
>  drivers/cpuidle/governors/teo.c | 86 +++++++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)
>
> diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
> index ff71dd662880..6b66ee88a2b2 100644
> --- a/drivers/cpuidle/Kconfig
> +++ b/drivers/cpuidle/Kconfig
> @@ -33,6 +33,18 @@ config CPU_IDLE_GOV_TEO
>           Some workloads benefit from using it and it generally should be safe
>           to use.  Say Y here if you are not happy with the alternatives.
>
> +config CPU_IDLE_GOV_TEO_UTIL_AWARE
> +       bool "Util-awareness mechanism for TEO"
> +       depends on CPU_IDLE_GOV_TEO
> +       help
> +         Util-awareness mechanism for the TEO governor. With this enabled,
> +         the governor will choose the shallowest available state when the
> +         CPU's average util is above a certain threshold and default to
> +         using the metrics-based approach when it's not.
> +
> +         Some latency-sensitive workloads on interactive devices can benefit
> +         from using it.
> +
>  config CPU_IDLE_GOV_HALTPOLL
>         bool "Haltpoll governor (for virtualized systems)"
>         depends on KVM_GUEST
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index d9262db79cae..fd5b2eb750be 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -2,8 +2,13 @@
>  /*
>   * Timer events oriented CPU idle governor
>   *
> + * TEO governor:
>   * Copyright (C) 2018 - 2021 Intel Corporation
>   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> + *
> + * Util-awareness mechanism:
> + * Copyright (C) 2022 Arm Ltd.
> + * Author: Kajetan Puchalski <kajetan.puchalski@arm.com>
>   */
>
>  /**
> @@ -99,14 +104,48 @@
>   *      select the given idle state instead of the candidate one.
>   *
>   * 3. By default, select the candidate state.
> + *
> + * Util-awareness mechanism:
> + *
> + * The idea behind the util-awareness extension is that there are two distinct
> + * scenarios for the CPU which should result in two different approaches to idle
> + * state selection - utilized and not utilized.
> + *
> + * In this case, 'utilized' means that the average runqueue util of the CPU is
> + * above a certain threshold.
> + *
> + * When the CPU is utilized while going into idle, more likely than not it will
> + * be woken up to do more work soon and so the shallowest idle state should be
> + * selected to minimise latency and maximise performance. When the CPU is not
> + * being utilized, the usual metrics-based approach to selecting the deepest
> + * available idle state should be preferred to take advantage of the power
> + * saving.
> + *
> + * In order to achieve this, the governor uses a utilization threshold.
> + * The threshold is computed per-cpu as a percentage of the CPU's capacity
> + * by bit shifting the capacity value. Based on testing, the shift of 6 (~1.56%)
> + * seems to be getting the best results.
> + *
> + * Before selecting the next idle state, the governor compares the current CPU
> + * util to the precomputed util threhsold. If it's below, it defaults to the

s/threhsold/threshold/

> + * TEO metrics mechanism. If it's above, it simply selects the shallowest
> + * enabled idle state.
>   */
>
>  #include <linux/cpuidle.h>
>  #include <linux/jiffies.h>
>  #include <linux/kernel.h>
> +#include <linux/sched.h>

I think it also needs this line:

+#include <linux/sched/topology.h>

At least for me, it didn't compile without it.

>  #include <linux/sched/clock.h>
>  #include <linux/tick.h>
>
> +/*
> + * The number of bits to shift the cpu's capacity by in order to determine
> + * the utilized threshold
> + */
> +#define UTIL_THRESHOLD_SHIFT 6
> +
> +
>  /*
>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
>   * is used for decreasing metrics on a regular basis.
> @@ -140,6 +179,8 @@ struct teo_bin {
>   * @total: Grand total of the "intercepts" and "hits" mertics for all bins.

s/mertics/metrics/

>   * @next_recent_idx: Index of the next @recent_idx entry to update.
>   * @recent_idx: Indices of bins corresponding to recent "intercepts".
> + * @util_threshold: Threshold above which the CPU is considered utilized
> + * @utilized: Whether the last sleep on the CPU happened while utilized
>   */
>  struct teo_cpu {
>         s64 time_span_ns;
> @@ -148,10 +189,28 @@ struct teo_cpu {
>         unsigned int total;
>         int next_recent_idx;
>         int recent_idx[NR_RECENT];
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +       unsigned long util_threshold;
> +       bool utilized;
> +#endif
>  };
>
>  static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
>
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +/**
> + * teo_get_util - Update the CPU utilized status
> + * @dev: Target CPU
> + * @cpu_data: Governor CPU data for the target CPU
> + */
> +static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
> +{
> +       unsigned long util = sched_cpu_util(dev->cpu);
> +
> +       cpu_data->utilized = util > cpu_data->util_threshold;
> +}
> +#endif
> +
>  /**
>   * teo_update - Update CPU metrics after wakeup.
>   * @drv: cpuidle driver containing state data.
> @@ -301,7 +360,13 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>         int i;
>
>         if (dev->last_state_idx >= 0) {
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +               /* don't update metrics if the cpu was utilized during the last sleep */
> +               if (!cpu_data->utilized)
> +                       teo_update(drv, dev);
> +#else
>                 teo_update(drv, dev);
> +#endif
>                 dev->last_state_idx = -1;
>         }
>
> @@ -321,6 +386,21 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                         goto end;
>         }
>
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +       teo_get_util(dev, cpu_data);
> +       /* if the cpu is being utilized, choose the shallowest state and exit */
> +       if (cpu_data->utilized) {
> +               for (i = 0; i < drv->state_count; ++i) {
> +                       if (dev->states_usage[i].disable)
> +                               continue;
> +                       break;
> +               }
> +
> +               idx = i;
> +               goto end;
> +       }
> +#endif
> +
>         /*
>          * Find the deepest idle state whose target residency does not exceed
>          * the current sleep length and the deepest idle state not deeper than
> @@ -508,9 +588,15 @@ static int teo_enable_device(struct cpuidle_driver *drv,
>                              struct cpuidle_device *dev)
>  {
>         struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +       unsigned long max_capacity = arch_scale_cpu_capacity(dev->cpu);
> +#endif
>         int i;
>
>         memset(cpu_data, 0, sizeof(*cpu_data));
> +#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
> +       cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
> +#endif
>
>         for (i = 0; i < NR_RECENT; i++)
>                 cpu_data->recent_idx[i] = -1;
> --
> 2.37.1
>

Chen Yu Sept. 18, 2022, 7:17 a.m. UTC | #2
On Fri, Sep 16, 2022 at 12:49 AM Kajetan Puchalski
<kajetan.puchalski@arm.com> wrote:
>
> Modern interactive systems, such as recent Android phones, tend to have
> power efficient shallow idle states. Selecting deeper idle states on a
> device while a latency-sensitive workload is running can adversely impact
> performance due to increased latency. Additionally, if the CPU wakes up
> from a deeper sleep before its target residency as is often the case, it
> results in a waste of energy on top of that.
>
> This patch extends the TEO governor with an optional mechanism adding
> util-awareness, effectively providing a way for the governor to switch
> between only selecting the shallowest idle state when the cpu is being
> utilized over a certain threshold and trying to select the deepest possible
> state using TEO's metrics when the cpu is not being utilized.
Not sure if we can use util_avg the way schedutil does, but it looks interesting.
The last time I proposed an idea to leverage util_avg to optimize some code
in the kernel, it was suggested that it would be better to make the strategy
gradual rather than a 0/1 state. So I was thinking we could make it
something like:

next_idx = cpuidle_select();
next_idx = next_idx * (cpu_cap - util_avg) / cpu_cap;

The lower the util_avg is, the more we honor the choice of the governor,
and vice versa.
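
A minimal userspace sketch of the scaling suggested above, assuming integer arithmetic and capacity and utilization values on the same scale. scale_idle_index() is a hypothetical name, and the clamp for util_avg > cpu_cap is an addition of this sketch, not part of the proposal.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale a governor-selected idle state index by the
 * unused fraction of CPU capacity, using integer arithmetic as in the
 * two-line proposal above.
 */
static int scale_idle_index(int next_idx, unsigned long cpu_cap,
			    unsigned long util_avg)
{
	assert(next_idx >= 0 && cpu_cap > 0);

	/* Clamp so the subtraction cannot wrap if util exceeds capacity. */
	if (util_avg > cpu_cap)
		util_avg = cpu_cap;

	return (int)((unsigned long)next_idx * (cpu_cap - util_avg) / cpu_cap);
}
```

With a capacity of 1024, index 3 stays 3 at zero utilization, drops to 1 at a util_avg of 512, and reaches 0 at full utilization. Because integer division truncates, any nonzero utilization lowers a nonzero index by at least one step, so a real implementation would probably want rounding or a different mapping. The sketch also ignores disabled idle states.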
> This is now possible since the CPU utilization is exported from the scheduler with the
> sched_cpu_util function and already used e.g. in the thermal governor IPA.
>
> This can provide drastically decreased latency and performance benefits in
> certain types of mobile workloads that are sensitive to latency,
> such as Geekbench 5.
As Doug mentioned in another thread, data on the impact on energy consumption
would also be interesting.

thanks,
Chenyu

Kajetan Puchalski Sept. 20, 2022, 9:38 a.m. UTC | #3
> Not sure if we can use util_avg the way schedutil does, but it looks interesting.
> The last time I proposed an idea to leverage util_avg to optimize some code
> in the kernel, it was suggested that it would be better to make the strategy
> gradual rather than a 0/1 state. So I was thinking we could make it
> something like:
> 
> next_idx = cpuidle_select();
> next_idx = next_idx * (cpu_cap - util_avg) / cpu_cap;
> 
> The lower the util_avg is, the more we honor the choice of the governor,
> and vice versa.

Would that be in order to still make use of intermediate idle states (i.e.
the ones between first and last) or to change how the util threshold
works? It seems similar to the issue Doug pointed out.

I think there are two scenarios here: the idle landscape on Arm just looks
really different from the one on x86/Intel and we should probably
account for that. In our use case "gradual" and 0-1 are the same thing,
it's just all about how you set the threshold. On x86, on the other hand,
you have both the threshold and the approach to state selection to worry about.

This just further makes me think that splitting this out into a
separate governor is preferable, as this can work really nicely on
certain systems like ours and really badly on others like Doug's. We
probably shouldn't be bundling this with generic solutions like TEO that
work well across the board.

It might also make sense to have slightly different implementations for
x86 and Arm to account for the hardware differences, but that'd also be
up to Rafael to express a view on.

> > This is now possible since the CPU utilization is exported from the scheduler with the
> > sched_cpu_util function and already used e.g. in the thermal governor IPA.
> >
> > This can provide drastically decreased latency and performance benefits in
> > certain types of mobile workloads that are sensitive to latency,
> > such as Geekbench 5.
> As Doug mentioned in another thread, data on the impact on energy consumption
> would also be interesting.

I included energy consumption plots in the PDF linked in the cover
letter. Here's the link:

https://github.com/mrkajetanp/lisa-notebooks/blob/a2361a5b647629bfbfc676b942c8e6498fb9bd03/idle_util_aware.pdf

The plots show the geometric mean of the power measurements in mW, so they
reflect average power usage over the course of the benchmark. They also
include a 'shallow' column, which shows power consumption with only C0 and
visualises why this works on Arm and how different that is from the x86
behaviour described by Doug.

> thanks,
> Chenyu
Patch

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index ff71dd662880..6b66ee88a2b2 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -33,6 +33,18 @@  config CPU_IDLE_GOV_TEO
 	  Some workloads benefit from using it and it generally should be safe
 	  to use.  Say Y here if you are not happy with the alternatives.
 
+config CPU_IDLE_GOV_TEO_UTIL_AWARE
+	bool "Util-awareness mechanism for TEO"
+	depends on CPU_IDLE_GOV_TEO
+	help
+	  Util-awareness mechanism for the TEO governor. With this enabled,
+	  the governor will choose the shallowest available state when the
+	  CPU's average util is above a certain threshold and default to
+	  using the metrics-based approach when it's not.
+
+	  Some latency-sensitive workloads on interactive devices can benefit
+	  from using it.
+
 config CPU_IDLE_GOV_HALTPOLL
 	bool "Haltpoll governor (for virtualized systems)"
 	depends on KVM_GUEST
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index d9262db79cae..fd5b2eb750be 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -2,8 +2,13 @@ 
 /*
  * Timer events oriented CPU idle governor
  *
+ * TEO governor:
  * Copyright (C) 2018 - 2021 Intel Corporation
  * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+ *
+ * Util-awareness mechanism:
+ * Copyright (C) 2022 Arm Ltd.
+ * Author: Kajetan Puchalski <kajetan.puchalski@arm.com>
  */
 
 /**
@@ -99,14 +104,48 @@ 
  *      select the given idle state instead of the candidate one.
  *
  * 3. By default, select the candidate state.
+ *
+ * Util-awareness mechanism:
+ *
+ * The idea behind the util-awareness extension is that there are two distinct
+ * scenarios for the CPU which should result in two different approaches to idle
+ * state selection - utilized and not utilized.
+ *
+ * In this case, 'utilized' means that the average runqueue util of the CPU is
+ * above a certain threshold.
+ *
+ * When the CPU is utilized while going into idle, more likely than not it will
+ * be woken up to do more work soon and so the shallowest idle state should be
+ * selected to minimise latency and maximise performance. When the CPU is not
+ * being utilized, the usual metrics-based approach to selecting the deepest
+ * available idle state should be preferred to take advantage of the power
+ * saving.
+ *
+ * In order to achieve this, the governor uses a utilization threshold.
+ * The threshold is computed per-cpu as a percentage of the CPU's capacity
+ * by bit shifting the capacity value. Based on testing, the shift of 6 (~1.56%)
+ * seems to be getting the best results.
+ *
+ * Before selecting the next idle state, the governor compares the current CPU
+ * util to the precomputed util threhsold. If it's below, it defaults to the
+ * TEO metrics mechanism. If it's above, it simply selects the shallowest
+ * enabled idle state.
  */
 
 #include <linux/cpuidle.h>
 #include <linux/jiffies.h>
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/tick.h>
 
+/*
+ * The number of bits to shift the cpu's capacity by in order to determine
+ * the utilized threshold
+ */
+#define UTIL_THRESHOLD_SHIFT 6
+
+
 /*
  * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
  * is used for decreasing metrics on a regular basis.
@@ -140,6 +179,8 @@  struct teo_bin {
  * @total: Grand total of the "intercepts" and "hits" mertics for all bins.
  * @next_recent_idx: Index of the next @recent_idx entry to update.
  * @recent_idx: Indices of bins corresponding to recent "intercepts".
+ * @util_threshold: Threshold above which the CPU is considered utilized
+ * @utilized: Whether the last sleep on the CPU happened while utilized
  */
 struct teo_cpu {
 	s64 time_span_ns;
@@ -148,10 +189,28 @@  struct teo_cpu {
 	unsigned int total;
 	int next_recent_idx;
 	int recent_idx[NR_RECENT];
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+	unsigned long util_threshold;
+	bool utilized;
+#endif
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
 
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+/**
+ * teo_get_util - Update the CPU utilized status
+ * @dev: Target CPU
+ * @cpu_data: Governor CPU data for the target CPU
+ */
+static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
+{
+	unsigned long util = sched_cpu_util(dev->cpu);
+
+	cpu_data->utilized = util > cpu_data->util_threshold;
+}
+#endif
+
 /**
  * teo_update - Update CPU metrics after wakeup.
  * @drv: cpuidle driver containing state data.
@@ -301,7 +360,13 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	int i;
 
 	if (dev->last_state_idx >= 0) {
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+		/* don't update metrics if the cpu was utilized during the last sleep */
+		if (!cpu_data->utilized)
+			teo_update(drv, dev);
+#else
 		teo_update(drv, dev);
+#endif
 		dev->last_state_idx = -1;
 	}
 
@@ -321,6 +386,21 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			goto end;
 	}
 
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+	teo_get_util(dev, cpu_data);
+	/* if the cpu is being utilized, choose the shallowest state and exit */
+	if (cpu_data->utilized) {
+		for (i = 0; i < drv->state_count; ++i) {
+			if (dev->states_usage[i].disable)
+				continue;
+			break;
+		}
+
+		idx = i;
+		goto end;
+	}
+#endif
+
 	/*
 	 * Find the deepest idle state whose target residency does not exceed
 	 * the current sleep length and the deepest idle state not deeper than
@@ -508,9 +588,15 @@  static int teo_enable_device(struct cpuidle_driver *drv,
 			     struct cpuidle_device *dev)
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+	unsigned long max_capacity = arch_scale_cpu_capacity(dev->cpu);
+#endif
 	int i;
 
 	memset(cpu_data, 0, sizeof(*cpu_data));
+#ifdef CONFIG_CPU_IDLE_GOV_TEO_UTIL_AWARE
+	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
+#endif
 
 	for (i = 0; i < NR_RECENT; i++)
 		cpu_data->recent_idx[i] = -1;
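
To make the threshold arithmetic from the patch concrete: shifting right by 6 divides by 64, i.e. 1/64 = 1.5625% of the capacity. The userspace sketch below mirrors that computation and the strictly-greater comparison in teo_get_util(). util_threshold() and is_utilized() are hypothetical stand-ins for the kernel code, and the capacity values in the note that follows are illustrative only, not measurements.

```c
#include <assert.h>

#define UTIL_THRESHOLD_SHIFT 6

/* Userspace stand-in for the threshold computed in teo_enable_device(). */
static unsigned long util_threshold(unsigned long max_capacity)
{
	return max_capacity >> UTIL_THRESHOLD_SHIFT;
}

/* Mirrors the comparison in teo_get_util(): strictly above the threshold. */
static int is_utilized(unsigned long util, unsigned long max_capacity)
{
	/* A zero capacity would make every nonzero util count as utilized. */
	assert(max_capacity > 0);

	return util > util_threshold(max_capacity);
}
```

With a capacity of 1024 the threshold is 16, so a util of 16 still counts as not utilized and 17 counts as utilized. For a hypothetical smaller core with a capacity of 160 the threshold truncates to 2, and for any capacity below 64 it becomes 0.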