diff mbox series

[v2,1/1] cpuidle: menu: Prefer polling state for short idle durations

Message ID 20250317060357.29451-1-aboorvad@linux.ibm.com (mailing list archive)
State New
Headers show
Series [v2,1/1] cpuidle: menu: Prefer polling state for short idle durations | expand

Commit Message

Aboorva Devarajan March 17, 2025, 6:03 a.m. UTC
Avoid selecting deep idle state when the predicted idle duration is
shorter than its target residency, as this leads to unnecessary state
transitions without energy savings.

On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
its target residency exceeds the predicted idle duration hurts
performance.

For example, if the predicted idle duration is 15 us and the first
non-polling state has a target residency of 120 us, selecting it
would be suboptimal.

Remove the condition introduced in commit 69d25870f20c
("cpuidle: fix the menu governor to boost IO performance") that
prioritized non-polling states even when their target residency
exceeded the predicted idle duration and allow polling states to
be selected when appropriate.

Performance improvement observed with pgbench on PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric                    | Baseline   | Patched    | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS)    | 494,834    | 538,707    | +8.85%     |
| Avg latency (ms)          | 0.162      | 0.149      | -8.02%     |
+---------------------------+------------+------------+------------+

CPUIdle state usage:
+--------------+--------------+-------------+
| Metric       | Baseline     | Patched     |
+--------------+--------------+-------------+
| Total usage  | 12,703,630   | 13,941,966  |
| Above usage  | 11,388,990   | 1,620,474   |
| Below usage  | 19,973       | 684,708     |
+--------------+--------------+-------------+

Above/Total and Below/Total usage percentages:
+------------------------+-----------+---------+
| Metric                 | Baseline  | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total)  | 89.67%    | 11.63%  |
| Below % (Below/Total)  | 0.16%     | 4.91%   |
| Total cpuidle miss (%) | 89.83%    | 16.54%  |
+------------------------+-----------+---------+

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>

---

v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/

v1 -> v2:

- Drop cover letter and improve commit message.
---
 drivers/cpuidle/governors/menu.c | 11 -----------
 1 file changed, 11 deletions(-)

Comments

Christian Loehle March 17, 2025, 8:35 a.m. UTC | #1
On 3/17/25 06:03, Aboorva Devarajan wrote:
> Avoid selecting deep idle state when the predicted idle duration is
> shorter than its target residency, as this leads to unnecessary state
> transitions without energy savings.
> 
> On virtualized PowerPC (pseries) systems, where only one polling state
> (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> its target residency exceeds the predicted idle duration hurts
> performance.
> 
> For example, if the predicted idle duration is 15 us and the first
> non-polling state has a target residency of 120 us, selecting it
> would be suboptimal.
> 
> Remove the condition introduced in commit 69d25870f20c
> ("cpuidle: fix the menu governor to boost IO performance") that
> prioritized non-polling states even when their target residency
> exceeded the predicted idle duration and allow polling states to
> be selected when appropriate.
> 
> Performance improvement observed with pgbench on PowerPC (pseries)
> system:
> +---------------------------+------------+------------+------------+
> | Metric                    | Baseline   | Patched    | Change (%) |
> +---------------------------+------------+------------+------------+
> | Transactions/sec (TPS)    | 494,834    | 538,707    | +8.85%     |
> | Avg latency (ms)          | 0.162      | 0.149      | -8.02%     |
> +---------------------------+------------+------------+------------+
> 
> CPUIdle state usage:
> +--------------+--------------+-------------+
> | Metric       | Baseline     | Patched     |
> +--------------+--------------+-------------+
> | Total usage  | 12,703,630   | 13,941,966  |
> | Above usage  | 11,388,990   | 1,620,474   |
> | Below usage  | 19,973       | 684,708     |
> +--------------+--------------+-------------+
> 
> Above/Total and Below/Total usage percentages:
> +------------------------+-----------+---------+
> | Metric                 | Baseline  | Patched |
> +------------------------+-----------+---------+
> | Above % (Above/Total)  | 89.67%    | 11.63%  |
> | Below % (Below/Total)  | 0.16%     | 4.91%   |
> | Total cpuidle miss (%) | 89.83%    | 16.54%  |
> +------------------------+-----------+---------+
> 
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> 
> ---
> 
> v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/
> 
> v1 -> v2:
> 
> - Drop cover letter and improve commit message.
> ---
>  drivers/cpuidle/governors/menu.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 28363bfa3e4c..4b199377e4be 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  			idx = i; /* first enabled state */
>  
>  		if (s->target_residency_ns > predicted_ns) {
> -			/*
> -			 * Use a physical idle state, not busy polling, unless
> -			 * a timer is going to trigger soon enough.
> -			 */
> -			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->exit_latency_ns <= latency_req &&
> -			    s->target_residency_ns <= data->next_timer_ns) {
> -				predicted_ns = s->target_residency_ns;
> -				idx = i;
> -				break;
> -			}
>  			if (predicted_ns < TICK_NSEC)
>  				break;
>  

I'm still fine with this and don't see a better way to solve the reported
issue generally, see the discussion on v1.
Rafael, do you have any objections?
We could make this conditional on there being a high latency difference between
polling and non-polling to keep x86 behavior.
Rafael J. Wysocki April 1, 2025, 4:58 p.m. UTC | #2
On Mon, Mar 17, 2025 at 9:35 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 3/17/25 06:03, Aboorva Devarajan wrote:
> > Avoid selecting deep idle state when the predicted idle duration is
> > shorter than its target residency, as this leads to unnecessary state
> > transitions without energy savings.
> >
> > On virtualized PowerPC (pseries) systems, where only one polling state
> > (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> > its target residency exceeds the predicted idle duration hurts
> > performance.
> >
> > For example, if the predicted idle duration is 15 us and the first
> > non-polling state has a target residency of 120 us, selecting it
> > would be suboptimal.
> >
> > Remove the condition introduced in commit 69d25870f20c
> > ("cpuidle: fix the menu governor to boost IO performance") that
> > prioritized non-polling states even when their target residency
> > exceeded the predicted idle duration and allow polling states to
> > be selected when appropriate.
> >
> > Performance improvement observed with pgbench on PowerPC (pseries)
> > system:
> > +---------------------------+------------+------------+------------+
> > | Metric                    | Baseline   | Patched    | Change (%) |
> > +---------------------------+------------+------------+------------+
> > | Transactions/sec (TPS)    | 494,834    | 538,707    | +8.85%     |
> > | Avg latency (ms)          | 0.162      | 0.149      | -8.02%     |
> > +---------------------------+------------+------------+------------+
> >
> > CPUIdle state usage:
> > +--------------+--------------+-------------+
> > | Metric       | Baseline     | Patched     |
> > +--------------+--------------+-------------+
> > | Total usage  | 12,703,630   | 13,941,966  |
> > | Above usage  | 11,388,990   | 1,620,474   |
> > | Below usage  | 19,973       | 684,708     |
> > +--------------+--------------+-------------+
> >
> > Above/Total and Below/Total usage percentages:
> > +------------------------+-----------+---------+
> > | Metric                 | Baseline  | Patched |
> > +------------------------+-----------+---------+
> > | Above % (Above/Total)  | 89.67%    | 11.63%  |
> > | Below % (Below/Total)  | 0.16%     | 4.91%   |
> > | Total cpuidle miss (%) | 89.83%    | 16.54%  |
> > +------------------------+-----------+---------+
> >
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> >
> > ---
> >
> > v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/
> >
> > v1 -> v2:
> >
> > - Drop cover letter and improve commit message.
> > ---
> >  drivers/cpuidle/governors/menu.c | 11 -----------
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> > index 28363bfa3e4c..4b199377e4be 100644
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >                       idx = i; /* first enabled state */
> >
> >               if (s->target_residency_ns > predicted_ns) {
> > -                     /*
> > -                      * Use a physical idle state, not busy polling, unless
> > -                      * a timer is going to trigger soon enough.
> > -                      */
> > -                     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> > -                         s->exit_latency_ns <= latency_req &&
> > -                         s->target_residency_ns <= data->next_timer_ns) {
> > -                             predicted_ns = s->target_residency_ns;
> > -                             idx = i;
> > -                             break;
> > -                     }
> >                       if (predicted_ns < TICK_NSEC)
> >                               break;
> >
>
> I'm still fine with this and don't see a better way to solve the reported
> issue generally, see the discussion on v1.
> Rafael, do you have any objections?

The behavior change on x86 would be rather unacceptable I'm afraid.

As already stated in a different thread today, on x86 the polling
state turns out to be overly shallow most of the time even before this
patch.

> We could make this conditional on there being a high latency difference between
> polling and non-polling to keep x86 behavior.

If I'm not mistaken, to address the issue at hand, it would be
sufficient to add an "s->target_residency_ns < RESIDENCY_THRESHOLD_NS"
condition to the if () statement removed by this patch, wouldn't it?
diff mbox series

Patch

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 28363bfa3e4c..4b199377e4be 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -296,17 +296,6 @@  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			idx = i; /* first enabled state */
 
 		if (s->target_residency_ns > predicted_ns) {
-			/*
-			 * Use a physical idle state, not busy polling, unless
-			 * a timer is going to trigger soon enough.
-			 */
-			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-			    s->exit_latency_ns <= latency_req &&
-			    s->target_residency_ns <= data->next_timer_ns) {
-				predicted_ns = s->target_residency_ns;
-				idx = i;
-				break;
-			}
 			if (predicted_ns < TICK_NSEC)
 				break;