Message ID | 20250317060357.29451-1-aboorvad@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2,1/1] cpuidle: menu: Prefer polling state for short idle durations | expand |
On 3/17/25 06:03, Aboorva Devarajan wrote: > Avoid selecting deep idle state when the predicted idle duration is > shorter than its target residency, as this leads to unnecessary state > transitions without energy savings. > > On virtualized PowerPC (pseries) systems, where only one polling state > (Snooze) and one deep state (CEDE) are available, selecting CEDE when > its target residency exceeds the predicted idle duration hurts > performance. > > For example, if the predicted idle duration is 15 us and the first > non-polling state has a target residency of 120 us, selecting it > would be suboptimal. > > Remove the condition introduced in commit 69d25870f20c > ("cpuidle: fix the menu governor to boost IO performance") that > prioritized non-polling states even when their target residency > exceeded the predicted idle duration and allow polling states to > be selected when appropriate. > > Performance improvement observed with pgbench on PowerPC (pseries) > system: > +---------------------------+------------+------------+------------+ > | Metric | Baseline | Patched | Change (%) | > +---------------------------+------------+------------+------------+ > | Transactions/sec (TPS) | 494,834 | 538,707 | +8.85% | > | Avg latency (ms) | 0.162 | 0.149 | -8.02% | > +---------------------------+------------+------------+------------+ > > CPUIdle state usage: > +--------------+--------------+-------------+ > | Metric | Baseline | Patched | > +--------------+--------------+-------------+ > | Total usage | 12,703,630 | 13,941,966 | > | Above usage | 11,388,990 | 1,620,474 | > | Below usage | 19,973 | 684,708 | > +--------------+--------------+-------------+ > > Above/Total and Below/Total usage percentages: > +------------------------+-----------+---------+ > | Metric | Baseline | Patched | > +------------------------+-----------+---------+ > | Above % (Above/Total) | 89.67% | 11.63% | > | Below % (Below/Total) | 0.16% | 4.91% | > | Total cpuidle miss (%) | 89.83% | 16.54% | > +------------------------+-----------+---------+ > > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> > > --- > > v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/ > > v1 -> v2: > > - Drop cover letter and improve commit message. > --- > drivers/cpuidle/governors/menu.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > index 28363bfa3e4c..4b199377e4be 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, > idx = i; /* first enabled state */ > > if (s->target_residency_ns > predicted_ns) { > - /* > - * Use a physical idle state, not busy polling, unless > - * a timer is going to trigger soon enough. > - */ > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && > - s->exit_latency_ns <= latency_req && > - s->target_residency_ns <= data->next_timer_ns) { > - predicted_ns = s->target_residency_ns; > - idx = i; > - break; > - } > if (predicted_ns < TICK_NSEC) > break; > I'm still fine with this and don't see a better way to solve the reported issue generally, see the discussion on v1. Rafael, do you have any objections? We could make this conditional on there being a high latency difference between polling and non-polling to keep x86 behavior.
On Mon, Mar 17, 2025 at 9:35 AM Christian Loehle <christian.loehle@arm.com> wrote: > > On 3/17/25 06:03, Aboorva Devarajan wrote: > > Avoid selecting deep idle state when the predicted idle duration is > > shorter than its target residency, as this leads to unnecessary state > > transitions without energy savings. > > > > On virtualized PowerPC (pseries) systems, where only one polling state > > (Snooze) and one deep state (CEDE) are available, selecting CEDE when > > its target residency exceeds the predicted idle duration hurts > > performance. > > > > For example, if the predicted idle duration is 15 us and the first > > non-polling state has a target residency of 120 us, selecting it > > would be suboptimal. > > > > Remove the condition introduced in commit 69d25870f20c > > ("cpuidle: fix the menu governor to boost IO performance") that > > prioritized non-polling states even when their target residency > > exceeded the predicted idle duration and allow polling states to > > be selected when appropriate. > > > > Performance improvement observed with pgbench on PowerPC (pseries) > > system: > > +---------------------------+------------+------------+------------+ > > | Metric | Baseline | Patched | Change (%) | > > +---------------------------+------------+------------+------------+ > > | Transactions/sec (TPS) | 494,834 | 538,707 | +8.85% | > > | Avg latency (ms) | 0.162 | 0.149 | -8.02% | > > +---------------------------+------------+------------+------------+ > > > > CPUIdle state usage: > > +--------------+--------------+-------------+ > > | Metric | Baseline | Patched | > > +--------------+--------------+-------------+ > > | Total usage | 12,703,630 | 13,941,966 | > > | Above usage | 11,388,990 | 1,620,474 | > > | Below usage | 19,973 | 684,708 | > > +--------------+--------------+-------------+ > > > > Above/Total and Below/Total usage percentages: > > +------------------------+-----------+---------+ > > | Metric | Baseline | Patched | > > +------------------------+-----------+---------+ > > | Above % (Above/Total) | 89.67% | 11.63% | > > | Below % (Below/Total) | 0.16% | 4.91% | > > | Total cpuidle miss (%) | 89.83% | 16.54% | > > +------------------------+-----------+---------+ > > > > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> > > > > --- > > > > v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/ > > > > v1 -> v2: > > > > - Drop cover letter and improve commit message. > > --- > > drivers/cpuidle/governors/menu.c | 11 ----------- > > 1 file changed, 11 deletions(-) > > > > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > > index 28363bfa3e4c..4b199377e4be 100644 > > --- a/drivers/cpuidle/governors/menu.c > > +++ b/drivers/cpuidle/governors/menu.c > > @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, > > idx = i; /* first enabled state */ > > > > if (s->target_residency_ns > predicted_ns) { > > - /* > > - * Use a physical idle state, not busy polling, unless > > - * a timer is going to trigger soon enough. > > - */ > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && > > - s->exit_latency_ns <= latency_req && > > - s->target_residency_ns <= data->next_timer_ns) { > > - predicted_ns = s->target_residency_ns; > > - idx = i; > > - break; > > - } > > if (predicted_ns < TICK_NSEC) > > break; > > > > I'm still fine with this and don't see a better way to solve the reported > issue generally, see the discussion on v1. > Rafael, do you have any objections? The behavior change on x86 would be rather unacceptable I'm afraid. As already stated in a different thread today, on x86 the polling state turns out to be overly shallow most of the time even before this patch. > We could make this conditional on there being a high latency difference between > polling and non-polling to keep x86 behavior. If I'm not mistaken, to address the issue at hand, it would be sufficient to add an "s->target_residency_ns < RESIDENCY_THRESHOLD_NS" condition to the if () statement removed by this patch, wouldn't it?
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 28363bfa3e4c..4b199377e4be 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -296,17 +296,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = i; /* first enabled state */ if (s->target_residency_ns > predicted_ns) { - /* - * Use a physical idle state, not busy polling, unless - * a timer is going to trigger soon enough. - */ - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && - s->exit_latency_ns <= latency_req && - s->target_residency_ns <= data->next_timer_ns) { - predicted_ns = s->target_residency_ns; - idx = i; - break; - } if (predicted_ns < TICK_NSEC) break;
Avoid selecting deep idle state when the predicted idle duration is shorter than its target residency, as this leads to unnecessary state transitions without energy savings. On virtualized PowerPC (pseries) systems, where only one polling state (Snooze) and one deep state (CEDE) are available, selecting CEDE when its target residency exceeds the predicted idle duration hurts performance. For example, if the predicted idle duration is 15 us and the first non-polling state has a target residency of 120 us, selecting it would be suboptimal. Remove the condition introduced in commit 69d25870f20c ("cpuidle: fix the menu governor to boost IO performance") that prioritized non-polling states even when their target residency exceeded the predicted idle duration and allow polling states to be selected when appropriate. Performance improvement observed with pgbench on PowerPC (pseries) system: +---------------------------+------------+------------+------------+ | Metric | Baseline | Patched | Change (%) | +---------------------------+------------+------------+------------+ | Transactions/sec (TPS) | 494,834 | 538,707 | +8.85% | | Avg latency (ms) | 0.162 | 0.149 | -8.02% | +---------------------------+------------+------------+------------+ CPUIdle state usage: +--------------+--------------+-------------+ | Metric | Baseline | Patched | +--------------+--------------+-------------+ | Total usage | 12,703,630 | 13,941,966 | | Above usage | 11,388,990 | 1,620,474 | | Below usage | 19,973 | 684,708 | +--------------+--------------+-------------+ Above/Total and Below/Total usage percentages: +------------------------+-----------+---------+ | Metric | Baseline | Patched | +------------------------+-----------+---------+ | Above % (Above/Total) | 89.67% | 11.63% | | Below % (Below/Total) | 0.16% | 4.91% | | Total cpuidle miss (%) | 89.83% | 16.54% | +------------------------+-----------+---------+ Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> --- v1: https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/ v1 -> v2: - Drop cover letter and improve commit message. --- drivers/cpuidle/governors/menu.c | 11 ----------- 1 file changed, 11 deletions(-)