Message ID | 6a9152e5-1a7d-c569-3483-66f022027597@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/mwait-idle: updates from Linux (and more) | expand |
On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote: > While we don't want to skip calling update_idle_stats(), arrange for it > to not increment the overall time spent in the state we didn't really > enter. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > RFC: If we wanted to also move the tracing, then I think the part ahead > of the if() also would need moving. At that point we could as well > move update_last_cx_stat(), too, which afaict would allow skipping > update_idle_stats() on the "else" path (which therefore would go > away). Yet then, with the setting of power->safe_state moved up a > little (which imo it should have been anyway) the two > cpu_is_haltable() invocations would only have the lapic_timer_off() > invocation left in between. This would then seem to call for simply > ditching the 2nd one - acpi-idle also doesn't have a 2nd instance. It's possible for lapic_timer_off to take a non-trivial amount of time when virtualized, but it's likely we won't be using mwait in that case, so not sure it matter much to have the two cpu_is_haltable calls if there's just a lapic_timer_off between them. > TBD: For the tracing I wonder if that really needs to come ahead of the > local_irq_enable(). Maybe trace_exit_reason() needs to, but quite > certainly TRACE_6D() doesn't. Would be good if it could be moved after the local_irq_enable call, as it's not as trivial as I've expected, and will just add latency to any pending interrupt waiting to be serviced. FWIW, I haven't spotted a need to call it with interrupt disabled. > --- > v3: Also move cstate_restore_tsc() invocation and split ones to > update_idle_stats(). > v2: New. > > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -854,17 +854,23 @@ static void mwait_idle(void) > mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK); > > local_irq_disable(); > - } > > - after = alternative_call(cpuidle_get_tick); > + after = alternative_call(cpuidle_get_tick); > + > + cstate_restore_tsc(); > + > + /* Now back in C0. */ > + update_idle_stats(power, cx, before, after); > + } else { > + /* Never left C0. */ > + after = alternative_call(cpuidle_get_tick); > + update_idle_stats(power, cx, after, after); While adjusting this, could you also modify update_idle_stats to avoid increasing cx->usage if before == after (or !sleep_ticks). I don't think it's fine to increase the state counter if we never actually entered it. I was also going to suggest that you don't set 'after' and just use update_idle_stats(power, cx, before, before); but seeing as TRACE_6D also makes use of 'after' there's not much point without further rework. I also see the RFC note at the top, so while I think this is an improvement, I agree it would be nice to avoid the trace altogether if we never actually enter the state. If you want to rework the patch or send a followup that's fine for me. Thanks, Roger.
On 01.02.2022 12:04, Roger Pau Monné wrote: > On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote: >> While we don't want to skip calling update_idle_stats(), arrange for it >> to not increment the overall time spent in the state we didn't really >> enter. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> --- >> RFC: If we wanted to also move the tracing, then I think the part ahead >> of the if() also would need moving. At that point we could as well >> move update_last_cx_stat(), too, which afaict would allow skipping >> update_idle_stats() on the "else" path (which therefore would go >> away). Yet then, with the setting of power->safe_state moved up a >> little (which imo it should have been anyway) the two >> cpu_is_haltable() invocations would only have the lapic_timer_off() >> invocation left in between. This would then seem to call for simply >> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance. > > It's possible for lapic_timer_off to take a non-trivial amount of time > when virtualized, but it's likely we won't be using mwait in that > case, so not sure it matter much to have the two cpu_is_haltable calls > if there's just a lapic_timer_off between them. > >> TBD: For the tracing I wonder if that really needs to come ahead of the >> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite >> certainly TRACE_6D() doesn't. > > Would be good if it could be moved after the local_irq_enable call, as > it's not as trivial as I've expected, and will just add latency to any > pending interrupt waiting to be serviced. FWIW, I haven't spotted a > need to call it with interrupt disabled. Okay, I guess I'll to the larger rework then. >> --- a/xen/arch/x86/cpu/mwait-idle.c >> +++ b/xen/arch/x86/cpu/mwait-idle.c >> @@ -854,17 +854,23 @@ static void mwait_idle(void) >> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK); >> >> local_irq_disable(); >> - } >> >> - after = alternative_call(cpuidle_get_tick); >> + after = alternative_call(cpuidle_get_tick); >> + >> + cstate_restore_tsc(); >> + >> + /* Now back in C0. */ >> + update_idle_stats(power, cx, before, after); >> + } else { >> + /* Never left C0. */ >> + after = alternative_call(cpuidle_get_tick); >> + update_idle_stats(power, cx, after, after); > > While adjusting this, could you also modify update_idle_stats to avoid > increasing cx->usage if before == after (or !sleep_ticks). I don't > think it's fine to increase the state counter if we never actually > entered it. I did consider it but then decided against. Even leaving this aspect aside the counter only counts _attempts_ to enter a certain state; the CPU may find reasons to never actually enter it. And what we have when before == after is still an attempt, albeit an unsuccessful one. Jan
On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote: > On 01.02.2022 12:04, Roger Pau Monné wrote: > > On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote: > >> While we don't want to skip calling update_idle_stats(), arrange for it > >> to not increment the overall time spent in the state we didn't really > >> enter. > >> > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > >> --- > >> RFC: If we wanted to also move the tracing, then I think the part ahead > >> of the if() also would need moving. At that point we could as well > >> move update_last_cx_stat(), too, which afaict would allow skipping > >> update_idle_stats() on the "else" path (which therefore would go > >> away). Yet then, with the setting of power->safe_state moved up a > >> little (which imo it should have been anyway) the two > >> cpu_is_haltable() invocations would only have the lapic_timer_off() > >> invocation left in between. This would then seem to call for simply > >> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance. > > > > It's possible for lapic_timer_off to take a non-trivial amount of time > > when virtualized, but it's likely we won't be using mwait in that > > case, so not sure it matter much to have the two cpu_is_haltable calls > > if there's just a lapic_timer_off between them. > > > >> TBD: For the tracing I wonder if that really needs to come ahead of the > >> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite > >> certainly TRACE_6D() doesn't. > > > > Would be good if it could be moved after the local_irq_enable call, as > > it's not as trivial as I've expected, and will just add latency to any > > pending interrupt waiting to be serviced. FWIW, I haven't spotted a > > need to call it with interrupt disabled. > > Okay, I guess I'll to the larger rework then. > > >> --- a/xen/arch/x86/cpu/mwait-idle.c > >> +++ b/xen/arch/x86/cpu/mwait-idle.c > >> @@ -854,17 +854,23 @@ static void mwait_idle(void) > >> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK); > >> > >> local_irq_disable(); > >> - } > >> > >> - after = alternative_call(cpuidle_get_tick); > >> + after = alternative_call(cpuidle_get_tick); > >> + > >> + cstate_restore_tsc(); > >> + > >> + /* Now back in C0. */ > >> + update_idle_stats(power, cx, before, after); > >> + } else { > >> + /* Never left C0. */ > >> + after = alternative_call(cpuidle_get_tick); > >> + update_idle_stats(power, cx, after, after); > > > > While adjusting this, could you also modify update_idle_stats to avoid > > increasing cx->usage if before == after (or !sleep_ticks). I don't > > think it's fine to increase the state counter if we never actually > > entered it. > > I did consider it but then decided against. Even leaving this aspect > aside the counter only counts _attempts_ to enter a certain state; > the CPU may find reasons to never actually enter it. And what we have > when before == after is still an attempt, albeit an unsuccessful one. Right, in which case: Acked-by: Roger Pau Monné <roger.pau@citrix.com> Not sure whether you would like to commit this now and do the lager rework as a followup patch. That would be fine by me. Thanks, Roger.
On 01.02.2022 13:42, Roger Pau Monné wrote: > On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote: >> On 01.02.2022 12:04, Roger Pau Monné wrote: >>> On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote: >>>> While we don't want to skip calling update_idle_stats(), arrange for it >>>> to not increment the overall time spent in the state we didn't really >>>> enter. >>>> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> --- >>>> RFC: If we wanted to also move the tracing, then I think the part ahead >>>> of the if() also would need moving. At that point we could as well >>>> move update_last_cx_stat(), too, which afaict would allow skipping >>>> update_idle_stats() on the "else" path (which therefore would go >>>> away). Yet then, with the setting of power->safe_state moved up a >>>> little (which imo it should have been anyway) the two >>>> cpu_is_haltable() invocations would only have the lapic_timer_off() >>>> invocation left in between. This would then seem to call for simply >>>> ditching the 2nd one - acpi-idle also doesn't have a 2nd instance. >>> >>> It's possible for lapic_timer_off to take a non-trivial amount of time >>> when virtualized, but it's likely we won't be using mwait in that >>> case, so not sure it matter much to have the two cpu_is_haltable calls >>> if there's just a lapic_timer_off between them. >>> >>>> TBD: For the tracing I wonder if that really needs to come ahead of the >>>> local_irq_enable(). Maybe trace_exit_reason() needs to, but quite >>>> certainly TRACE_6D() doesn't. >>> >>> Would be good if it could be moved after the local_irq_enable call, as >>> it's not as trivial as I've expected, and will just add latency to any >>> pending interrupt waiting to be serviced. FWIW, I haven't spotted a >>> need to call it with interrupt disabled. >> >> Okay, I guess I'll to the larger rework then. >> >>>> --- a/xen/arch/x86/cpu/mwait-idle.c >>>> +++ b/xen/arch/x86/cpu/mwait-idle.c >>>> @@ -854,17 +854,23 @@ static void mwait_idle(void) >>>> mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK); >>>> >>>> local_irq_disable(); >>>> - } >>>> >>>> - after = alternative_call(cpuidle_get_tick); >>>> + after = alternative_call(cpuidle_get_tick); >>>> + >>>> + cstate_restore_tsc(); >>>> + >>>> + /* Now back in C0. */ >>>> + update_idle_stats(power, cx, before, after); >>>> + } else { >>>> + /* Never left C0. */ >>>> + after = alternative_call(cpuidle_get_tick); >>>> + update_idle_stats(power, cx, after, after); >>> >>> While adjusting this, could you also modify update_idle_stats to avoid >>> increasing cx->usage if before == after (or !sleep_ticks). I don't >>> think it's fine to increase the state counter if we never actually >>> entered it. >> >> I did consider it but then decided against. Even leaving this aspect >> aside the counter only counts _attempts_ to enter a certain state; >> the CPU may find reasons to never actually enter it. And what we have >> when before == after is still an attempt, albeit an unsuccessful one. > > Right, in which case: > > Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, but ... > Not sure whether you would like to commit this now and do the lager > rework as a followup patch. That would be fine by me. ... no, I'd rather do this in a single step. In its current shape the patch is actually moving us in the opposite direction. Jan
--- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -854,17 +854,23 @@ static void mwait_idle(void) mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK); local_irq_disable(); - } - after = alternative_call(cpuidle_get_tick); + after = alternative_call(cpuidle_get_tick); + + cstate_restore_tsc(); + + /* Now back in C0. */ + update_idle_stats(power, cx, before, after); + } else { + /* Never left C0. */ + after = alternative_call(cpuidle_get_tick); + update_idle_stats(power, cx, after, after); + } - cstate_restore_tsc(); trace_exit_reason(irq_traced); TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, irq_traced[0], irq_traced[1], irq_traced[2], irq_traced[3]); - /* Now back in C0. */ - update_idle_stats(power, cx, before, after); local_irq_enable(); if (!(lapic_timer_reliable_states & (1 << cx->type)))
While we don't want to skip calling update_idle_stats(), arrange for it to not increment the overall time spent in the state we didn't really enter. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- RFC: If we wanted to also move the tracing, then I think the part ahead of the if() also would need moving. At that point we could as well move update_last_cx_stat(), too, which afaict would allow skipping update_idle_stats() on the "else" path (which therefore would go away). Yet then, with the setting of power->safe_state moved up a little (which imo it should have been anyway) the two cpu_is_haltable() invocations would only have the lapic_timer_off() invocation left in between. This would then seem to call for simply ditching the 2nd one - acpi-idle also doesn't have a 2nd instance. TBD: For the tracing I wonder if that really needs to come ahead of the local_irq_enable(). Maybe trace_exit_reason() needs to, but quite certainly TRACE_6D() doesn't. --- v3: Also move cstate_restore_tsc() invocation and split ones to update_idle_stats(). v2: New.