diff mbox series

[v3,2/2] x86/mwait-idle: squash stats update when not actually entering C-state

Message ID 6a9152e5-1a7d-c569-3483-66f022027597@suse.com (mailing list archive)
State New, archived
Headers show
Series x86/mwait-idle: updates from Linux (and more) | expand

Commit Message

Jan Beulich Jan. 27, 2022, 3:13 p.m. UTC
While we don't want to skip calling update_idle_stats(), arrange for it
to not increment the overall time spent in the state we didn't really
enter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: If we wanted to also move the tracing, then I think the part ahead
     of the if() also would need moving. At that point we could as well
     move update_last_cx_stat(), too, which afaict would allow skipping
     update_idle_stats() on the "else" path (which therefore would go
     away). Yet then, with the setting of power->safe_state moved up a
     little (which imo it should have been anyway) the two
     cpu_is_haltable() invocations would only have the lapic_timer_off()
     invocation left in between. This would then seem to call for simply
     ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
TBD: For the tracing I wonder if that really needs to come ahead of the
     local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
     certainly TRACE_6D() doesn't.
---
v3: Also move cstate_restore_tsc() invocation and split ones to
    update_idle_stats().
v2: New.

Comments

Roger Pau Monné Feb. 1, 2022, 11:04 a.m. UTC | #1
On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
> While we don't want to skip calling update_idle_stats(), arrange for it
> to not increment the overall time spent in the state we didn't really
> enter.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: If we wanted to also move the tracing, then I think the part ahead
>      of the if() also would need moving. At that point we could as well
>      move update_last_cx_stat(), too, which afaict would allow skipping
>      update_idle_stats() on the "else" path (which therefore would go
>      away). Yet then, with the setting of power->safe_state moved up a
>      little (which imo it should have been anyway) the two
>      cpu_is_haltable() invocations would only have the lapic_timer_off()
>      invocation left in between. This would then seem to call for simply
>      ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.

It's possible for lapic_timer_off to take a non-trivial amount of time
when virtualized, but it's likely we won't be using mwait in that
case, so not sure it matter much to have the two cpu_is_haltable calls
if there's just a lapic_timer_off between them.

> TBD: For the tracing I wonder if that really needs to come ahead of the
>      local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
>      certainly TRACE_6D() doesn't.

Would be good if it could be moved after the local_irq_enable call, as
it's not as trivial as I've expected, and will just add latency to any
pending interrupt waiting to be serviced. FWIW, I haven't spotted a
need to call it with interrupt disabled.

> ---
> v3: Also move cstate_restore_tsc() invocation and split ones to
>     update_idle_stats().
> v2: New.
> 
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -854,17 +854,23 @@ static void mwait_idle(void)
>  		mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>  
>  		local_irq_disable();
> -	}
>  
> -	after = alternative_call(cpuidle_get_tick);
> +		after = alternative_call(cpuidle_get_tick);
> +
> +		cstate_restore_tsc();
> +
> +		/* Now back in C0. */
> +		update_idle_stats(power, cx, before, after);
> +	} else {
> +		/* Never left C0. */
> +		after = alternative_call(cpuidle_get_tick);
> +		update_idle_stats(power, cx, after, after);

While adjusting this, could you also modify update_idle_stats to avoid
increasing cx->usage if before == after (or !sleep_ticks). I don't
think it's fine to increase the state counter if we never actually
entered it.

I was also going to suggest that you don't set 'after' and just use
update_idle_stats(power, cx, before, before); but seeing as TRACE_6D
also makes use of 'after' there's not much point without further
rework. I also see the RFC note at the top, so while I think this is
an improvement, I agree it would be nice to avoid the trace altogether
if we never actually enter the state. If you want to rework the patch
or send a followup that's fine for me.

Thanks, Roger.
Jan Beulich Feb. 1, 2022, 11:37 a.m. UTC | #2
On 01.02.2022 12:04, Roger Pau Monné wrote:
> On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
>> While we don't want to skip calling update_idle_stats(), arrange for it
>> to not increment the overall time spent in the state we didn't really
>> enter.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: If we wanted to also move the tracing, then I think the part ahead
>>      of the if() also would need moving. At that point we could as well
>>      move update_last_cx_stat(), too, which afaict would allow skipping
>>      update_idle_stats() on the "else" path (which therefore would go
>>      away). Yet then, with the setting of power->safe_state moved up a
>>      little (which imo it should have been anyway) the two
>>      cpu_is_haltable() invocations would only have the lapic_timer_off()
>>      invocation left in between. This would then seem to call for simply
>>      ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
> 
> It's possible for lapic_timer_off to take a non-trivial amount of time
> when virtualized, but it's likely we won't be using mwait in that
> case, so not sure it matter much to have the two cpu_is_haltable calls
> if there's just a lapic_timer_off between them.
> 
>> TBD: For the tracing I wonder if that really needs to come ahead of the
>>      local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
>>      certainly TRACE_6D() doesn't.
> 
> Would be good if it could be moved after the local_irq_enable call, as
> it's not as trivial as I've expected, and will just add latency to any
> pending interrupt waiting to be serviced. FWIW, I haven't spotted a
> need to call it with interrupt disabled.

Okay, I guess I'll to the larger rework then.

>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -854,17 +854,23 @@ static void mwait_idle(void)
>>  		mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>>  
>>  		local_irq_disable();
>> -	}
>>  
>> -	after = alternative_call(cpuidle_get_tick);
>> +		after = alternative_call(cpuidle_get_tick);
>> +
>> +		cstate_restore_tsc();
>> +
>> +		/* Now back in C0. */
>> +		update_idle_stats(power, cx, before, after);
>> +	} else {
>> +		/* Never left C0. */
>> +		after = alternative_call(cpuidle_get_tick);
>> +		update_idle_stats(power, cx, after, after);
> 
> While adjusting this, could you also modify update_idle_stats to avoid
> increasing cx->usage if before == after (or !sleep_ticks). I don't
> think it's fine to increase the state counter if we never actually
> entered it.

I did consider it but then decided against. Even leaving this aspect
aside the counter only counts _attempts_ to enter a certain state;
the CPU may find reasons to never actually enter it. And what we have
when before == after is still an attempt, albeit an unsuccessful one.

Jan
Roger Pau Monné Feb. 1, 2022, 12:42 p.m. UTC | #3
On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote:
> On 01.02.2022 12:04, Roger Pau Monné wrote:
> > On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
> >> While we don't want to skip calling update_idle_stats(), arrange for it
> >> to not increment the overall time spent in the state we didn't really
> >> enter.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> RFC: If we wanted to also move the tracing, then I think the part ahead
> >>      of the if() also would need moving. At that point we could as well
> >>      move update_last_cx_stat(), too, which afaict would allow skipping
> >>      update_idle_stats() on the "else" path (which therefore would go
> >>      away). Yet then, with the setting of power->safe_state moved up a
> >>      little (which imo it should have been anyway) the two
> >>      cpu_is_haltable() invocations would only have the lapic_timer_off()
> >>      invocation left in between. This would then seem to call for simply
> >>      ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
> > 
> > It's possible for lapic_timer_off to take a non-trivial amount of time
> > when virtualized, but it's likely we won't be using mwait in that
> > case, so not sure it matter much to have the two cpu_is_haltable calls
> > if there's just a lapic_timer_off between them.
> > 
> >> TBD: For the tracing I wonder if that really needs to come ahead of the
> >>      local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
> >>      certainly TRACE_6D() doesn't.
> > 
> > Would be good if it could be moved after the local_irq_enable call, as
> > it's not as trivial as I've expected, and will just add latency to any
> > pending interrupt waiting to be serviced. FWIW, I haven't spotted a
> > need to call it with interrupt disabled.
> 
> Okay, I guess I'll to the larger rework then.
> 
> >> --- a/xen/arch/x86/cpu/mwait-idle.c
> >> +++ b/xen/arch/x86/cpu/mwait-idle.c
> >> @@ -854,17 +854,23 @@ static void mwait_idle(void)
> >>  		mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
> >>  
> >>  		local_irq_disable();
> >> -	}
> >>  
> >> -	after = alternative_call(cpuidle_get_tick);
> >> +		after = alternative_call(cpuidle_get_tick);
> >> +
> >> +		cstate_restore_tsc();
> >> +
> >> +		/* Now back in C0. */
> >> +		update_idle_stats(power, cx, before, after);
> >> +	} else {
> >> +		/* Never left C0. */
> >> +		after = alternative_call(cpuidle_get_tick);
> >> +		update_idle_stats(power, cx, after, after);
> > 
> > While adjusting this, could you also modify update_idle_stats to avoid
> > increasing cx->usage if before == after (or !sleep_ticks). I don't
> > think it's fine to increase the state counter if we never actually
> > entered it.
> 
> I did consider it but then decided against. Even leaving this aspect
> aside the counter only counts _attempts_ to enter a certain state;
> the CPU may find reasons to never actually enter it. And what we have
> when before == after is still an attempt, albeit an unsuccessful one.

Right, in which case:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Not sure whether you would like to commit this now and do the lager
rework as a followup patch. That would be fine by me.

Thanks, Roger.
Jan Beulich Feb. 1, 2022, 12:45 p.m. UTC | #4
On 01.02.2022 13:42, Roger Pau Monné wrote:
> On Tue, Feb 01, 2022 at 12:37:27PM +0100, Jan Beulich wrote:
>> On 01.02.2022 12:04, Roger Pau Monné wrote:
>>> On Thu, Jan 27, 2022 at 04:13:47PM +0100, Jan Beulich wrote:
>>>> While we don't want to skip calling update_idle_stats(), arrange for it
>>>> to not increment the overall time spent in the state we didn't really
>>>> enter.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> RFC: If we wanted to also move the tracing, then I think the part ahead
>>>>      of the if() also would need moving. At that point we could as well
>>>>      move update_last_cx_stat(), too, which afaict would allow skipping
>>>>      update_idle_stats() on the "else" path (which therefore would go
>>>>      away). Yet then, with the setting of power->safe_state moved up a
>>>>      little (which imo it should have been anyway) the two
>>>>      cpu_is_haltable() invocations would only have the lapic_timer_off()
>>>>      invocation left in between. This would then seem to call for simply
>>>>      ditching the 2nd one - acpi-idle also doesn't have a 2nd instance.
>>>
>>> It's possible for lapic_timer_off to take a non-trivial amount of time
>>> when virtualized, but it's likely we won't be using mwait in that
>>> case, so not sure it matter much to have the two cpu_is_haltable calls
>>> if there's just a lapic_timer_off between them.
>>>
>>>> TBD: For the tracing I wonder if that really needs to come ahead of the
>>>>      local_irq_enable(). Maybe trace_exit_reason() needs to, but quite
>>>>      certainly TRACE_6D() doesn't.
>>>
>>> Would be good if it could be moved after the local_irq_enable call, as
>>> it's not as trivial as I've expected, and will just add latency to any
>>> pending interrupt waiting to be serviced. FWIW, I haven't spotted a
>>> need to call it with interrupt disabled.
>>
>> Okay, I guess I'll to the larger rework then.
>>
>>>> --- a/xen/arch/x86/cpu/mwait-idle.c
>>>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>>>> @@ -854,17 +854,23 @@ static void mwait_idle(void)
>>>>  		mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
>>>>  
>>>>  		local_irq_disable();
>>>> -	}
>>>>  
>>>> -	after = alternative_call(cpuidle_get_tick);
>>>> +		after = alternative_call(cpuidle_get_tick);
>>>> +
>>>> +		cstate_restore_tsc();
>>>> +
>>>> +		/* Now back in C0. */
>>>> +		update_idle_stats(power, cx, before, after);
>>>> +	} else {
>>>> +		/* Never left C0. */
>>>> +		after = alternative_call(cpuidle_get_tick);
>>>> +		update_idle_stats(power, cx, after, after);
>>>
>>> While adjusting this, could you also modify update_idle_stats to avoid
>>> increasing cx->usage if before == after (or !sleep_ticks). I don't
>>> think it's fine to increase the state counter if we never actually
>>> entered it.
>>
>> I did consider it but then decided against. Even leaving this aspect
>> aside the counter only counts _attempts_ to enter a certain state;
>> the CPU may find reasons to never actually enter it. And what we have
>> when before == after is still an attempt, albeit an unsuccessful one.
> 
> Right, in which case:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, but ...

> Not sure whether you would like to commit this now and do the lager
> rework as a followup patch. That would be fine by me.

... no, I'd rather do this in a single step. In its current shape the
patch is actually moving us in the opposite direction.

Jan
diff mbox series

Patch

--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -854,17 +854,23 @@  static void mwait_idle(void)
 		mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
 
 		local_irq_disable();
-	}
 
-	after = alternative_call(cpuidle_get_tick);
+		after = alternative_call(cpuidle_get_tick);
+
+		cstate_restore_tsc();
+
+		/* Now back in C0. */
+		update_idle_stats(power, cx, before, after);
+	} else {
+		/* Never left C0. */
+		after = alternative_call(cpuidle_get_tick);
+		update_idle_stats(power, cx, after, after);
+	}
 
-	cstate_restore_tsc();
 	trace_exit_reason(irq_traced);
 	TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
 		irq_traced[0], irq_traced[1], irq_traced[2], irq_traced[3]);
 
-	/* Now back in C0. */
-	update_idle_stats(power, cx, before, after);
 	local_irq_enable();
 
 	if (!(lapic_timer_reliable_states & (1 << cx->type)))