mbox series

[v1,0/3] thermal/debugfs: Fix and clean up trip point statistics updates

Message ID 4918025.31r3eYUQgx@kreacher (mailing list archive)
Headers show
Series thermal/debugfs: Fix and clean up trip point statistics updates | expand

Message

Rafael J. Wysocki April 17, 2024, 1:07 p.m. UTC
Hi Everyone,

The first patch in this series addresses the problem of updating trip
point statistics prematurely for trip points that have just been
crossed on the way down (please see the patch changelog for details).

The way it does that renders the following cleanup patch inapplicable:

https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/

The remaining two patches in the series are cleanups on top of the
first one.

This series is based on an older patch series posted last week:

https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/

but it can be trivially rebased on top of the current linux-next.

Thanks!

Comments

Lukasz Luba April 22, 2024, 11:37 a.m. UTC | #1
Hi Rafael,

On 4/17/24 14:07, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> The first patch in this series addresses the problem of updating trip
> point statistics prematurely for trip points that have just been
> crossed on the way down (please see the patch changelog for details).
> 
> The way it does that renders the following cleanup patch inapplicable:
> 
> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
> 
> The remaining two patches in the series are cleanups on top of the
> first one.
> 
> This series is based on an older patch series posted last week:
> 
> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
> 
> but it can be trivially rebased on top of the current linux-next.
> 
> Thanks!
> 
> 
> 

I've checked this patch patch set on top of your bleeding-edge
which has thermal re-work as well. The patch set looks good
and works properly.

Although, I have found some issue in this debug info files and
I'm not sure if this is expected or not. If not I can address this
and send some small fix for it.

When I read the cooling device residency statistics, I don't
get updates for the first time the state is used. It can only
be counted when that state was known and finished it's usage.

IMO it is not the right behavior, isn't it?

Experiment:
My trip points are 70degC and 75degC and I'm setting emulated
temperature to cross them and get cooling states 1 then 0.
As you can see the statistics counter only starts showing value after
after trip crossing down.
------------------------------------8<-----------------------------------

root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
root@arm:~#
root@arm:~#
root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp 

root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
root@arm:~# echo 76000 > /sys/class/thermal/thermal_zone0/emul_temp 

root@arm:~#
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       518197
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       518197
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       518197
root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp 

root@arm:~#
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       520066
1       17567
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       522653
1       17567
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       526151
1       17567
root@arm:~# echo 66000 > /sys/class/thermal/thermal_zone0/emul_temp 

root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       537366
1       17567
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       544936
1       17567
root@arm:~# cat 
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State   Residency
0       556694
1       17567
root@arm:~#

------------------------------->8----------------------------------------

Please let me know what do you think about that behavior.

Regards,
Lukasz
Rafael J. Wysocki April 22, 2024, 2:20 p.m. UTC | #2
Hi Lukasz,

On Mon, Apr 22, 2024 at 1:37 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Rafael,
>
> On 4/17/24 14:07, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > The first patch in this series addresses the problem of updating trip
> > point statistics prematurely for trip points that have just been
> > crossed on the way down (please see the patch changelog for details).
> >
> > The way it does that renders the following cleanup patch inapplicable:
> >
> > https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
> >
> > The remaining two patches in the series are cleanups on top of the
> > first one.
> >
> > This series is based on an older patch series posted last week:
> >
> > https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
> >
> > but it can be trivially rebased on top of the current linux-next.
> >
> > Thanks!
> >
> >
> >
>
> I've checked this patch patch set on top of your bleeding-edge
> which has thermal re-work as well. The patch set looks good
> and works properly.
>
> Although, I have found some issue in this debug info files and
> I'm not sure if this is expected or not. If not I can address this
> and send some small fix for it.
>
> When I read the cooling device residency statistics, I don't
> get updates for the first time the state is used. It can only
> be counted when that state was known and finished it's usage.
>
> IMO it is not the right behavior, isn't it?

I agree, it is not.

> Experiment:
> My trip points are 70degC and 75degC and I'm setting emulated
> temperature to cross them and get cooling states 1 then 0.
> As you can see the statistics counter only starts showing value after
> after trip crossing down.
> ------------------------------------8<-----------------------------------
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> root@arm:~#
> root@arm:~#
> root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> root@arm:~# echo 76000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~#
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       518197
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       518197
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       518197
> root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~#
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       520066
> 1       17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       522653
> 1       17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       526151
> 1       17567
> root@arm:~# echo 66000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       537366
> 1       17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       544936
> 1       17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State   Residency
> 0       556694
> 1       17567
> root@arm:~#
>
> ------------------------------->8----------------------------------------
>
> Please let me know what do you think about that behavior.

If you have thought about fixing it already, please do so.

Thanks!
Daniel Lezcano April 22, 2024, 3:34 p.m. UTC | #3
On 22/04/2024 13:37, Lukasz Luba wrote:
> Hi Rafael,
> 
> On 4/17/24 14:07, Rafael J. Wysocki wrote:
>> Hi Everyone,
>>
>> The first patch in this series addresses the problem of updating trip
>> point statistics prematurely for trip points that have just been
>> crossed on the way down (please see the patch changelog for details).
>>
>> The way it does that renders the following cleanup patch inapplicable:
>>
>> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
>>
>> The remaining two patches in the series are cleanups on top of the
>> first one.
>>
>> This series is based on an older patch series posted last week:
>>
>> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
>>
>> but it can be trivially rebased on top of the current linux-next.
>>
>> Thanks!
>>
>>
>>
> 
> I've checked this patch patch set on top of your bleeding-edge
> which has thermal re-work as well. The patch set looks good
> and works properly.
> 
> Although, I have found some issue in this debug info files and
> I'm not sure if this is expected or not. If not I can address this
> and send some small fix for it.
> 
> When I read the cooling device residency statistics, I don't
> get updates for the first time the state is used. It can only
> be counted when that state was known and finished it's usage.
> 
> IMO it is not the right behavior, isn't it?

Do you mean the right behavior is a regression or we should expect at 
least the residency to be showed even if the mitigation state is not 
closed ?
Rafael J. Wysocki April 22, 2024, 3:48 p.m. UTC | #4
On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 22/04/2024 13:37, Lukasz Luba wrote:
> > Hi Rafael,
> >
> > On 4/17/24 14:07, Rafael J. Wysocki wrote:
> >> Hi Everyone,
> >>
> >> The first patch in this series addresses the problem of updating trip
> >> point statistics prematurely for trip points that have just been
> >> crossed on the way down (please see the patch changelog for details).
> >>
> >> The way it does that renders the following cleanup patch inapplicable:
> >>
> >> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
> >>
> >> The remaining two patches in the series are cleanups on top of the
> >> first one.
> >>
> >> This series is based on an older patch series posted last week:
> >>
> >> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
> >>
> >> but it can be trivially rebased on top of the current linux-next.
> >>
> >> Thanks!
> >>
> >>
> >>
> >
> > I've checked this patch patch set on top of your bleeding-edge
> > which has thermal re-work as well. The patch set looks good
> > and works properly.
> >
> > Although, I have found some issue in this debug info files and
> > I'm not sure if this is expected or not. If not I can address this
> > and send some small fix for it.
> >
> > When I read the cooling device residency statistics, I don't
> > get updates for the first time the state is used. It can only
> > be counted when that state was known and finished it's usage.
> >
> > IMO it is not the right behavior, isn't it?
>
> Do you mean the right behavior is a regression

It has not changed AFAICS.

> or we should expect at least the residency to be showed even if the
> mitigation state is not closed ?

Well, in fact the device has already been in that state for some time
and the mitigation can continue for a while.
Daniel Lezcano April 22, 2024, 4:12 p.m. UTC | #5
On 22/04/2024 17:48, Rafael J. Wysocki wrote:
> On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano

[ ... ]

>> or we should expect at least the residency to be showed even if the
>> mitigation state is not closed ?
> 
> Well, in fact the device has already been in that state for some time
> and the mitigation can continue for a while.

Yes, but when to update the residency time ?

When we cross a trip point point ?

or

When we read the information ?

The former is what we are currently doing AFAIR and the latter must add 
the delta between the last update and the current time for the current 
state, right ?
Rafael J. Wysocki April 23, 2024, 12:26 p.m. UTC | #6
On Mon, Apr 22, 2024 at 6:12 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 22/04/2024 17:48, Rafael J. Wysocki wrote:
> > On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
>
> [ ... ]
>
> >> or we should expect at least the residency to be showed even if the
> >> mitigation state is not closed ?
> >
> > Well, in fact the device has already been in that state for some time
> > and the mitigation can continue for a while.
>
> Yes, but when to update the residency time ?
>
> When we cross a trip point point ?
>
> or
>
> When we read the information ?
>
> The former is what we are currently doing AFAIR

Not really.

Records are added by thermal_debug_cdev_state_update() which only runs
when __thermal_cdev_update() is called, ie. from governors.

Moreover, it assumes the initial state to be 0 and checks if the new
state is equal to the current one before doing anything else, so it
will not make a record for the 0 state until the first transition.

> and the latter must add the delta between the last update and the current time for the current
> state, right ?

Yes, and it is doing this already AFAICS.

The problem is that it only creates a record for the old state, so if
the new one is seen for the first time, there will be no record for it
until it changes to some other state.

The appended patch (modulo GMail-induced white space breakage) should
help with this, but the initial state handling needs to be addressed
separately.

---
 drivers/thermal/thermal_debugfs.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
     }

     cdev_dbg->current_state = new_state;
+
+    /*
+     * Create a record for the new state if it is not there, so its
+     * duration will be printed by cdev_dt_seq_show() as expected if it
+     * runs before the next state transition.
+     */
+    thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
new_state);
+
     transition = (old_state << 16) | new_state;

     /*
Lukasz Luba April 23, 2024, 1:38 p.m. UTC | #7
On 4/23/24 13:26, Rafael J. Wysocki wrote:
> On Mon, Apr 22, 2024 at 6:12 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 22/04/2024 17:48, Rafael J. Wysocki wrote:
>>> On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
>>
>> [ ... ]
>>
>>>> or we should expect at least the residency to be showed even if the
>>>> mitigation state is not closed ?
>>>
>>> Well, in fact the device has already been in that state for some time
>>> and the mitigation can continue for a while.
>>
>> Yes, but when to update the residency time ?
>>
>> When we cross a trip point point ?
>>
>> or
>>
>> When we read the information ?
>>
>> The former is what we are currently doing AFAIR
> 
> Not really.
> 
> Records are added by thermal_debug_cdev_state_update() which only runs
> when __thermal_cdev_update() is called, ie. from governors.
> 
> Moreover, it assumes the initial state to be 0 and checks if the new
> state is equal to the current one before doing anything else, so it
> will not make a record for the 0 state until the first transition.

Correct, AFAIKS.

> 
>> and the latter must add the delta between the last update and the current time for the current
>> state, right ?
> 
> Yes, and it is doing this already AFAICS.
> 
> The problem is that it only creates a record for the old state, so if
> the new one is seen for the first time, there will be no record for it
> until it changes to some other state.

Exactly, it's not totally wrong what we have now, just some missing part
that needs to be added in the code, while we are counting/updating
these stats.

> 
> The appended patch (modulo GMail-induced white space breakage) should
> help with this, but the initial state handling needs to be addressed
> separately.
> 
> ---
>   drivers/thermal/thermal_debugfs.c |    8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
>       }
> 
>       cdev_dbg->current_state = new_state;
> +
> +    /*
> +     * Create a record for the new state if it is not there, so its
> +     * duration will be printed by cdev_dt_seq_show() as expected if it
> +     * runs before the next state transition.
> +     */
> +    thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
> new_state);
> +
>       transition = (old_state << 16) | new_state;
> 
>       /*

Yes, something like this should do the trick. We will get the record
entry in the list, so we at least enter the list loop in the
cdev_dt_seq_show().