diff mbox series

[RFC,5/5] ACPI: processor: do update when maximum cooling state changed

Message ID 20200408041917.2329-5-rui.zhang@intel.com (mailing list archive)
State Rejected
Delegated to: Zhang Rui
Headers show
Series [RFC,1/5] thermal: rename thermal_cooling_device_stats_update() | expand

Commit Message

Zhang, Rui April 8, 2020, 4:19 a.m. UTC
ACPI processor cooling device supports 1 cooling state before cpufreq
driver probed, and 4 cooling states after cpufreq driver probed.
Thus update the statistics table when the cpufeq driver is
probed/unprobed.

This fixes an OOB issue when updating the statistics of the processor
cooling device.

Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/acpi/processor_thermal.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Daniel Lezcano April 9, 2020, 1:34 p.m. UTC | #1
Hi Rui,


On 08/04/2020 06:19, Zhang Rui wrote:
> ACPI processor cooling device supports 1 cooling state before cpufreq
> driver probed, and 4 cooling states after cpufreq driver probed.

What is this one state ?

> Thus update the statistics table when the cpufeq driver is
> probed/unprobed.

To be honest, the series seems to skirt a problem in the acpi processor.

If there is a new policy, then there is a new cooling device. Why not
unregister the old one and register the new one ?


> This fixes an OOB issue when updating the statistics of the processor
> cooling device.
> 
> Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
> Reported-by: Takashi Iwai <tiwai@suse.de>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/acpi/processor_thermal.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c
> index 41feb88ee92d..179d1b50ee2b 100644
> --- a/drivers/acpi/processor_thermal.c
> +++ b/drivers/acpi/processor_thermal.c
> @@ -142,6 +142,7 @@ void acpi_thermal_cpufreq_init(struct cpufreq_policy *policy)
>  		if (ret < 0)
>  			pr_err("Failed to add freq constraint for CPU%d (%d)\n",
>  			       cpu, ret);
> +		thermal_cdev_stats_update_max(pr->cdev);
>  	}
>  }
>  
> @@ -154,6 +155,7 @@ void acpi_thermal_cpufreq_exit(struct cpufreq_policy *policy)
>  
>  		if (pr)
>  			freq_qos_remove_request(&pr->thermal_req);
> +		thermal_cdev_stats_update_max(pr->cdev);
>  	}
>  }
>  #else				/* ! CONFIG_CPU_FREQ */
>
Zhang, Rui April 10, 2020, 8:02 a.m. UTC | #2
On Thu, 2020-04-09 at 15:34 +0200, Daniel Lezcano wrote:
> Hi Rui,
> 
> 
> On 08/04/2020 06:19, Zhang Rui wrote:
> > ACPI processor cooling device supports 1 cooling state before
> > cpufreq
> > driver probed, and 4 cooling states after cpufreq driver probed.
> 
> What is this one state ?

One state means its original state and we can not set it to others.

ACPI processor cooling states are combined of p-state cooling states
and a couple of optional t-state cooling states.

Say, if we have a processor device supports 7 throttling states, it
actually supports 8 cooling states with cpufreq driver unprobed, and 11
cooling states with cpufreq driver probed.

> 
> > Thus update the statistics table when the cpufeq driver is
> > probed/unprobed.
> 
> To be honest, the series seems to skirt a problem in the acpi
> processor.
> 
> If there is a new policy, then there is a new cooling device. Why not
> unregister the old one and register the new one ?
> 
Good point.
IMO, the real problem is that do we support dynamic max_cooling_state
or not in the thermal framework.
Previously, I thought we don't have a hard rule of static
max_cooling_state because we always invoke .get_max_state() callback
when we need it. But after a second thought, actually we do have this
limitation. For example, when binding cooling devices to thermal zones,
the upper limit is set based on the return value of .get_max_state() at
the binding time, and we never update the upper limit later.
So this ACPI processor issue is not just about statistics table update
issue, we actually lose some of its cooling states.

Thus, a new max_state means that all the previous setting of the
cooling_device, i.e. all the thermal instances of the cooling device,
needs to get updated.

And to fix this, it's better to
a. unregister and re-register the cooling device as you suggested.
or
b. introduce an API that updates the cooling device entirely instead of
statistics table only.

For either of the above solutions, we'd better to cleanup the code to 
invoke .get_max_state() during registration/max_state_reset phase,
once, and then always use cached value later.
And plus, if we want to follow solution a), it's better to remove
.get_max_state() callback and use an integer instead so that every
driver knows this limitation.
I'd vote for solution a) if there is no soc thermal driver that may
return dynamic max_states.

Do I still miss something?

thanks,
rui
> 
> > This fixes an OOB issue when updating the statistics of the
> > processor
> > cooling device.
> > 
> > Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in
> > sysfs")
> > Reported-by: Takashi Iwai <tiwai@suse.de>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/acpi/processor_thermal.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/acpi/processor_thermal.c
> > b/drivers/acpi/processor_thermal.c
> > index 41feb88ee92d..179d1b50ee2b 100644
> > --- a/drivers/acpi/processor_thermal.c
> > +++ b/drivers/acpi/processor_thermal.c
> > @@ -142,6 +142,7 @@ void acpi_thermal_cpufreq_init(struct
> > cpufreq_policy *policy)
> >  		if (ret < 0)
> >  			pr_err("Failed to add freq constraint for CPU%d
> > (%d)\n",
> >  			       cpu, ret);
> > +		thermal_cdev_stats_update_max(pr->cdev);
> >  	}
> >  }
> >  
> > @@ -154,6 +155,7 @@ void acpi_thermal_cpufreq_exit(struct
> > cpufreq_policy *policy)
> >  
> >  		if (pr)
> >  			freq_qos_remove_request(&pr->thermal_req);
> > +		thermal_cdev_stats_update_max(pr->cdev);
> >  	}
> >  }
> >  #else				/* ! CONFIG_CPU_FREQ */
> > 
> 
>
Daniel Lezcano April 10, 2020, 12:10 p.m. UTC | #3
On 10/04/2020 10:02, Zhang Rui wrote:
> On Thu, 2020-04-09 at 15:34 +0200, Daniel Lezcano wrote:
>> Hi Rui,
>>
>>
>> On 08/04/2020 06:19, Zhang Rui wrote:
>>> ACPI processor cooling device supports 1 cooling state before
>>> cpufreq
>>> driver probed, and 4 cooling states after cpufreq driver probed.
>>
>> What is this one state ?
> 
> One state means its original state and we can not set it to others.
> 
> ACPI processor cooling states are combined of p-state cooling states
> and a couple of optional t-state cooling states.
> 
> Say, if we have a processor device supports 7 throttling states, it
> actually supports 8 cooling states with cpufreq driver unprobed, and 11
> cooling states with cpufreq driver probed.
> 
>>
>>> Thus update the statistics table when the cpufeq driver is
>>> probed/unprobed.
>>
>> To be honest, the series seems to skirt a problem in the acpi
>> processor.
>>
>> If there is a new policy, then there is a new cooling device. Why not
>> unregister the old one and register the new one ?
>>
> Good point.
> IMO, the real problem is that do we support dynamic max_cooling_state
> or not in the thermal framework.
> Previously, I thought we don't have a hard rule of static
> max_cooling_state because we always invoke .get_max_state() callback
> when we need it. But after a second thought, actually we do have this
> limitation. For example, when binding cooling devices to thermal zones,
> the upper limit is set based on the return value of .get_max_state() at
> the binding time, and we never update the upper limit later.
> So this ACPI processor issue is not just about statistics table update
> issue, we actually lose some of its cooling states.
> 
> Thus, a new max_state means that all the previous setting of the
> cooling_device, i.e. all the thermal instances of the cooling device,
> needs to get updated.
> 
> And to fix this, it's better to
> a. unregister and re-register the cooling device as you suggested.
> or
> b. introduce an API that updates the cooling device entirely instead of
> statistics table only.
> 
> For either of the above solutions, we'd better to cleanup the code to 
> invoke .get_max_state() during registration/max_state_reset phase,
> once, and then always use cached value later.
> And plus, if we want to follow solution a), it's better to remove
> .get_max_state() callback and use an integer instead so that every
> driver knows this limitation.
> I'd vote for solution a) if there is no soc thermal driver that may
> return dynamic max_states.
> 
> Do I still miss something?

I agree for the a) solution too.

But regarding the get_max_state() callback being converted to a integer,
the driver int3406_thermal.c computes the upper and lower limits which
are updated on a INT3406_BRIGHTNESS_LIMITS_CHANGED notification and
get_max_state() does uppper - lower.
Rafael J. Wysocki April 10, 2020, 2:10 p.m. UTC | #4
On Fri, Apr 10, 2020 at 10:02 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Thu, 2020-04-09 at 15:34 +0200, Daniel Lezcano wrote:
> > Hi Rui,
> >
> >
> > On 08/04/2020 06:19, Zhang Rui wrote:
> > > ACPI processor cooling device supports 1 cooling state before
> > > cpufreq
> > > driver probed, and 4 cooling states after cpufreq driver probed.
> >
> > What is this one state ?
>
> One state means its original state and we can not set it to others.
>
> ACPI processor cooling states are combined of p-state cooling states
> and a couple of optional t-state cooling states.
>
> Say, if we have a processor device supports 7 throttling states, it
> actually supports 8 cooling states with cpufreq driver unprobed, and 11
> cooling states with cpufreq driver probed.
>
> >
> > > Thus update the statistics table when the cpufeq driver is
> > > probed/unprobed.
> >
> > To be honest, the series seems to skirt a problem in the acpi
> > processor.
> >
> > If there is a new policy, then there is a new cooling device. Why not
> > unregister the old one and register the new one ?
> >
> Good point.
> IMO, the real problem is that do we support dynamic max_cooling_state
> or not in the thermal framework.
> Previously, I thought we don't have a hard rule of static
> max_cooling_state because we always invoke .get_max_state() callback
> when we need it. But after a second thought, actually we do have this
> limitation. For example, when binding cooling devices to thermal zones,
> the upper limit is set based on the return value of .get_max_state() at
> the binding time, and we never update the upper limit later.
> So this ACPI processor issue is not just about statistics table update
> issue, we actually lose some of its cooling states.
>
> Thus, a new max_state means that all the previous setting of the
> cooling_device, i.e. all the thermal instances of the cooling device,
> needs to get updated.
>
> And to fix this, it's better to
> a. unregister and re-register the cooling device as you suggested.
> or
> b. introduce an API that updates the cooling device entirely instead of
> statistics table only.
>
> For either of the above solutions, we'd better to cleanup the code to
> invoke .get_max_state() during registration/max_state_reset phase,
> once, and then always use cached value later.
> And plus, if we want to follow solution a), it's better to remove
> .get_max_state() callback and use an integer instead so that every
> driver knows this limitation.
> I'd vote for solution a) if there is no soc thermal driver that may
> return dynamic max_states.

I believe I mentioned one more option, which would be to introduce an
optional callback into struct thermal_cooling_device_ops to return the
maximum possible return value of .get_max_state(), say
.get_max_state_limit().  That would be used for the allocation of the
stats table and the drivers where the .get_max_state() return value
could not change might set the new callback to NULL.

Then, upon a change of the .get_max_state() return value, the driver
providing it would be responsible for rearranging the stats to reflect
the new set of available states.

Cheers!
Zhang, Rui April 11, 2020, 4:41 a.m. UTC | #5
On Fri, 2020-04-10 at 16:10 +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 10, 2020 at 10:02 AM Zhang Rui <rui.zhang@intel.com>
> wrote:
> > 
> > On Thu, 2020-04-09 at 15:34 +0200, Daniel Lezcano wrote:
> > > Hi Rui,
> > > 
> > > 
> > > On 08/04/2020 06:19, Zhang Rui wrote:
> > > > ACPI processor cooling device supports 1 cooling state before
> > > > cpufreq
> > > > driver probed, and 4 cooling states after cpufreq driver
> > > > probed.
> > > 
> > > What is this one state ?
> > 
> > One state means its original state and we can not set it to others.
> > 
> > ACPI processor cooling states are combined of p-state cooling
> > states
> > and a couple of optional t-state cooling states.
> > 
> > Say, if we have a processor device supports 7 throttling states, it
> > actually supports 8 cooling states with cpufreq driver unprobed,
> > and 11
> > cooling states with cpufreq driver probed.
> > 
> > > 
> > > > Thus update the statistics table when the cpufeq driver is
> > > > probed/unprobed.
> > > 
> > > To be honest, the series seems to skirt a problem in the acpi
> > > processor.
> > > 
> > > If there is a new policy, then there is a new cooling device. Why
> > > not
> > > unregister the old one and register the new one ?
> > > 
> > 
> > Good point.
> > IMO, the real problem is that do we support dynamic
> > max_cooling_state
> > or not in the thermal framework.
> > Previously, I thought we don't have a hard rule of static
> > max_cooling_state because we always invoke .get_max_state()
> > callback
> > when we need it. But after a second thought, actually we do have
> > this
> > limitation. For example, when binding cooling devices to thermal
> > zones,
> > the upper limit is set based on the return value of
> > .get_max_state() at
> > the binding time, and we never update the upper limit later.
> > So this ACPI processor issue is not just about statistics table
> > update
> > issue, we actually lose some of its cooling states.
> > 
> > Thus, a new max_state means that all the previous setting of the
> > cooling_device, i.e. all the thermal instances of the cooling
> > device,
> > needs to get updated.
> > 
> > And to fix this, it's better to
> > a. unregister and re-register the cooling device as you suggested.
> > or
> > b. introduce an API that updates the cooling device entirely
> > instead of
> > statistics table only.
> > 
> > For either of the above solutions, we'd better to cleanup the code
> > to
> > invoke .get_max_state() during registration/max_state_reset phase,
> > once, and then always use cached value later.
> > And plus, if we want to follow solution a), it's better to remove
> > .get_max_state() callback and use an integer instead so that every
> > driver knows this limitation.
> > I'd vote for solution a) if there is no soc thermal driver that may
> > return dynamic max_states.
> 
> I believe I mentioned one more option, which would be to introduce an
> optional callback into struct thermal_cooling_device_ops to return
> the
> maximum possible return value of .get_max_state(), say
> .get_max_state_limit().  That would be used for the allocation of the
> stats table and the drivers where the .get_max_state() return value
> could not change might set the new callback to NULL.
> 

For a dynamic max_state cooling device, now the problem is not just
about the statistics table.
Take the ACPI processor cooling device for example, we set its internal
limit based on the return value of .get_max_state() during the cooling
device registration, thus if it returns 1, actually we can not use deep
cooling state later at all.

Plus, when a max_state changed, the meaning of each cooling state may
also actually changed. Say, for ACPI processor, cooling state 2 means
60% of max_freq with cpufreq driver, but means ACPI processor T2
throttle state with cpuferq driver unprobed.

That says, if we don't do a full unregistration and registration in
this case, at least we need to keep the device node but re-evaluate
.get_max_state() and update all its thermal instances, as long as the
statistics table change.

thanks,
rui
> Then, upon a change of the .get_max_state() return value, the driver
> providing it would be responsible for rearranging the stats to
> reflect
> the new set of available states.
> 
> Cheers!
Zhang, Rui April 12, 2020, 6:13 a.m. UTC | #6
On Fri, 2020-04-10 at 14:10 +0200, Daniel Lezcano wrote:
> On 10/04/2020 10:02, Zhang Rui wrote:
> > On Thu, 2020-04-09 at 15:34 +0200, Daniel Lezcano wrote:
> > > Hi Rui,
> > > 
> > > 
> > > On 08/04/2020 06:19, Zhang Rui wrote:
> > > > ACPI processor cooling device supports 1 cooling state before
> > > > cpufreq
> > > > driver probed, and 4 cooling states after cpufreq driver
> > > > probed.
> > > 
> > > What is this one state ?
> > 
> > One state means its original state and we can not set it to others.
> > 
> > ACPI processor cooling states are combined of p-state cooling
> > states
> > and a couple of optional t-state cooling states.
> > 
> > Say, if we have a processor device supports 7 throttling states, it
> > actually supports 8 cooling states with cpufreq driver unprobed,
> > and 11
> > cooling states with cpufreq driver probed.
> > 
> > > 
> > > > Thus update the statistics table when the cpufeq driver is
> > > > probed/unprobed.
> > > 
> > > To be honest, the series seems to skirt a problem in the acpi
> > > processor.
> > > 
> > > If there is a new policy, then there is a new cooling device. Why
> > > not
> > > unregister the old one and register the new one ?
> > > 
> > 
> > Good point.
> > IMO, the real problem is that do we support dynamic
> > max_cooling_state
> > or not in the thermal framework.
> > Previously, I thought we don't have a hard rule of static
> > max_cooling_state because we always invoke .get_max_state()
> > callback
> > when we need it. But after a second thought, actually we do have
> > this
> > limitation. For example, when binding cooling devices to thermal
> > zones,
> > the upper limit is set based on the return value of
> > .get_max_state() at
> > the binding time, and we never update the upper limit later.
> > So this ACPI processor issue is not just about statistics table
> > update
> > issue, we actually lose some of its cooling states.
> > 
> > Thus, a new max_state means that all the previous setting of the
> > cooling_device, i.e. all the thermal instances of the cooling
> > device,
> > needs to get updated.
> > 
> > And to fix this, it's better to
> > a. unregister and re-register the cooling device as you suggested.
> > or
> > b. introduce an API that updates the cooling device entirely
> > instead of
> > statistics table only.
> > 
> > For either of the above solutions, we'd better to cleanup the code
> > to 
> > invoke .get_max_state() during registration/max_state_reset phase,
> > once, and then always use cached value later.
> > And plus, if we want to follow solution a), it's better to remove
> > .get_max_state() callback and use an integer instead so that every
> > driver knows this limitation.
> > I'd vote for solution a) if there is no soc thermal driver that may
> > return dynamic max_states.
> > 
> > Do I still miss something?
> 
> I agree for the a) solution too.
> 
> But regarding the get_max_state() callback being converted to a
> integer,
> the driver int3406_thermal.c computes the upper and lower limits
> which
> are updated on a INT3406_BRIGHTNESS_LIMITS_CHANGED notification and
> get_max_state() does uppper - lower.

Right, this is another case shows that it's better to support dynamic
max_state.
IMO, this is not difficult to do. We just need to introduce a new API,
which reuses the current cdev device, and reset its every thermal
instance, and update all the thermal zones the cdev is involved.
what do you think?

thanks,
rui
Daniel Lezcano April 12, 2020, 10:07 a.m. UTC | #7
Hi Rui,

On 12/04/2020 08:13, Zhang Rui wrote:
> On Fri, 2020-04-10 at 14:10 +0200, Daniel Lezcano wrote:

[ ... ]

>>> And to fix this, it's better to
>>> a. unregister and re-register the cooling device as you suggested.
>>> or
>>> b. introduce an API that updates the cooling device entirely
>>> instead of
>>> statistics table only.
>>>
>>> For either of the above solutions, we'd better to cleanup the code
>>> to 
>>> invoke .get_max_state() during registration/max_state_reset phase,
>>> once, and then always use cached value later.
>>> And plus, if we want to follow solution a), it's better to remove
>>> .get_max_state() callback and use an integer instead so that every
>>> driver knows this limitation.
>>> I'd vote for solution a) if there is no soc thermal driver that may
>>> return dynamic max_states.
>>>
>>> Do I still miss something?
>>
>> I agree for the a) solution too.
>>
>> But regarding the get_max_state() callback being converted to a
>> integer,
>> the driver int3406_thermal.c computes the upper and lower limits
>> which
>> are updated on a INT3406_BRIGHTNESS_LIMITS_CHANGED notification and
>> get_max_state() does uppper - lower.
> 
> Right, this is another case shows that it's better to support dynamic
> max_state.
> IMO, this is not difficult to do. We just need to introduce a new API,
> which reuses the current cdev device, and reset its every thermal
> instance, and update all the thermal zones the cdev is involved.
> what do you think?

I like how the thermal framework is designed but I think there are too
many API for the thermal framework and it deserves a simplification
rather than adding more of them.

There is no place where the get_max_state is cached except in the stats
structure.

In the function thermal_cooling_device_stats_update():

  Is it possible to just compare the 'new_state' parameter with
stats->max_state and if it is greater increase the stats table and
update max_state to the new_state ?
Zhang, Rui April 13, 2020, 2:01 a.m. UTC | #8
On Sun, 2020-04-12 at 12:07 +0200, Daniel Lezcano wrote:
> 
> 
> 
> 
> Hi Rui,
> 
> On 12/04/2020 08:13, Zhang Rui wrote:
> > On Fri, 2020-04-10 at 14:10 +0200, Daniel Lezcano wrote:
> 
> [ ... ]
> 
> > > > And to fix this, it's better to
> > > > a. unregister and re-register the cooling device as you
> > > > suggested.
> > > > or
> > > > b. introduce an API that updates the cooling device entirely
> > > > instead of
> > > > statistics table only.
> > > > 
> > > > For either of the above solutions, we'd better to cleanup the
> > > > code
> > > > to 
> > > > invoke .get_max_state() during registration/max_state_reset
> > > > phase,
> > > > once, and then always use cached value later.
> > > > And plus, if we want to follow solution a), it's better to
> > > > remove
> > > > .get_max_state() callback and use an integer instead so that
> > > > every
> > > > driver knows this limitation.
> > > > I'd vote for solution a) if there is no soc thermal driver that
> > > > may
> > > > return dynamic max_states.
> > > > 
> > > > Do I still miss something?
> > > 
> > > I agree for the a) solution too.
> > > 
> > > But regarding the get_max_state() callback being converted to a
> > > integer,
> > > the driver int3406_thermal.c computes the upper and lower limits
> > > which
> > > are updated on a INT3406_BRIGHTNESS_LIMITS_CHANGED notification
> > > and
> > > get_max_state() does uppper - lower.
> > 
> > Right, this is another case shows that it's better to support
> > dynamic
> > max_state.
> > IMO, this is not difficult to do. We just need to introduce a new
> > API,
> > which reuses the current cdev device, and reset its every thermal
> > instance, and update all the thermal zones the cdev is involved.
> > what do you think?
> 
> I like how the thermal framework is designed but I think there are
> too
> many API for the thermal framework and it deserves a simplification
> rather than adding more of them.

I agree.
> 
> There is no place where the get_max_state is cached except in the
> stats
> structure.

why we can not have a cdev->max_state field, and get it updated right
after .get_max_state().
and .get_max_state()  is only invoked
a) during cooling device registration
b) when cooling device update its max_state via the new API.

> 
> In the function thermal_cooling_device_stats_update():
> 
>   Is it possible to just compare the 'new_state' parameter with
> stats->max_state and if it is greater increase the stats table and
> update max_state to the new_state ?
> 
the problem is that thermal_cooling_device_stats_update() is invoked
only if thermal zone are updated or the cur_state sysfs attribute is
changed.
There is no way for a cooling device driver to tell thermal framework
that it has changed.
Say, for the problem on hand, the statistics table will not be updated
in time when cpufreq driver probed.

thanks,
rui
Daniel Lezcano April 13, 2020, 6:06 p.m. UTC | #9
On 13/04/2020 04:01, Zhang Rui wrote:
> On Sun, 2020-04-12 at 12:07 +0200, Daniel Lezcano wrote:

[ ... ]

> why we can not have a cdev->max_state field, and get it updated right
> after .get_max_state().
> and .get_max_state()  is only invoked
> a) during cooling device registration
> b) when cooling device update its max_state via the new API.
> 
>>
>> In the function thermal_cooling_device_stats_update():
>>
>>   Is it possible to just compare the 'new_state' parameter with
>> stats->max_state and if it is greater increase the stats table and
>> update max_state to the new_state ?
>>
> the problem is that thermal_cooling_device_stats_update() is invoked
> only if thermal zone are updated or the cur_state sysfs attribute is
> changed.
> There is no way for a cooling device driver to tell thermal framework
> that it has changed.
> Say, for the problem on hand, the statistics table will not be updated
> in time when cpufreq driver probed.

Except I'm missing something, the statistics are only read from
userspace via sysfs.

userspace is not notified about a stat change. Is it really a problem
the table is not updated right at the probe time ? Does it really matter
if the user sees the old table until an update happens on a new higher
max state ?

The table is always consistent whenever the userspace reads the content.

A new entry will appear only if it is used, no?
Zhang, Rui April 16, 2020, 4:46 a.m. UTC | #10
On Mon, 2020-04-13 at 20:06 +0200, Daniel Lezcano wrote:
> On 13/04/2020 04:01, Zhang Rui wrote:
> > On Sun, 2020-04-12 at 12:07 +0200, Daniel Lezcano wrote:
> 
> [ ... ]
> 
> > why we can not have a cdev->max_state field, and get it updated
> > right
> > after .get_max_state().
> > and .get_max_state()  is only invoked
> > a) during cooling device registration
> > b) when cooling device update its max_state via the new API.
> > 
> > > 
> > > In the function thermal_cooling_device_stats_update():
> > > 
> > >   Is it possible to just compare the 'new_state' parameter with
> > > stats->max_state and if it is greater increase the stats table
> > > and
> > > update max_state to the new_state ?
> > > 
> > 
> > the problem is that thermal_cooling_device_stats_update() is
> > invoked
> > only if thermal zone are updated or the cur_state sysfs attribute
> > is
> > changed.
> > There is no way for a cooling device driver to tell thermal
> > framework
> > that it has changed.
> > Say, for the problem on hand, the statistics table will not be
> > updated
> > in time when cpufreq driver probed.
> 
> Except I'm missing something, the statistics are only read from
> userspace via sysfs.

I agree.
> 
> userspace is not notified about a stat change. Is it really a problem
> the table is not updated right at the probe time ?

>  Does it really matter
> if the user sees the old table until an update happens on a new
> higher
> max state ?
> 
> The table is always consistent whenever the userspace reads the
> content.

> 
> A new entry will appear only if it is used, no?
> 
Hmm, IMO, stats table is not the biggest problem here.
The problem is that thermal framework is not aware of the max_state
change, and the thermal instances are never updated according to the
new max_state.
So, we should invoke .get_max_state() in thermal_zone_device_update()
and update the thermal instances accordingly.
And then, what we need to do is just to do stats update right after
.get_max_state() being invoked.

About how to update the stats table, I think adding new entries is not
enough, because the meaning of each cooling state may change when
max_state changes, thus I'd prefer a full reset/resizing of the table.

thanks,
rui
> 
> 
> 
> 
>
Daniel Lezcano April 16, 2020, 7:58 a.m. UTC | #11
On 16/04/2020 06:46, Zhang Rui wrote:
> On Mon, 2020-04-13 at 20:06 +0200, Daniel Lezcano wrote:
>> On 13/04/2020 04:01, Zhang Rui wrote:
>>> On Sun, 2020-04-12 at 12:07 +0200, Daniel Lezcano wrote:
>>
>> [ ... ]
>>
>>> why we can not have a cdev->max_state field, and get it updated
>>> right
>>> after .get_max_state().
>>> and .get_max_state()  is only invoked
>>> a) during cooling device registration
>>> b) when cooling device update its max_state via the new API.
>>>
>>>>
>>>> In the function thermal_cooling_device_stats_update():
>>>>
>>>>   Is it possible to just compare the 'new_state' parameter with
>>>> stats->max_state and if it is greater increase the stats table
>>>> and
>>>> update max_state to the new_state ?
>>>>
>>>
>>> the problem is that thermal_cooling_device_stats_update() is
>>> invoked
>>> only if thermal zone are updated or the cur_state sysfs attribute
>>> is
>>> changed.
>>> There is no way for a cooling device driver to tell thermal
>>> framework
>>> that it has changed.
>>> Say, for the problem on hand, the statistics table will not be
>>> updated
>>> in time when cpufreq driver probed.
>>
>> Except I'm missing something, the statistics are only read from
>> userspace via sysfs.
> 
> I agree.
>>
>> userspace is not notified about a stat change. Is it really a problem
>> the table is not updated right at the probe time ?
> 
>>  Does it really matter
>> if the user sees the old table until an update happens on a new
>> higher
>> max state ?
>>
>> The table is always consistent whenever the userspace reads the
>> content.
> 
>>
>> A new entry will appear only if it is used, no?
>>
> Hmm, IMO, stats table is not the biggest problem here.
> The problem is that thermal framework is not aware of the max_state
> change, and the thermal instances are never updated according to the
> new max_state.
> So, we should invoke .get_max_state() in thermal_zone_device_update()
> and update the thermal instances accordingly.
> And then, what we need to do is just to do stats update right after
> .get_max_state() being invoked.
> 
> About how to update the stats table, I think adding new entries is not
> enough, because the meaning of each cooling state may change when
> max_state changes, thus I'd prefer a full reset/resizing of the table.

Ok, I understand. Are planning to use the existing API:

 thermal_zone_device_update(tz, THERMAL_TABLE_CHANGED);

?
Zhang, Rui April 17, 2020, 2:09 a.m. UTC | #12
On Thu, 2020-04-16 at 09:58 +0200, Daniel Lezcano wrote:
> On 16/04/2020 06:46, Zhang Rui wrote:
> > On Mon, 2020-04-13 at 20:06 +0200, Daniel Lezcano wrote:
> > > On 13/04/2020 04:01, Zhang Rui wrote:
> > > > On Sun, 2020-04-12 at 12:07 +0200, Daniel Lezcano wrote:
> > > 
> > > [ ... ]
> > > 
> > > > why we can not have a cdev->max_state field, and get it updated
> > > > right
> > > > after .get_max_state().
> > > > and .get_max_state()  is only invoked
> > > > a) during cooling device registration
> > > > b) when cooling device update its max_state via the new API.
> > > > 
> > > > > 
> > > > > In the function thermal_cooling_device_stats_update():
> > > > > 
> > > > >   Is it possible to just compare the 'new_state' parameter
> > > > > with
> > > > > stats->max_state and if it is greater increase the stats
> > > > > table
> > > > > and
> > > > > update max_state to the new_state ?
> > > > > 
> > > > 
> > > > the problem is that thermal_cooling_device_stats_update() is
> > > > invoked
> > > > only if thermal zone are updated or the cur_state sysfs
> > > > attribute
> > > > is
> > > > changed.
> > > > There is no way for a cooling device driver to tell thermal
> > > > framework
> > > > that it has changed.
> > > > Say, for the problem on hand, the statistics table will not be
> > > > updated
> > > > in time when cpufreq driver probed.
> > > 
> > > Except I'm missing something, the statistics are only read from
> > > userspace via sysfs.
> > 
> > I agree.
> > > 
> > > userspace is not notified about a stat change. Is it really a
> > > problem
> > > the table is not updated right at the probe time ?
> > >  Does it really matter
> > > if the user sees the old table until an update happens on a new
> > > higher
> > > max state ?
> > > 
> > > The table is always consistent whenever the userspace reads the
> > > content.
> > > 
> > > A new entry will appear only if it is used, no?
> > > 
> > 
> > Hmm, IMO, stats table is not the biggest problem here.
> > The problem is that thermal framework is not aware of the max_state
> > change, and the thermal instances are never updated according to
> > the
> > new max_state.
> > So, we should invoke .get_max_state() in
> > thermal_zone_device_update()
> > and update the thermal instances accordingly.
> > And then, what we need to do is just to do stats update right after
> > .get_max_state() being invoked.
> > 
> > About how to update the stats table, I think adding new entries is
> > not
> > enough, because the meaning of each cooling state may change when
> > max_state changes, thus I'd prefer a full reset/resizing of the
> > table.
> 
> Ok, I understand. Are planning to use the existing API:
> 
>  thermal_zone_device_update(tz, THERMAL_TABLE_CHANGED);

we can not use THERMAL_TABLE_CHANGED because cooling device driver can
not invoke thermal_zone_device_update().

So, the cooling device max_state is changed, but the thermal framework
will not know this until the cooling device being used for a
temperature change.
For the cooling devices that are not used by any thermal zones, the
stats files are probably out of date, but maybe nobody cares. Or we can
fix this by updating stats table for every stats sysfs attribute
access.

thanks,
rui
diff mbox series

Patch

diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c
index 41feb88ee92d..179d1b50ee2b 100644
--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -142,6 +142,7 @@  void acpi_thermal_cpufreq_init(struct cpufreq_policy *policy)
 		if (ret < 0)
 			pr_err("Failed to add freq constraint for CPU%d (%d)\n",
 			       cpu, ret);
+		thermal_cdev_stats_update_max(pr->cdev);
 	}
 }
 
@@ -154,6 +155,7 @@  void acpi_thermal_cpufreq_exit(struct cpufreq_policy *policy)
 
 		if (pr)
 			freq_qos_remove_request(&pr->thermal_req);
+		thermal_cdev_stats_update_max(pr->cdev);
 	}
 }
 #else				/* ! CONFIG_CPU_FREQ */