[1/1] thermal/drivers/imx_sc_thermal: return -EAGAIN when SCFW turn off resource

Message ID 20230712210505.1536416-1-Frank.Li@nxp.com (mailing list archive)
State New
Delegated to: Daniel Lezcano

Commit Message

Frank Li July 12, 2023, 9:05 p.m. UTC
Avoid endlessly printing the following message when SCFW turns off the resource:
 [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)

Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
 drivers/thermal/imx_sc_thermal.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
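
For reference, the change quoted in the replies below can be modelled in
userspace roughly as follows. This is only a sketch under stated assumptions:
sc_temp_resp and sc_get_temp_model() are made-up names, and rpc_ret stands in
for the return value of imx_scu_call_rpc(). The (-1) in the log line above is
-EPERM, which the patch maps to -EAGAIN; my understanding is that the thermal
core does not print the warning for -EAGAIN. The conversion to millidegrees is
taken from the driver line shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCU response payload (msg.data.resp in the driver). */
struct sc_temp_resp {
	int celsius;	/* whole degrees Celsius */
	int tenths;	/* tenths of a degree */
};

/*
 * Userspace model of the patched error path in imx_sc_thermal_get_temp().
 * rpc_ret is an assumption: it stands in for what imx_scu_call_rpc() returned.
 */
static int sc_get_temp_model(int rpc_ret, const struct sc_temp_resp *resp, int *temp)
{
	assert(resp != NULL && temp != NULL);

	if (rpc_ret == -EPERM)	/* SCFW reports that the resource has no power */
		return -EAGAIN;
	else if (rpc_ret)	/* any other failure is passed through unchanged */
		return rpc_ret;

	/* Convert to millidegrees Celsius, as the driver does. */
	*temp = resp->celsius * 1000 + resp->tenths * 100;
	return 0;
}
```

With a response of 42 degrees and 5 tenths, a successful call stores 42500 in
*temp; a call that fails with -EPERM returns -EAGAIN and leaves *temp alone.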

Comments

Daniel Lezcano July 13, 2023, 12:49 p.m. UTC | #1
On 12/07/2023 23:05, Frank Li wrote:
> Avoid endless print following message when SCFW turns off resource.
>   [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> 
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> ---
>   drivers/thermal/imx_sc_thermal.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> index 8d6b4ef23746..0533d58f199f 100644
> --- a/drivers/thermal/imx_sc_thermal.c
> +++ b/drivers/thermal/imx_sc_thermal.c
> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
>   	hdr->size = 2;
>   
>   	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> -	if (ret)
> +	if (ret == -EPERM) /* NO POWER */
> +		return -EAGAIN;

Isn't there a chain call somewhere when the resource is turned off, so 
the thermal zone can be disabled?

> +	else if (ret)
>   		return ret;
>   
>   	*temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
Frank Li July 14, 2023, 5:19 p.m. UTC | #2
On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> On 12/07/2023 23:05, Frank Li wrote:
> > Avoid endless print following message when SCFW turns off resource.
> >   [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > 
> > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > ---
> >   drivers/thermal/imx_sc_thermal.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > index 8d6b4ef23746..0533d58f199f 100644
> > --- a/drivers/thermal/imx_sc_thermal.c
> > +++ b/drivers/thermal/imx_sc_thermal.c
> > @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> >   	hdr->size = 2;
> >   	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > -	if (ret)
> > +	if (ret == -EPERM) /* NO POWER */
> > +		return -EAGAIN;
> 
> Isn't there a chain call somewhere when the resource is turned off, so the
> thermal zone can be disabled?

A possible place is drivers/firmware/imx/scu-pd.c, but I am not sure how to
get the thermal devices from there. I only found one API,
thermal_zone_get_zone_by_name(). I am not sure it is good to depend on the
"name": it adds coupling between the two drivers, and if an external thermal
device has the same name, the wrong zone will be turned off.

If I add a power domain notification to the thermal driver, I do not know how
to get the other devices' pd in the thermal driver.

Is there any example I can refer to?

Or is this a simple enough solution?

Frank

> 
> > +	else if (ret)
> >   		return ret;
> >   	*temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
Daniel Lezcano Aug. 16, 2023, 8:44 a.m. UTC | #3
Hi Frank,

sorry for the delay

On 14/07/2023 19:19, Frank Li wrote:
> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
>> On 12/07/2023 23:05, Frank Li wrote:
>>> Avoid endless print following message when SCFW turns off resource.
>>>    [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
>>>
>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>>> ---
>>>    drivers/thermal/imx_sc_thermal.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
>>> index 8d6b4ef23746..0533d58f199f 100644
>>> --- a/drivers/thermal/imx_sc_thermal.c
>>> +++ b/drivers/thermal/imx_sc_thermal.c
>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
>>>    	hdr->size = 2;
>>>    	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
>>> -	if (ret)
>>> +	if (ret == -EPERM) /* NO POWER */
>>> +		return -EAGAIN;
>>
>> Isn't there a chain call somewhere when the resource is turned off, so the
>> thermal zone can be disabled?
> 
> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> am not sure if it is good to depend on "name", which add coupling between
> two drivers and if there are external thermal devices(such as) has the
> same name, it will wrong turn off.

Correct

> If add power domain notification in thermal driver, I am not how to get
> other devices's pd in thermal driver.
> 
> Any example I can refer?
> 
> Or this is simple enough solution.

The solution works for removing the error message but it does not solve 
the root cause of the issue. The thermal zone keeps monitoring while the 
sensor is down.

So the question is why the sensor is shut down if it is in use?



>>
>>> +	else if (ret)
>>>    		return ret;
>>>    	*temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
>>
Frank Li Aug. 16, 2023, 4:28 p.m. UTC | #4
On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> 
> Hi Frank,
> 
> sorry for the delay
> 
> On 14/07/2023 19:19, Frank Li wrote:
> > On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > > On 12/07/2023 23:05, Frank Li wrote:
> > > > Avoid endless print following message when SCFW turns off resource.
> > > >    [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > > > 
> > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > ---
> > > >    drivers/thermal/imx_sc_thermal.c | 4 +++-
> > > >    1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > > > index 8d6b4ef23746..0533d58f199f 100644
> > > > --- a/drivers/thermal/imx_sc_thermal.c
> > > > +++ b/drivers/thermal/imx_sc_thermal.c
> > > > @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > > >    	hdr->size = 2;
> > > >    	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > > > -	if (ret)
> > > > +	if (ret == -EPERM) /* NO POWER */
> > > > +		return -EAGAIN;
> > > 
> > > Isn't there a chain call somewhere when the resource is turned off, so the
> > > thermal zone can be disabled?
> > 
> > A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> > get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> > am not sure if it is good to depend on "name", which add coupling between
> > two drivers and if there are external thermal devices(such as) has the
> > same name, it will wrong turn off.
> 
> Correct
> 
> > If add power domain notification in thermal driver, I am not how to get
> > other devices's pd in thermal driver.
> > 
> > Any example I can refer?
> > 
> > Or this is simple enough solution.
> 
> The solution works for removing the error message but it does not solve the
> root cause of the issue. The thermal zone keeps monitoring while the sensor
> is down.
> 
> So the question is why the sensor is shut down if it is in use?

Do you know of any code I can use as a reference? I suppose this is quite common.

Frank

> 
> 
> 
> > > 
> > > > +	else if (ret)
> > > >    		return ret;
> > > >    	*temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
> > > 
> > > -- 
> > > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> > > 
> > > Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> > > <http://twitter.com/#!/linaroorg> Twitter |
> > > <http://www.linaro.org/linaro-blog/> Blog
> > > 
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
Daniel Lezcano Aug. 16, 2023, 4:47 p.m. UTC | #5
On 16/08/2023 18:28, Frank Li wrote:
> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
>>
>> Hi Frank,
>>
>> sorry for the delay
>>
>> On 14/07/2023 19:19, Frank Li wrote:
>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
>>>> On 12/07/2023 23:05, Frank Li wrote:
>>>>> Avoid endless print following message when SCFW turns off resource.
>>>>>     [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
>>>>>
>>>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>>>>> ---
>>>>>     drivers/thermal/imx_sc_thermal.c | 4 +++-
>>>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
>>>>> index 8d6b4ef23746..0533d58f199f 100644
>>>>> --- a/drivers/thermal/imx_sc_thermal.c
>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
>>>>>     	hdr->size = 2;
>>>>>     	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
>>>>> -	if (ret)
>>>>> +	if (ret == -EPERM) /* NO POWER */
>>>>> +		return -EAGAIN;
>>>>
>>>> Isn't there a chain call somewhere when the resource is turned off, so the
>>>> thermal zone can be disabled?
>>>
>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
>>> am not sure if it is good to depend on "name", which add coupling between
>>> two drivers and if there are external thermal devices(such as) has the
>>> same name, it will wrong turn off.
>>
>> Correct
>>
>>> If add power domain notification in thermal driver, I am not how to get
>>> other devices's pd in thermal driver.
>>>
>>> Any example I can refer?
>>>
>>> Or this is simple enough solution.
>>
>> The solution works for removing the error message but it does not solve the
>> root cause of the issue. The thermal zone keeps monitoring while the sensor
>> is down.
>>
>> So the question is why the sensor is shut down if it is in use?
> 
> Do you know if there are any code I reference? I supposed it is quite common.

Sorry, I don't get your comment

What I meant is why is the sensor turned off if it is in use ?
Frank Li Aug. 16, 2023, 5:07 p.m. UTC | #6
On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> On 16/08/2023 18:28, Frank Li wrote:
> > On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> > > 
> > > Hi Frank,
> > > 
> > > sorry for the delay
> > > 
> > > On 14/07/2023 19:19, Frank Li wrote:
> > > > On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > > > > On 12/07/2023 23:05, Frank Li wrote:
> > > > > > Avoid endless print following message when SCFW turns off resource.
> > > > > >     [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > > > > > 
> > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > ---
> > > > > >     drivers/thermal/imx_sc_thermal.c | 4 +++-
> > > > > >     1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > > > > > index 8d6b4ef23746..0533d58f199f 100644
> > > > > > --- a/drivers/thermal/imx_sc_thermal.c
> > > > > > +++ b/drivers/thermal/imx_sc_thermal.c
> > > > > > @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > > > > >     	hdr->size = 2;
> > > > > >     	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > > > > > -	if (ret)
> > > > > > +	if (ret == -EPERM) /* NO POWER */
> > > > > > +		return -EAGAIN;
> > > > > 
> > > > > Isn't there a chain call somewhere when the resource is turned off, so the
> > > > > thermal zone can be disabled?
> > > > 
> > > > A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> > > > get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> > > > am not sure if it is good to depend on "name", which add coupling between
> > > > two drivers and if there are external thermal devices(such as) has the
> > > > same name, it will wrong turn off.
> > > 
> > > Correct
> > > 
> > > > If add power domain notification in thermal driver, I am not how to get
> > > > other devices's pd in thermal driver.
> > > > 
> > > > Any example I can refer?
> > > > 
> > > > Or this is simple enough solution.
> > > 
> > > The solution works for removing the error message but it does not solve the
> > > root cause of the issue. The thermal zone keeps monitoring while the sensor
> > > is down.
> > > 
> > > So the question is why the sensor is shut down if it is in use?
> > 
> > Do you know if there are any code I reference? I supposed it is quite common.
> 
> Sorry, I don't get your comment
> 
> What I meant is why is the sensor turned off if it is in use ?

One typical example is CPU hotplug. The sensor is located in the CPU power
domain. If the CPU is hotplugged off, the CPU power domain will be turned off.

It doesn't make sense to keep monitoring such a sensor when the CPU is already
powered off. It doesn't make sense to keep the CPU powered on just to get
sensor data.

Another example is the GPU. If there are GPU0 and GPU1, in most cases only
GPU0 is working; GPU1 may be turned off under light load.

Ideally, the thermal driver would get a notification from the power domain
driver: when such a power domain turns off, disable the thermal zone.

So far, I have no idea how to do that.

Daniel Lezcano Aug. 16, 2023, 8:45 p.m. UTC | #7
On 16/08/2023 19:07, Frank Li wrote:
> On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
>> On 16/08/2023 18:28, Frank Li wrote:
>>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
>>>>
>>>> Hi Frank,
>>>>
>>>> sorry for the delay
>>>>
>>>> On 14/07/2023 19:19, Frank Li wrote:
>>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
>>>>>> On 12/07/2023 23:05, Frank Li wrote:
>>>>>>> Avoid endless print following message when SCFW turns off resource.
>>>>>>>      [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
>>>>>>>
>>>>>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>>>>>>> ---
>>>>>>>      drivers/thermal/imx_sc_thermal.c | 4 +++-
>>>>>>>      1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
>>>>>>> index 8d6b4ef23746..0533d58f199f 100644
>>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
>>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
>>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
>>>>>>>      	hdr->size = 2;
>>>>>>>      	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
>>>>>>> -	if (ret)
>>>>>>> +	if (ret == -EPERM) /* NO POWER */
>>>>>>> +		return -EAGAIN;
>>>>>>
>>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
>>>>>> thermal zone can be disabled?
>>>>>
>>>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
>>>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
>>>>> am not sure if it is good to depend on "name", which add coupling between
>>>>> two drivers and if there are external thermal devices(such as) has the
>>>>> same name, it will wrong turn off.
>>>>
>>>> Correct
>>>>
>>>>> If add power domain notification in thermal driver, I am not how to get
>>>>> other devices's pd in thermal driver.
>>>>>
>>>>> Any example I can refer?
>>>>>
>>>>> Or this is simple enough solution.
>>>>
>>>> The solution works for removing the error message but it does not solve the
>>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
>>>> is down.
>>>>
>>>> So the question is why the sensor is shut down if it is in use?
>>>
>>> Do you know if there are any code I reference? I supposed it is quite common.
>>
>> Sorry, I don't get your comment
>>
>> What I meant is why is the sensor turned off if it is in use ?
> 
> One typical example is cpu hotplug. The sensor is located CPU power domain.
> If CPU hotplug off,  CPU power domain will be turn off.
> 
> It doesn't make sensor keep monitor such sensor when CPU already power off.
> It doesn't make sensor to keep CPU power on just because want to get sensor
> data.
> 
> Anthor example is GPU, if there are GPU0 and GPU1. Most case just GPU0
> work.  GPU1 may turn off when less loading.
> 
> Ideally, thermal can get notification from power domain driver.
> when such power domain turn off,  disable thermal zone.
> 
> So far, I have not idea how to do that.

Ulf,

do you have any guidance on linking the thermal zone and the power domain in
order to get a power on/off notification leading to enabling/disabling the
thermal zone?
Ulf Hansson Aug. 16, 2023, 9:23 p.m. UTC | #8
On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 16/08/2023 19:07, Frank Li wrote:
> > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> >> On 16/08/2023 18:28, Frank Li wrote:
> >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> >>>>
> >>>> Hi Frank,
> >>>>
> >>>> sorry for the delay
> >>>>
> >>>> On 14/07/2023 19:19, Frank Li wrote:
> >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> >>>>>>> Avoid endless print following message when SCFW turns off resource.
> >>>>>>>      [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> >>>>>>>
> >>>>>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> >>>>>>> ---
> >>>>>>>      drivers/thermal/imx_sc_thermal.c | 4 +++-
> >>>>>>>      1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> >>>>>>>         hdr->size = 2;
> >>>>>>>         ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> >>>>>>> -       if (ret)
> >>>>>>> +       if (ret == -EPERM) /* NO POWER */
> >>>>>>> +               return -EAGAIN;
> >>>>>>
> >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> >>>>>> thermal zone can be disabled?
> >>>>>
> >>>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> >>>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> >>>>> am not sure if it is good to depend on "name", which add coupling between
> >>>>> two drivers and if there are external thermal devices(such as) has the
> >>>>> same name, it will wrong turn off.
> >>>>
> >>>> Correct
> >>>>
> >>>>> If add power domain notification in thermal driver, I am not how to get
> >>>>> other devices's pd in thermal driver.
> >>>>>
> >>>>> Any example I can refer?
> >>>>>
> >>>>> Or this is simple enough solution.
> >>>>
> >>>> The solution works for removing the error message but it does not solve the
> >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> >>>> is down.
> >>>>
> >>>> So the question is why the sensor is shut down if it is in use?
> >>>
> >>> Do you know if there are any code I reference? I supposed it is quite common.
> >>
> >> Sorry, I don't get your comment
> >>
> >> What I meant is why is the sensor turned off if it is in use ?
> >
> > One typical example is cpu hotplug. The sensor is located CPU power domain.
> > If CPU hotplug off,  CPU power domain will be turn off.
> >
> > It doesn't make sensor keep monitor such sensor when CPU already power off.
> > It doesn't make sensor to keep CPU power on just because want to get sensor
> > data.
> >
> > Anthor example is GPU, if there are GPU0 and GPU1. Most case just GPU0
> > work.  GPU1 may turn off when less loading.
> >
> > Ideally, thermal can get notification from power domain driver.
> > when such power domain turn off,  disable thermal zone.
> >
> > So far, I have not idea how to do that.
>
> Ulf,
>
> do you have a guidance to link the thermal zone and the power domain in
> order to get a poweron/off notification leading to enable/disable the
> thermal zone ?

I don't know the details here, so I apologize for my ignorance to start
with. What platform is this?

A vague idea could be to hook up the thermal sensor to the
corresponding CPU power domain. Assuming the CPU power domain is
modelled as a genpd provider, then this allows the driver for the
thermal sensor to register for power-on/off notifications of the genpd
(see dev_pm_genpd_add_notifier()).

Can this work?

Kind regards
Uffe
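
The notification idea above can be pictured with a small userspace model. This
is a sketch, not kernel code: pd_event, zone_model, pd_notifier, pd_model,
zone_pd_notify() and pd_set_power() are all hypothetical names. In a real
driver the registration would presumably go through
dev_pm_genpd_add_notifier(), and the callback would call
thermal_zone_device_disable() and thermal_zone_device_enable() instead of
flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes, loosely mirroring power-on/off notifications. */
enum pd_event { PD_EVENT_ON, PD_EVENT_PRE_OFF };

/* Hypothetical model of a thermal zone: only the enabled flag matters here. */
struct zone_model {
	bool enabled;
};

/* Hypothetical notifier: a callback plus the zone it controls. */
struct pd_notifier {
	void (*cb)(struct pd_notifier *nb, enum pd_event ev);
	struct zone_model *zone;
};

/* Hypothetical power domain with a single registered listener. */
struct pd_model {
	bool powered;
	struct pd_notifier *nb;
};

/* Callback: stop polling before power is cut, resume once power is back. */
static void zone_pd_notify(struct pd_notifier *nb, enum pd_event ev)
{
	assert(nb != NULL && nb->zone != NULL);
	nb->zone->enabled = (ev == PD_EVENT_ON);
}

/* Switch the domain and tell the listener, if one is registered. */
static void pd_set_power(struct pd_model *pd, bool on)
{
	assert(pd != NULL);
	if (!on && pd->nb)
		pd->nb->cb(pd->nb, PD_EVENT_PRE_OFF);	/* notify before power drops */
	pd->powered = on;
	if (on && pd->nb)
		pd->nb->cb(pd->nb, PD_EVENT_ON);	/* notify after power is up */
}
```

Notifying before the power is cut means the zone is already disabled by the
time the sensor becomes unreadable, so no read can fail with -EPERM in between.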
Daniel Lezcano Aug. 17, 2023, 3:22 p.m. UTC | #9
Hi Ulf,

thanks for your answer

On 16/08/2023 23:23, Ulf Hansson wrote:
> On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:

[ ... ]

>>>>>>> If add power domain notification in thermal driver, I am not how to get
>>>>>>> other devices's pd in thermal driver.
>>>>>>>
>>>>>>> Any example I can refer?
>>>>>>>
>>>>>>> Or this is simple enough solution.
>>>>>>
>>>>>> The solution works for removing the error message but it does not solve the
>>>>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
>>>>>> is down.
>>>>>>
>>>>>> So the question is why the sensor is shut down if it is in use?
>>>>>
>>>>> Do you know if there are any code I reference? I supposed it is quite common.
>>>>
>>>> Sorry, I don't get your comment
>>>>
>>>> What I meant is why is the sensor turned off if it is in use ?
>>>
>>> One typical example is cpu hotplug. The sensor is located CPU power domain.
>>> If CPU hotplug off,  CPU power domain will be turn off.
>>>
>>> It doesn't make sensor keep monitor such sensor when CPU already power off.
>>> It doesn't make sensor to keep CPU power on just because want to get sensor
>>> data.
>>>
>>> Anthor example is GPU, if there are GPU0 and GPU1. Most case just GPU0
>>> work.  GPU1 may turn off when less loading.
>>>
>>> Ideally, thermal can get notification from power domain driver.
>>> when such power domain turn off,  disable thermal zone.
>>>
>>> So far, I have not idea how to do that.
>>
>> Ulf,
>>
>> do you have a guidance to link the thermal zone and the power domain in
>> order to get a poweron/off notification leading to enable/disable the
>> thermal zone ?
> 
> I don't know the details here, so apologize for my ignorance to start
> with. What platform is this?

I will let Frank answer this

> A vague idea could be to hook up the thermal sensor to the
> corresponding CPU power domain. Assuming the CPU power domain is
> modelled as a genpd provider, then this allows the driver for the
> thermal sensor to register for power-on/off notifications of the genpd
> (see dev_pm_genpd_add_notifier()).
> 
> Can this work?

Yes indeed it sounds like what should be achieved. Assuming it is not 
modeled with genpd how would you describe those in order to have the 
sensor belonging to one specific power domain?
Frank Li Aug. 17, 2023, 3:30 p.m. UTC | #10
On Wed, Aug 16, 2023 at 11:23:17PM +0200, Ulf Hansson wrote:
> On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> >
> > On 16/08/2023 19:07, Frank Li wrote:
> > > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> > >> On 16/08/2023 18:28, Frank Li wrote:
> > >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> > >>>>
> > >>>> Hi Frank,
> > >>>>
> > >>>> sorry for the delay
> > >>>>
> > >>>> On 14/07/2023 19:19, Frank Li wrote:
> > >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> > >>>>>>> Avoid endless print following message when SCFW turns off resource.
> > >>>>>>>      [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > >>>>>>>
> > >>>>>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > >>>>>>> ---
> > >>>>>>>      drivers/thermal/imx_sc_thermal.c | 4 +++-
> > >>>>>>>      1 file changed, 3 insertions(+), 1 deletion(-)
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> > >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> > >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > >>>>>>>         hdr->size = 2;
> > >>>>>>>         ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > >>>>>>> -       if (ret)
> > >>>>>>> +       if (ret == -EPERM) /* NO POWER */
> > >>>>>>> +               return -EAGAIN;
> > >>>>>>
> > >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> > >>>>>> thermal zone can be disabled?
> > >>>>>
> > >>>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> > >>>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> > >>>>> am not sure if it is good to depend on "name", which add coupling between
> > >>>>> two drivers and if there are external thermal devices(such as) has the
> > >>>>> same name, it will wrong turn off.
> > >>>>
> > >>>> Correct
> > >>>>
> > >>>>> If add power domain notification in thermal driver, I am not how to get
> > >>>>> other devices's pd in thermal driver.
> > >>>>>
> > >>>>> Any example I can refer?
> > >>>>>
> > >>>>> Or this is simple enough solution.
> > >>>>
> > >>>> The solution works for removing the error message but it does not solve the
> > >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> > >>>> is down.
> > >>>>
> > >>>> So the question is why the sensor is shut down if it is in use?
> > >>>
> > >>> Do you know if there are any code I reference? I supposed it is quite common.
> > >>
> > >> Sorry, I don't get your comment
> > >>
> > >> What I meant is why is the sensor turned off if it is in use ?
> > >
> > > One typical example is cpu hotplug. The sensor is located CPU power domain.
> > > If CPU hotplug off,  CPU power domain will be turn off.
> > >
> > > It doesn't make sensor keep monitor such sensor when CPU already power off.
> > > It doesn't make sensor to keep CPU power on just because want to get sensor
> > > data.
> > >
> > > Anthor example is GPU, if there are GPU0 and GPU1. Most case just GPU0
> > > work.  GPU1 may turn off when less loading.
> > >
> > > Ideally, thermal can get notification from power domain driver.
> > > when such power domain turn off,  disable thermal zone.
> > >
> > > So far, I have not idea how to do that.
> >
> > Ulf,
> >
> > do you have a guidance to link the thermal zone and the power domain in
> > order to get a poweron/off notification leading to enable/disable the
> > thermal zone ?
> 
> I don't know the details here, so apologize for my ignorance to start
> with. What platform is this?

i.MX8QM.

> 
> A vague idea could be to hook up the thermal sensor to the
> corresponding CPU power domain. Assuming the CPU power domain is
> modelled as a genpd provider, then this allows the driver for the
> thermal sensor to register for power-on/off notifications of the genpd
> (see dev_pm_genpd_add_notifier()).
> 
> Can this work?

I don't think so. dev_pm_genpd_add_notifier() needs a dev that is bound to the pd.

tsens: thermal-sensor {
	compatible = "fsl,imx-sc-thermal";
	tsens-num = <6>;
	#thermal-sensor-cells = <1>;
};

We have 6 thermal sensors, which are associated with 6 pds:
	IMX_SC_R_SYSTEM, IMX_SC_R_PMIC_0,
	IMX_SC_R_AP_0, IMX_SC_R_AP_1,
	IMX_SC_R_GPU_0_PID0, IMX_SC_R_GPU_1_PID0,
	IMX_SC_R_DRC_0

We don't want to keep a PD on just to read the temperature. The GPU pd
consumes a lot of power.

I want to register one callback in the thermal-sensor driver: when the GPU pd
turns on, enable the thermal zone; when the GPU pd turns off, disable the
thermal zone.

We could do this in a more common way:

	gpu-thermal1 {
		polling-delay-passive = <250>;
		polling-delay = <2000>;
	>>>	pd = <&GPU1_PD>;
		thermal-sensors = <&tsens IMX_SC_R_GPU_1_PID0>;
	};

If GPU1_PD is on, gpu-thermal1 is enabled;
if GPU1_PD is off, gpu-thermal1 is disabled.

> 
> Kind regards
> Uffe
Ulf Hansson Aug. 17, 2023, 9:40 p.m. UTC | #11
On Thu, 17 Aug 2023 at 17:31, Frank Li <Frank.li@nxp.com> wrote:
>
> On Wed, Aug 16, 2023 at 11:23:17PM +0200, Ulf Hansson wrote:
> > On Wed, 16 Aug 2023 at 22:46, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> > >
> > > On 16/08/2023 19:07, Frank Li wrote:
> > > > On Wed, Aug 16, 2023 at 06:47:17PM +0200, Daniel Lezcano wrote:
> > > >> On 16/08/2023 18:28, Frank Li wrote:
> > > >>> On Wed, Aug 16, 2023 at 10:44:32AM +0200, Daniel Lezcano wrote:
> > > >>>>
> > > >>>> Hi Frank,
> > > >>>>
> > > >>>> sorry for the delay
> > > >>>>
> > > >>>> On 14/07/2023 19:19, Frank Li wrote:
> > > >>>>> On Thu, Jul 13, 2023 at 02:49:54PM +0200, Daniel Lezcano wrote:
> > > >>>>>> On 12/07/2023 23:05, Frank Li wrote:
> > > >>>>>>> Avoid endless print following message when SCFW turns off resource.
> > > >>>>>>>      [ 1818.342337] thermal thermal_zone0: failed to read out thermal zone (-1)
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > >>>>>>> ---
> > > >>>>>>>      drivers/thermal/imx_sc_thermal.c | 4 +++-
> > > >>>>>>>      1 file changed, 3 insertions(+), 1 deletion(-)
> > > >>>>>>>
> > > >>>>>>> diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> index 8d6b4ef23746..0533d58f199f 100644
> > > >>>>>>> --- a/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> +++ b/drivers/thermal/imx_sc_thermal.c
> > > >>>>>>> @@ -58,7 +58,9 @@ static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
> > > >>>>>>>         hdr->size = 2;
> > > >>>>>>>         ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
> > > >>>>>>> -       if (ret)
> > > >>>>>>> +       if (ret == -EPERM) /* NO POWER */
> > > >>>>>>> +               return -EAGAIN;
> > > >>>>>>
> > > >>>>>> Isn't there a chain call somewhere when the resource is turned off, so the
> > > >>>>>> thermal zone can be disabled?
> > > >>>>>
> > > >>>>> A possible place in drivers/firmware/imx/scu-pd.c. but I am not sure how to
> > > >>>>> get thermal devices. I just found a API thermal_zone_get_zone_by_name(). I
> > > >>>>> am not sure if it is good to depend on "name", which add coupling between
> > > >>>>> two drivers and if there are external thermal devices(such as) has the
> > > >>>>> same name, it will wrong turn off.
> > > >>>>
> > > >>>> Correct
> > > >>>>
> > > >>>>> If add power domain notification in thermal driver, I am not how to get
> > > >>>>> other devices's pd in thermal driver.
> > > >>>>>
> > > >>>>> Any example I can refer?
> > > >>>>>
> > > >>>>> Or this is simple enough solution.
> > > >>>>
> > > >>>> The solution works for removing the error message but it does not solve the
> > > >>>> root cause of the issue. The thermal zone keeps monitoring while the sensor
> > > >>>> is down.
> > > >>>>
> > > >>>> So the question is why the sensor is shut down if it is in use?
> > > >>>
> > > >>> Do you know if there is any code I can reference? I suppose this is quite common.
> > > >>
> > > >> Sorry, I don't get your comment
> > > >>
> > > >> What I meant is why is the sensor turned off if it is in use ?
> > > >
> > > > One typical example is CPU hotplug. The sensor is located in the CPU power
> > > > domain. If the CPU is hotplugged off, the CPU power domain is turned off.
> > > >
> > > > It doesn't make sense to keep monitoring such a sensor when the CPU is
> > > > already powered off. It doesn't make sense to keep the CPU powered on just
> > > > to get sensor data.
> > > >
> > > > Another example is the GPU, if there are GPU0 and GPU1. In most cases only
> > > > GPU0 works. GPU1 may be turned off when the load is low.
> > > >
> > > > Ideally, the thermal driver would get a notification from the power domain
> > > > driver: when such a power domain turns off, disable the thermal zone.
> > > >
> > > > So far, I have no idea how to do that.
> > >
> > > Ulf,
> > >
> > > do you have a guidance to link the thermal zone and the power domain in
> > > order to get a poweron/off notification leading to enable/disable the
> > > thermal zone ?
> >
> > I don't know the details here, so apologize for my ignorance to start
> > with. What platform is this?
>
> i.MX8QM.

Thanks!

>
> >
> > A vague idea could be to hook up the thermal sensor to the
> > corresponding CPU power domain. Assuming the CPU power domain is
> > modelled as a genpd provider, then this allows the driver for the
> > thermal sensor to register for power-on/off notifications of the genpd
> > (see dev_pm_genpd_add_notifier()).
> >
> > Can this work?
>
> I don't think so. dev_pm_genpd_add_notifier() needs a dev that is bound to a pd.

Yes, correct.

>
> tsens: thermal-sensor {
>         compatible = "fsl,imx-sc-thermal";
>         tsens-num = <6>;
>         #thermal-sensor-cells = <1>;
> };

Are you saying that the above doesn't have a corresponding struct
device created for it? That sounds like a problem that can be fixed,
right? Not sure if it makes sense though.

>
> We have 6 thermal sensors, which are associated with 6 pd:
>         IMX_SC_R_SYSTEM, IMX_SC_R_PMIC_0,
>         IMX_SC_R_AP_0, IMX_SC_R_AP_1,
>         IMX_SC_R_GPU_0_PID0, IMX_SC_R_GPU_1_PID0,
>         IMX_SC_R_DRC_0
>
> We don't want to keep a PD on just to read the temperature. The GPU pd
> consumes a lot of power.

Of course, that seems like a bad idea.

The corresponding struct device that's hooked up to a genpd, can
remain runtime suspended as long as you think it makes sense. Thus it
would not keep the PM domain powered on when it isn't needed.

>
> I want to register a callback in the thermal-sensor driver: when the GPU pd
> turns on, enable the thermal zone; when the GPU pd turns off, disable it.

Right, that should work fine too, I think. It seems like this is just
a matter of modelling this correctly in DT, I have no strong opinion
in this regard.

>
> We can do it in a more generic way.
>
>         gpu-thermal1 {
>                 polling-delay-passive = <250>;
>                 polling-delay = <2000>;
>                 pd = <&GPU1_PD>;        /* proposed new property */
>                 thermal-sensors = <&tsens IMX_SC_R_GPU_1_PID0>;
>         };
>
> If GPU1_PD is on, gpu-thermal1 is enabled;
> if GPU1_PD is off, gpu-thermal1 is disabled.
>

Sounds like it's worth a try! Please keep me posted.

Kind regards
Uffe
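
The behaviour agreed on above (enable the zone when its power domain
turns on, disable it when the domain turns off) can be modelled in a
few lines of plain C. This is only a user-space sketch, not driver code:
every sketch_* name is hypothetical, and in the kernel the callback would
presumably be registered through dev_pm_genpd_add_notifier(), as suggested
earlier in the thread.

```c
#include <stdbool.h>

/* Hypothetical event codes; the kernel's genpd notifier defines its own. */
enum sketch_pd_event {
	SKETCH_PD_OFF,
	SKETCH_PD_ON,
};

/* Hypothetical stand-in for a thermal zone tied to one power domain. */
struct sketch_zone {
	bool enabled;	/* the zone is polled only while this is true */
};

/* Callback the power domain side would invoke on each state change. */
static void sketch_pd_notify(struct sketch_zone *tz, enum sketch_pd_event ev)
{
	tz->enabled = (ev == SKETCH_PD_ON);
}

/* Returns true if the sensor would be read on this polling tick. */
static bool sketch_should_poll(const struct sketch_zone *tz)
{
	return tz->enabled;
}
```

With this model the sensor is never read while its domain is off, so the
read path does not have to special-case a "no power" error at all.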

Patch

diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
index 8d6b4ef23746..0533d58f199f 100644
--- a/drivers/thermal/imx_sc_thermal.c
+++ b/drivers/thermal/imx_sc_thermal.c
@@ -58,7 +58,9 @@  static int imx_sc_thermal_get_temp(struct thermal_zone_device *tz, int *temp)
 	hdr->size = 2;
 
 	ret = imx_scu_call_rpc(thermal_ipc_handle, &msg, true);
-	if (ret)
+	if (ret == -EPERM) /* NO POWER */
+		return -EAGAIN;
+	else if (ret)
 		return ret;
 
 	*temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
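
For reference, the patched error path can be exercised outside the kernel
with a small model. This is a sketch under assumptions: sketch_resp and
sketch_get_temp() are hypothetical names, and rpc_ret stands in for the
value returned by imx_scu_call_rpc(). Note that -EPERM is -1 on Linux,
which matches the "(-1)" in the log line from the commit message.

```c
#include <errno.h>

/* Hypothetical stand-in for the response fields the driver reads. */
struct sketch_resp {
	int celsius;
	int tenths;
};

/*
 * Model of the patched read path. rpc_ret plays the role of the
 * imx_scu_call_rpc() return value.
 */
static int sketch_get_temp(int rpc_ret, const struct sketch_resp *resp,
			   int *temp)
{
	if (rpc_ret == -EPERM)	/* resource has no power */
		return -EAGAIN;
	if (rpc_ret)
		return rpc_ret;

	/* Result is in millidegrees Celsius. */
	*temp = resp->celsius * 1000 + resp->tenths * 100;
	return 0;
}
```

Only -EPERM is remapped; any other error is passed through unchanged, and
*temp is left untouched on every failure.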