diff mbox

[2/3] thermal: core: fix: Initialize the max_state variable to 0

Message ID 1411547232-21493-3-git-send-email-l.majewski@samsung.com (mailing list archive)
State Superseded, archived
Delegated to: Zhang Rui
Headers show

Commit Message

Lukasz Majewski Sept. 24, 2014, 8:27 a.m. UTC
Pointer to the uninitialized max_state variable is passed to get the
maximal cooling state.
For CPU cooling device (cpu_cooling.c) the cpufreq_get_max_state() is
called, which even when error occurs will return the max_state variable
unchanged.
Since error for ->get_max_state() is not checked, the automatically
allocated value of max_state is used for (upper > max_state) comparison.
For any possible max_state value it is very unlikely that it will be less
than upper.
As a consequence, the cooling device is bind even without the backed
cpufreq table initialized.

This initialization will prevent from accidental binding trip points to
cpu freq cooling frequencies when cpufreq driver itself is not yet fully
initialized.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
---
 drivers/thermal/thermal_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eduardo Valentin Oct. 2, 2014, 10:26 p.m. UTC | #1
Hello Lukasz,

On Wed, Sep 24, 2014 at 10:27:11AM +0200, Lukasz Majewski wrote:
> Pointer to the uninitialized max_state variable is passed to get the
> maximal cooling state.
> For CPU cooling device (cpu_cooling.c) the cpufreq_get_max_state() is
> called, which even when error occurs will return the max_state variable
> unchanged.
> Since error for ->get_max_state() is not checked, the automatically


Good that you added a fix in your series for this.


> allocated value of max_state is used for (upper > max_state) comparison.
> For any possible max_state value it is very unlikely that it will be less
> than upper.
> As a consequence, the cooling device is bind even without the backed
> cpufreq table initialized.
> 
> This initialization will prevent from accidental binding trip points to
> cpu freq cooling frequencies when cpufreq driver itself is not yet fully
> initialized.

Although I agree with the fix, as long as we also include a check for
the .get_max_state return value,  I believe the problem you are
describing is about initialization sequence.

In general, I believe we need a better
sequencing between thermal and cpufreq subsystems. One way out is to
include a check for cpufreq driver in the thermal driver, and return
-EPROBE_DEFER  when cpufreq is not ready.

> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> ---
>  drivers/thermal/thermal_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 454884a..747618a 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -927,7 +927,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
>  	struct thermal_instance *pos;
>  	struct thermal_zone_device *pos1;
>  	struct thermal_cooling_device *pos2;
> -	unsigned long max_state;
> +	unsigned long max_state = 0;
>  	int result;
>  
>  	if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
> -- 
> 2.0.0.rc2
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukasz Majewski Oct. 3, 2014, 8:26 a.m. UTC | #2
Hi Eduardo,

> Hello Lukasz,
> 
> On Wed, Sep 24, 2014 at 10:27:11AM +0200, Lukasz Majewski wrote:
> > Pointer to the uninitialized max_state variable is passed to get the
> > maximal cooling state.
> > For CPU cooling device (cpu_cooling.c) the cpufreq_get_max_state()
> > is called, which even when error occurs will return the max_state
> > variable unchanged.
> > Since error for ->get_max_state() is not checked, the automatically
> 
> 
> Good that you added a fix in your series for this.
> 
> 
> > allocated value of max_state is used for (upper > max_state)
> > comparison. For any possible max_state value it is very unlikely
> > that it will be less than upper.
> > As a consequence, the cooling device is bind even without the backed
> > cpufreq table initialized.
> > 
> > This initialization will prevent from accidental binding trip
> > points to cpu freq cooling frequencies when cpufreq driver itself
> > is not yet fully initialized.
> 
> Although I agree with the fix, as long as we also include a check for
> the .get_max_state return value,  I believe the problem you are
> describing is about initialization sequence.

As you pointed out - the problem here is with initialization sequence.
Thermal and cpufreq cores are initialized very early.

However, the get_max_state() for cpu_cooling.c device (as it is the
pervasive way of cooling things) accesses cpufreq policy to get the
freq_table and count available states.

The issue here is with late initialization of cpufreq policy.

Up till now the cpu_cooling device was bind even when the
get_max_state() returned -EINVAL and everything worked after late
cpufreq policy initialization. However, during this time window the
thermal driver is not configured.

> 
> In general, I believe we need a better
> sequencing between thermal and cpufreq subsystems. One way out is to
> include a check for cpufreq driver in the thermal driver, and return
> -EPROBE_DEFER  when cpufreq is not ready.

I think that we could return -EPROBE_DEFER when cpufreq's policy is not
yet available and subscribe to cpufreq notifier to call
bind_cooling_device then.

Let's wait for Zhang opinion since he looks after the thermal core code.

> 
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > ---
> >  drivers/thermal/thermal_core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/thermal_core.c
> > b/drivers/thermal/thermal_core.c index 454884a..747618a 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -927,7 +927,7 @@ int thermal_zone_bind_cooling_device(struct
> > thermal_zone_device *tz, struct thermal_instance *pos;
> >  	struct thermal_zone_device *pos1;
> >  	struct thermal_cooling_device *pos2;
> > -	unsigned long max_state;
> > +	unsigned long max_state = 0;
> >  	int result;
> >  
> >  	if (trip >= tz->trips || (trip < 0 && trip !=
> > THERMAL_TRIPS_NONE)) -- 
> > 2.0.0.rc2
> > 
>
Eduardo Valentin Oct. 6, 2014, 6:01 p.m. UTC | #3
On Fri, Oct 03, 2014 at 10:26:28AM +0200, Lukasz Majewski wrote:
> Hi Eduardo,
> 
> > Hello Lukasz,
> > 
> > On Wed, Sep 24, 2014 at 10:27:11AM +0200, Lukasz Majewski wrote:
> > > Pointer to the uninitialized max_state variable is passed to get the
> > > maximal cooling state.
> > > For CPU cooling device (cpu_cooling.c) the cpufreq_get_max_state()
> > > is called, which even when error occurs will return the max_state
> > > variable unchanged.
> > > Since error for ->get_max_state() is not checked, the automatically
> > 
> > 
> > Good that you added a fix in your series for this.
> > 
> > 
> > > allocated value of max_state is used for (upper > max_state)
> > > comparison. For any possible max_state value it is very unlikely
> > > that it will be less than upper.
> > > As a consequence, the cooling device is bind even without the backed
> > > cpufreq table initialized.
> > > 
> > > This initialization will prevent from accidental binding trip
> > > points to cpu freq cooling frequencies when cpufreq driver itself
> > > is not yet fully initialized.
> > 
> > Although I agree with the fix, as long as we also include a check for
> > the .get_max_state return value,  I believe the problem you are
> > describing is about initialization sequence.
> 
> As you pointed out - the problem here is with initialization sequence.
> Thermal and cpufreq cores are initialized very early.
> 
> However, the get_max_state() for cpu_cooling.c device (as it is the
> pervasive way of cooling things) accesses cpufreq policy to get the
> freq_table and count available states.
> 
> The issue here is with late initialization of cpufreq policy.
> 
> Up till now the cpu_cooling device was bind even when the
> get_max_state() returned -EINVAL and everything worked after late
> cpufreq policy initialization. However, during this time window the
> thermal driver is not configured.
> 
> > 
> > In general, I believe we need a better
> > sequencing between thermal and cpufreq subsystems. One way out is to
> > include a check for cpufreq driver in the thermal driver, and return
> > -EPROBE_DEFER  when cpufreq is not ready.
> 
> I think that we could return -EPROBE_DEFER when cpufreq's policy is not
> yet available and subscribe to cpufreq notifier to call
> bind_cooling_device then.

I agree with this approach, as long as we keep the existing users of
cpufreq cooling in a working state.

Cheers

> 
> Let's wait for Zhang opinion since he looks after the thermal core code.
> 
> > 
> > > 
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > ---
> > >  drivers/thermal/thermal_core.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/thermal/thermal_core.c
> > > b/drivers/thermal/thermal_core.c index 454884a..747618a 100644
> > > --- a/drivers/thermal/thermal_core.c
> > > +++ b/drivers/thermal/thermal_core.c
> > > @@ -927,7 +927,7 @@ int thermal_zone_bind_cooling_device(struct
> > > thermal_zone_device *tz, struct thermal_instance *pos;
> > >  	struct thermal_zone_device *pos1;
> > >  	struct thermal_cooling_device *pos2;
> > > -	unsigned long max_state;
> > > +	unsigned long max_state = 0;
> > >  	int result;
> > >  
> > >  	if (trip >= tz->trips || (trip < 0 && trip !=
> > > THERMAL_TRIPS_NONE)) -- 
> > > 2.0.0.rc2
> > > 
> > 
> 
> 
> 
> -- 
> Best regards,
> 
> Lukasz Majewski
> 
> Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 454884a..747618a 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -927,7 +927,7 @@  int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 	struct thermal_instance *pos;
 	struct thermal_zone_device *pos1;
 	struct thermal_cooling_device *pos2;
-	unsigned long max_state;
+	unsigned long max_state = 0;
 	int result;
 
 	if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))