Message ID | 1479513177-81504-1-git-send-email-briannorris@chromium.org (mailing list archive) |
---|---|
State | Deferred, archived |
Delegated to: | Zhang Rui |
Brian,

On 2016-11-19 07:52, Brian Norris wrote:
> If using CONFIG_THERMAL_EMULATION, there's a corner case where we might
> get an error from the zone's get_temp() callback, but we'll ignore that
> and keep using its value. Let's just error out properly instead.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Tested-by: Caesar Wang <wxt@rock-chips.com>

[ 8.111296] thermal thermal_zone4: power_allocator: sustainable_power will be estimated
[ 8.119420] thermal_zone_get_temp:537 the ret=-19

-19 is -ENODEV (no such device); it looks like the A/D value was not ready yet.

Anyway, this patch is a useful improvement for the thermal code.

-Caesar

> ---
>  drivers/thermal/thermal_core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 911fd964c742..0fa497f10d25 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
>  	mutex_lock(&tz->lock);
>  
>  	ret = tz->ops->get_temp(tz, temp);
> +	if (ret)
> +		goto exit_unlock;
>  
>  	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
>  		for (count = 0; count < tz->trips; count++) {
> @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
>  		*temp = tz->emul_temperature;
>  	}
>  
> +exit_unlock:
>  	mutex_unlock(&tz->lock);
>  exit:
>  	return ret;

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 18, 2016 at 03:52:55PM -0800, Brian Norris wrote:
> If using CONFIG_THERMAL_EMULATION, there's a corner case where we might
> get an error from the zone's get_temp() callback, but we'll ignore that
> and keep using its value. Let's just error out properly instead.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
>  drivers/thermal/thermal_core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 911fd964c742..0fa497f10d25 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
>  	mutex_lock(&tz->lock);
>  
>  	ret = tz->ops->get_temp(tz, temp);
> +	if (ret)
> +		goto exit_unlock;

Yeah, but the fall-through is intentional, if I am not mistaken.

>  
>  	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {

Even if the driver is not able to read the real temperature, when an
emulated temperature is configured there is still an opportunity to
report the emulated temperature.

>  		for (count = 0; count < tz->trips; count++) {
> @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
>  		*temp = tz->emul_temperature;

And if you check the lines at the bottom of the loop, you will see that,
in the failure case, we will still compare against the contents of
*temp, which might be problematic. I would prefer we consider the patch
I sent some time ago:
https://patchwork.kernel.org/patch/7876381/

>  	}
>  
> +exit_unlock:
>  	mutex_unlock(&tz->lock);
>  exit:
>  	return ret;
> -- 
> 2.8.0.rc3.226.g39d4020
>
Hi,

On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote:
> On Fri, Nov 18, 2016 at 03:52:55PM -0800, Brian Norris wrote:
> > If using CONFIG_THERMAL_EMULATION, there's a corner case where we might
> > get an error from the zone's get_temp() callback, but we'll ignore that
> > and keep using its value. Let's just error out properly instead.
> >
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > ---
> >  drivers/thermal/thermal_core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 911fd964c742..0fa497f10d25 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> >  	mutex_lock(&tz->lock);
> >  
> >  	ret = tz->ops->get_temp(tz, temp);
> > +	if (ret)
> > +		goto exit_unlock;
>
> Yeah, but the fall-through is intentional, if I am not mistaken.

OK...but it has a bug. It potentially utilizes an uninitialized value
for *temp.

> >  
> >  	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
>
> Even if the driver is not able to read the real temperature, when an
> emulated temperature is configured there is still an opportunity to
> report the emulated temperature.

OK, maybe, but you should avoid doing this comparison then:

513                 if (!ret && *temp < crit_temp)
514                         *temp = tz->emul_temperature;

Note that 'ret' might be 0 (from the calls to ->get_trip_type()), and
then you're comparing with the uninitialized value of *temp. So you need
some solution that accounts for this and decides to ignore the real
temperature properly.

> >  		for (count = 0; count < tz->trips; count++) {
> > @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> >  		*temp = tz->emul_temperature;
>
> And if you check the lines at the bottom of the loop, you will see that,
> in the failure case, we will still compare against the contents of
> *temp, which might be problematic.

Yes...are you saying the same thing I am above?

> I would prefer we consider the patch I sent
> some time ago:
> https://patchwork.kernel.org/patch/7876381/

Honestly I didn't look that deeply into the framework here (and I also
don't use CONFIG_THERMAL_EMULATION); I was just fixing something that
was obviously wrong.

But on first read, that patch looks good to me, although it'd be good
to note the uninitialized value fix in the commit log. Any reason that
didn't end up getting merged? It looks like it got reviewed, and you're
a thermal subsystem maintainer...

Brian
Hi, Brian,

On Fri, 2016-11-18 at 21:30 -0800, Brian Norris wrote:
> Hi,
>
> On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote:
> > On Fri, Nov 18, 2016 at 03:52:55PM -0800, Brian Norris wrote:
> > > If using CONFIG_THERMAL_EMULATION, there's a corner case where we
> > > might get an error from the zone's get_temp() callback, but we'll
> > > ignore that and keep using its value. Let's just error out properly
> > > instead.
> > >
> > > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > > ---
> > >  drivers/thermal/thermal_core.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > > index 911fd964c742..0fa497f10d25 100644
> > > --- a/drivers/thermal/thermal_core.c
> > > +++ b/drivers/thermal/thermal_core.c
> > > @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> > >  	mutex_lock(&tz->lock);
> > >  
> > >  	ret = tz->ops->get_temp(tz, temp);
> > > +	if (ret)
> > > +		goto exit_unlock;
> > Yeah, but the fall-through is intentional, if I am not mistaken.
> OK...but it has a bug. It potentially utilizes an uninitialized value
> for *temp.
>
Agreed.

> > >  
> > >  	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
> > Even if the driver is not able to read the real temperature, when an
> > emulated temperature is configured there is still an opportunity to
> > report the emulated temperature.
> OK, maybe, but you should avoid doing this comparison then:
>
> 513                 if (!ret && *temp < crit_temp)
> 514                         *temp = tz->emul_temperature;
>
> Note that 'ret' might be 0 (from the calls to ->get_trip_type()), and
> then you're comparing with the uninitialized value of *temp. So you
> need some solution that accounts for this and decides to ignore the
> real temperature properly.
>
right.

> > >  		for (count = 0; count < tz->trips; count++) {
> > > @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> > >  		*temp = tz->emul_temperature;
> > And if you check the lines at the bottom of the loop, you will see
> > that, in the failure case, we will still compare against the contents
> > of *temp, which might be problematic.
> Yes...are you saying the same thing I am above?
>
> > I would prefer we consider the patch I sent
> > some time ago:
> > https://patchwork.kernel.org/patch/7876381/
> Honestly I didn't look that deeply into the framework here (and I
> also don't use CONFIG_THERMAL_EMULATION); I was just fixing something
> that was obviously wrong.
>
> But on first read, that patch looks good to me, although it'd be good
> to note the uninitialized value fix in the commit log. Any reason that
> didn't end up getting merged? It looks like it got reviewed, and
> you're a thermal subsystem maintainer...
>
hmmm, I forgot why I missed this one in the end.
Eduardo, would you mind refreshing and resending the patch?

thanks,
rui

> Brian
On Tue, Nov 22, 2016 at 03:52:25PM +0800, Zhang Rui wrote:
> Hi, Brian,
>
> On Fri, 2016-11-18 at 21:30 -0800, Brian Norris wrote:
> > Hi,
> >
> > On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote:
> > > On Fri, Nov 18, 2016 at 03:52:55PM -0800, Brian Norris wrote:
> > > > If using CONFIG_THERMAL_EMULATION, there's a corner case where we
> > > > might get an error from the zone's get_temp() callback, but we'll
> > > > ignore that and keep using its value. Let's just error out properly
> > > > instead.
> > > >
> > > > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > > > ---
> > > >  drivers/thermal/thermal_core.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > > > index 911fd964c742..0fa497f10d25 100644
> > > > --- a/drivers/thermal/thermal_core.c
> > > > +++ b/drivers/thermal/thermal_core.c
> > > > @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> > > >  	mutex_lock(&tz->lock);
> > > >  
> > > >  	ret = tz->ops->get_temp(tz, temp);
> > > > +	if (ret)
> > > > +		goto exit_unlock;
> > > Yeah, but the fall-through is intentional, if I am not mistaken.
> > OK...but it has a bug. It potentially utilizes an uninitialized value
> > for *temp.
> >
> Agreed.

I also agree that this section of the current get_temp is buggy. That is
why I sent the patch some time ago.

> > > >  
> > > >  	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
> > > Even if the driver is not able to read the real temperature, when an
> > > emulated temperature is configured there is still an opportunity to
> > > report the emulated temperature.
> > OK, maybe, but you should avoid doing this comparison then:
> >
> > 513                 if (!ret && *temp < crit_temp)
> > 514                         *temp = tz->emul_temperature;
> >
> > Note that 'ret' might be 0 (from the calls to ->get_trip_type()), and
> > then you're comparing with the uninitialized value of *temp. So you
> > need some solution that accounts for this and decides to ignore the
> > real temperature properly.
> >
> right.
>
> > > >  		for (count = 0; count < tz->trips; count++) {
> > > > @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
> > > >  		*temp = tz->emul_temperature;
> > > And if you check the lines at the bottom of the loop, you will see
> > > that, in the failure case, we will still compare against the contents
> > > of *temp, which might be problematic.
> > Yes...are you saying the same thing I am above?

Yes, Brian, we are concerned about the same bug.

> > > I would prefer we consider the patch I sent
> > > some time ago:
> > > https://patchwork.kernel.org/patch/7876381/
> > Honestly I didn't look that deeply into the framework here (and I
> > also don't use CONFIG_THERMAL_EMULATION); I was just fixing something
> > that was obviously wrong.

Yeah, but that is why we need people to look at the code considering all
features. :-)

> >
> > But on first read, that patch looks good to me, although it'd be good
> > to note the uninitialized value fix in the commit log. Any reason that
> > didn't end up getting merged? It looks like it got reviewed, and
> > you're a thermal subsystem maintainer...
> >

I do not remember why Rui postponed it. A note of clarification: for
things that touch the thermal core, I agree with Rui that they go through
his tree. Besides, I tend to avoid acking and sending my own patches
without proper review, which was not the case with that patch; it was
just postponed and fell through the cracks somehow.

> hmmm, I forgot why I missed this one in the end.
> Eduardo, would you mind refreshing and resending the patch?

Yeah, sure. I have at least three extra patch sets on the thermal core in
my queue. But I would like to get the thermal sysfs reorg in first. This
fix is one of the changes that will go on top of the thermal sysfs reorg.

BR,

Eduardo
On Tue, Nov 22, 2016 at 03:00:47AM -0800, Eduardo Valentin wrote:
> On Tue, Nov 22, 2016 at 03:52:25PM +0800, Zhang Rui wrote:
> > On Fri, 2016-11-18 at 21:30 -0800, Brian Norris wrote:
> > > On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote:
> > > > I would prefer we consider the patch I sent
> > > > some time ago:
> > > > https://patchwork.kernel.org/patch/7876381/
> > > Honestly I didn't look that deeply into the framework here (and I
> > > also don't use CONFIG_THERMAL_EMULATION); I was just fixing something
> > > that was obviously wrong.
>
> Yeah, but that is why we need people to look at the code considering all
> features. :-)

Well, there are bugfixes and there are features. My patch fixed the bug
in the simplest way possible; it didn't break CONFIG_THERMAL_EMULATION
any further than it already was, and it'll still work if get_temp()
doesn't return an error.

I'd say your patch is essentially adding a feature, and IMO that's not
the best way to fix a bug. You can fix the bug and *then* add the
feature.

Anyway, I'm not going to tell you how to run your subsystem. If your
patch goes through, that's probably just as well.

[...]

> > hmmm, I forgot why I missed this one in the end.
> > Eduardo, would you mind refreshing and resending the patch?
>
> Yeah, sure. I have at least three extra patch sets on the thermal core in
> my queue. But I would like to get the thermal sysfs reorg in first. This
> fix is one of the changes that will go on top of the thermal sysfs reorg.

So, the bugfix depends on feature work? I guess I'll check back in
another year to see what the status of the bugfix is :)

Brian
On Tue, Nov 22, 2016 at 2:27 PM, Brian Norris <briannorris@chromium.org> wrote:
> On Tue, Nov 22, 2016 at 03:00:47AM -0800, Eduardo Valentin wrote:
>> On Tue, Nov 22, 2016 at 03:52:25PM +0800, Zhang Rui wrote:
>> > On Fri, 2016-11-18 at 21:30 -0800, Brian Norris wrote:
>> > > On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote:
>> > > > I would prefer we consider the patch I sent
>> > > > some time ago:
>> > > > https://patchwork.kernel.org/patch/7876381/
>> > > Honestly I didn't look that deeply into the framework here (and I
>> > > also don't use CONFIG_THERMAL_EMULATION); I was just fixing something
>> > > that was obviously wrong.
>>
>> Yeah, but that is why we need people to look at the code considering all
>> features. :-)
>
> Well, there are bugfixes and there are features. My patch fixed the bug
> in the simplest way possible; it didn't break CONFIG_THERMAL_EMULATION
> any further than it already was, and it'll still work if get_temp()
> doesn't return an error.
>
> I'd say your patch is essentially adding a feature, and IMO that's not
> the best way to fix a bug. You can fix the bug and *then* add the
> feature.
>
> Anyway, I'm not going to tell you how to run your subsystem. If your
> patch goes through, that's probably just as well.
>
> [...]
>
>> > hmmm, I forgot why I missed this one in the end.
>> > Eduardo, would you mind refreshing and resending the patch?
>>
>> Yeah, sure. I have at least three extra patch sets on the thermal core in
>> my queue. But I would like to get the thermal sysfs reorg in first. This
>> fix is one of the changes that will go on top of the thermal sysfs reorg.
>
> So, the bugfix depends on feature work? I guess I'll check back in
> another year to see what the status of the bugfix is :)

Not quite a year, but the status is still the same ;)

By the way, I do not quite understand why we want to mess with the
emulated temperature when the hardware reports errors. I'd say that when
get_temp() fails we need to let upper layers know right away. Only when
we have read the temperature successfully, and are sure that it is not
above the critical level, should we allow reporting the emulated value.

Can we please apply the patch?

Thanks.
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 911fd964c742..0fa497f10d25 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 	mutex_lock(&tz->lock);
 
 	ret = tz->ops->get_temp(tz, temp);
+	if (ret)
+		goto exit_unlock;
 
 	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
 		for (count = 0; count < tz->trips; count++) {
@@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 		*temp = tz->emul_temperature;
 	}
 
+exit_unlock:
 	mutex_unlock(&tz->lock);
 exit:
 	return ret;
If using CONFIG_THERMAL_EMULATION, there's a corner case where we might
get an error from the zone's get_temp() callback, but we'll ignore that
and keep using its value. Let's just error out properly instead.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 drivers/thermal/thermal_core.c | 3 +++
 1 file changed, 3 insertions(+)
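The `goto exit_unlock` in the diff matters because the error path must still drop `tz->lock`. A minimal sketch of that cleanup-label pattern, assuming a hypothetical counting lock in place of the kernel mutex and eliding the critical-trip check; the `-EINVAL` early exit mirrors the `exit:` label visible in the diff, but its exact condition in `thermal_zone_get_temp()` is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mutex: counts how deep it is held. */
struct counting_lock {
	int depth;
};

static void lock_acquire(struct counting_lock *l)
{
	l->depth++;
}

static void lock_release(struct counting_lock *l)
{
	assert(l->depth > 0);   /* releasing an unheld lock is a bug */
	l->depth--;
}

/* Hypothetical zone model; not struct thermal_zone_device. */
struct model_zone {
	struct counting_lock lock;
	int sensor_ret;   /* what the mock get_temp() returns */
	int sensor_temp;  /* written to *temp only on success */
	int emul_temp;    /* nonzero: emulation configured */
};

/* Mirrors the patched control flow: every path that takes the lock
 * leaves through exit_unlock, so the lock is always released. */
int model_get_temp_patched(struct model_zone *tz, int *temp)
{
	int ret = -EINVAL;

	if (tz == NULL || temp == NULL)
		goto exit;

	lock_acquire(&tz->lock);

	ret = tz->sensor_ret;
	if (ret)
		goto exit_unlock;   /* the new early exit on sensor failure */
	*temp = tz->sensor_temp;

	/* Emulation path; the critical-trip check is elided here. */
	if (tz->emul_temp)
		*temp = tz->emul_temp;

exit_unlock:
	lock_release(&tz->lock);
exit:
	return ret;
}
```

Jumping to a label placed just before the unlock keeps a single release point, so adding further error checks later cannot leave the lock held; a bare `return ret;` in place of the `goto` would skip `lock_release()`.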