Message ID | 20170210161547.28625-1-lukasz.luba@arm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | Zhang Rui |
Headers | show |
On Fri, 2017-02-10 at 16:15 +0000, Lukasz Luba wrote: > There is possible scenario when the system hits deadlock. > One of the examples is when IPA (power_allocator.c) calculates > power budget and calls every cooling device in the thermal zone > using function get_requested_power(). > This function in devfreq_cooling calls driver's code via > get_static_power(). > If the driver code tries to calculate properly static power > based on current temperature, it might call API functions: > tz = thermal_zone_get_zone_by_name(priv->tz_name) > and then: > thermal_zone_get_temp(tz, &temp) > > Here is the call graph: > > tz->gov->throttle() [=power_allocator_throttle] > power_allocator_throttle > allocate_power > ==> (lock tz->lock taken) > (for each cooling device:) > cdev->ops->get_requested_power > [=devfreq_cooling_get_requested_power] > get_static_power > (call driver specific function) > dfc->power_ops->get_static_power > [=mock_devfreq_get_static_power] > tz = thermal_zone_get_zone_by_name() > thermal_zone_get_temp(tz, &temp) > ===> (try to lock again tz->lock) > I didn't find mock_devfreq_get_static_power, thus I don't understand why thermal_zone_get_zone_by_name() and thermal_zone_get_temp() needs to be called in .get_static_power() callback. > This patch changes the mutex_lock into mutex_trylock > just to avoid the deadlock. > Well, I don't think this is the right solution. We should either avoid poking thermal lock in .get_static_power() callback, in the thermal driver, or avoid invoking the callback with lock hold, in power_allocator code. thanks, rui > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > --- > Hi, > > This is a RFC to start some discussion. > I am not sure if it should go with stable label > (it affects few versions back though). > It is based on v4.10-rc7. > > Regards, > Lukasz Luba > > drivers/thermal/thermal_helpers.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/thermal/thermal_helpers.c > b/drivers/thermal/thermal_helpers.c > index 8cdf75a..7b01a85 100644 > --- a/drivers/thermal/thermal_helpers.c > +++ b/drivers/thermal/thermal_helpers.c > @@ -75,6 +75,12 @@ EXPORT_SYMBOL(get_thermal_instance); > * When a valid thermal zone reference is passed, it will fetch its > * temperature and fill @temp. > * > + * IMPORTANT NOTICE: > + * This function should not be used from driver's code which is > called > + * by IPA (power_allocator.c). The IPA already holds the tz->lock. > + * Therefore, it is possible to get the temperature in driver code > + * in a simple way: reading tz->temperature. > + * > * Return: On success returns 0, an error code otherwise > */ > int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp) > @@ -87,7 +93,8 @@ int thermal_zone_get_temp(struct > thermal_zone_device *tz, int *temp) > if (!tz || IS_ERR(tz) || !tz->ops->get_temp) > goto exit; > > - mutex_lock(&tz->lock); > + if (!mutex_trylock(&tz->lock)) > + goto exit; > > ret = tz->ops->get_temp(tz, temp); >
>> >> tz->gov->throttle() [=power_allocator_throttle] >> power_allocator_throttle >> allocate_power >> ==> (lock tz->lock taken) >> (for each cooling device:) >> cdev->ops->get_requested_power >> [=devfreq_cooling_get_requested_power] >> get_static_power >> (call driver specific function) >> dfc->power_ops->get_static_power >> [=mock_devfreq_get_static_power] >> tz = thermal_zone_get_zone_by_name() >> thermal_zone_get_temp(tz, &temp) >> ===> (try to lock again tz->lock) >> > I didn't find mock_devfreq_get_static_power, thus I don't understand > why thermal_zone_get_zone_by_name() and thermal_zone_get_temp() needs > to be called in .get_static_power() callback. The mock_devfreq_get_static_power is just the mock function which mimics a driver function registered to get_static_power. Inside this function you can ask for thermal zone pointer to check the temperature in that thermal_zone. Based on that temperature you can scale your static power value before returning it to the framework. So in this case the driver code should not scale the static power based on temperature (for now). > >> This patch changes the mutex_lock into mutex_trylock >> just to avoid the deadlock. >> > Well, I don't think this is the right solution. > We should either avoid poking thermal lock in .get_static_power() > callback, in the thermal driver, or avoid invoking the callback with > lock hold, in power_allocator code. I agree that this is not the solution. It just highlights the problem. I will try to allocate some time for this issue. Regards, Lukasz Luba
===> (try to lock again tz->lock) This patch changes the mutex_lock into mutex_trylock just to avoid the deadlock. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> --- Hi, This is a RFC to start some discussion. I am not sure if it should go with stable label (it affects few versions back though). It is based on v4.10-rc7. Regards, Lukasz Luba drivers/thermal/thermal_helpers.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c index 8cdf75a..7b01a85 100644 --- a/drivers/thermal/thermal_helpers.c +++ b/drivers/thermal/thermal_helpers.c @@ -75,6 +75,12 @@ EXPORT_SYMBOL(get_thermal_instance); * When a valid thermal zone reference is passed, it will fetch its * temperature and fill @temp. * + * IMPORTANT NOTICE: + * This function should not be used from driver's code which is called + * by IPA (power_allocator.c). The IPA already holds the tz->lock. + * Therefore, it is possible to get the temperature in driver code + * in a simple way: reading tz->temperature. + * * Return: On success returns 0, an error code otherwise */ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp) @@ -87,7 +93,8 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp) if (!tz || IS_ERR(tz) || !tz->ops->get_temp) goto exit; - mutex_lock(&tz->lock); + if (!mutex_trylock(&tz->lock)) + goto exit; ret = tz->ops->get_temp(tz, temp);