weird cooling_device/cur_state sysfs behaviour
diff mbox series

Message ID 20200320151055.h32rh5hpnv7xyuey@macmini.local
State New
Delegated to: Daniel Lezcano
Headers show
Series
  • weird cooling_device/cur_state sysfs behaviour
Related show

Commit Message

Willy Wolff March 20, 2020, 3:10 p.m. UTC
Hi all,
Hope you're doing well with the situation.

I fill that my board get a cold too...

When I write to a sysfs node, I have a weird behaviour about the function that is called behind.
This bug appears on an arm32 odroid-xu3, and only after v5.4, v5.3.18 behave correctly.


Here my modification to see what's going on:




And here the prob:
Thursday 2019-02-14 10:26:46 root@odroid-xu3-0 ~
$ echo 11 > /sys/devices/virtual/thermal/cooling_device1/cur_state                                                                                    
[  908.109955] CPU: 5 PID: 500 Comm: bash Not tainted 5.4.0-dirty #6
[  908.114910] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[  908.121014] [<c0112f08>] (unwind_backtrace) from [<c010df38>] (show_stack+0x10/0x14)
[  908.128796] [<c010df38>] (show_stack) from [<c0ae89cc>] (dump_stack+0xa4/0xd0)
[  908.135962] [<c0ae89cc>] (dump_stack) from [<c0759e0c>] (cur_state_store+0x2c/0x14c)
[  908.143686] [<c0759e0c>] (cur_state_store) from [<c033b360>] (kernfs_fop_write+0x100/0x1e0)
[  908.152035] [<c033b360>] (kernfs_fop_write) from [<c02a93f4>] (__vfs_write+0x2c/0x1c0)
[  908.159908] [<c02a93f4>] (__vfs_write) from [<c02ac25c>] (vfs_write+0xa4/0x188)
[  908.167163] [<c02ac25c>] (vfs_write) from [<c02ac4b0>] (ksys_write+0x58/0xd0)
[  908.174242] [<c02ac4b0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[  908.181874] Exception stack(0xeb915fa8 to 0xeb915ff0)
[  908.186800] 5fa0:                   00000074 0050ba08 00000001 0050ba08 00000003 00000000
[  908.195083] 5fc0: 00000074 0050ba08 b6eb3db0 00000004 00000003 00000003 00000000 00000000
[  908.203218] 5fe0: 00000004 bef3c830 b6e4b5b3 b6dd4746
[  908.210744] drivers/thermal/thermal_sysfs.c:712:cur_state_store buf = "11
[  908.210744] ", count = 3
[  908.221520] drivers/thermal/thermal_sysfs.c:735:cur_state_store result = 1, count = 3
[  908.228681] CPU: 5 PID: 500 Comm: bash Not tainted 5.4.0-dirty #6
[  908.234268] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[  908.240372] [<c0112f08>] (unwind_backtrace) from [<c010df38>] (show_stack+0x10/0x14)
[  908.248159] [<c010df38>] (show_stack) from [<c0ae89cc>] (dump_stack+0xa4/0xd0)
[  908.255325] [<c0ae89cc>] (dump_stack) from [<c0759e0c>] (cur_state_store+0x2c/0x14c)
[  908.263048] [<c0759e0c>] (cur_state_store) from [<c033b360>] (kernfs_fop_write+0x100/0x1e0)
[  908.271398] [<c033b360>] (kernfs_fop_write) from [<c02a93f4>] (__vfs_write+0x2c/0x1c0)
[  908.279271] [<c02a93f4>] (__vfs_write) from [<c02ac25c>] (vfs_write+0xa4/0x188)
[  908.286527] [<c02ac25c>] (vfs_write) from [<c02ac4b0>] (ksys_write+0x58/0xd0)
[  908.293608] [<c02ac4b0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[  908.301241] Exception stack(0xeb915fa8 to 0xeb915ff0)
[  908.306168] 5fa0:                   00000074 0050ba09 00000001 0050ba09 00000002 00000000
[  908.314448] 5fc0: 00000074 0050ba09 b6eb3db0 00000004 00000003 00000003 00000000 00000000
[  908.322581] 5fe0: 00000004 bef3c830 b6e4b5b3 b6dd4746
[  908.328660] drivers/thermal/thermal_sysfs.c:712:cur_state_store buf = "1
[  908.328660] ", count = 2
[  908.338432] drivers/thermal/thermal_sysfs.c:735:cur_state_store result = 1, count = 2
[  908.345695] CPU: 5 PID: 500 Comm: bash Not tainted 5.4.0-dirty #6
[  908.351172] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[  908.357276] [<c0112f08>] (unwind_backtrace) from [<c010df38>] (show_stack+0x10/0x14)
[  908.365062] [<c010df38>] (show_stack) from [<c0ae89cc>] (dump_stack+0xa4/0xd0)
[  908.372227] [<c0ae89cc>] (dump_stack) from [<c0759e0c>] (cur_state_store+0x2c/0x14c)
[  908.379951] [<c0759e0c>] (cur_state_store) from [<c033b360>] (kernfs_fop_write+0x100/0x1e0)
[  908.388302] [<c033b360>] (kernfs_fop_write) from [<c02a93f4>] (__vfs_write+0x2c/0x1c0)
[  908.396176] [<c02a93f4>] (__vfs_write) from [<c02ac25c>] (vfs_write+0xa4/0x188)
[  908.403433] [<c02ac25c>] (vfs_write) from [<c02ac4b0>] (ksys_write+0x58/0xd0)
[  908.410511] [<c02ac4b0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[  908.418145] Exception stack(0xeb915fa8 to 0xeb915ff0)
[  908.423068] 5fa0:                   00000074 0050ba0a 00000001 0050ba0a 00000001 00000000
[  908.431352] 5fc0: 00000074 0050ba0a b6eb3db0 00000004 00000003 00000003 00000000 00000000
[  908.439487] 5fe0: 00000004 bef3c830 b6e4b5b3 b6dd4746
[  908.445365] drivers/thermal/thermal_sysfs.c:712:cur_state_store buf = "
[  908.445365] ", count = 1
[  908.454762] drivers/thermal/thermal_sysfs.c:716:cur_state_store
-bash: echo: write error: Invalid argument




As you can see, the function is recalled with one character less, until the "buffer is empty".
I don't understand why. Can anybody help me for this thing?
Many thanks in advance.

Best Regards,
Willy

Comments

Daniel Lezcano March 20, 2020, 5:43 p.m. UTC | #1
On 20/03/2020 16:10, Willy Wolff wrote:
> Hi all,
> Hope you're doing well with the situation.
> 
> I fill that my board get a cold too...
> 
> When I write to a sysfs node, I have a weird behaviour about the function that is called behind.
> This bug appears on an arm32 odroid-xu3, and only after v5.4, v5.3.18 behave correctly.
> 
> 
> Here my modification to see what's going on:
> 
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index aa99edb4dff7..a437ae3f4b9f 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -706,11 +706,22 @@ cur_state_store(struct device *dev, struct device_attribute *attr,

[ ... ]

>         return result ? result : count;
>  }
>  
> +

[ ... ]

> 
> As you can see, the function is recalled with one character less, until the "buffer is empty".
> I don't understand why. Can anybody help me for this thing?
> Many thanks in advance.

[ ... ]

Very likely the problem is coming from:

        result = cdev->ops->set_cur_state(cdev, state);

which returns '1' as showed by the traces:

drivers/thermal/thermal_sysfs.c:735:cur_state_store result = 1, count = 3

And because of the return condition above:

	return result ? result : count;

 the function is returning result, so '1', which is interpreted by the
sysfs as "I wrote one character', so it recalls the function with the
two remaining characters, etc ...

The problem is from the governor AFAICT, which governor is it?
Daniel Lezcano March 20, 2020, 8:17 p.m. UTC | #2
Adding Rafael,

On 20/03/2020 18:43, Daniel Lezcano wrote:
> On 20/03/2020 16:10, Willy Wolff wrote:
>> Hi all,
>> Hope you're doing well with the situation.
>>
>> I fill that my board get a cold too...
>>
>> When I write to a sysfs node, I have a weird behaviour about the function that is called behind.
>> This bug appears on an arm32 odroid-xu3, and only after v5.4, v5.3.18 behave correctly.
>>
>>
>> Here my modification to see what's going on:
>>
>> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
>> index aa99edb4dff7..a437ae3f4b9f 100644
>> --- a/drivers/thermal/thermal_sysfs.c
>> +++ b/drivers/thermal/thermal_sysfs.c
>> @@ -706,11 +706,22 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> 
> [ ... ]
> 
>>         return result ? result : count;
>>  }
>>  
>> +
> 
> [ ... ]
> 
>>
>> As you can see, the function is recalled with one character less, until the "buffer is empty".
>> I don't understand why. Can anybody help me for this thing?
>> Many thanks in advance.
> 
> [ ... ]
> 
> Very likely the problem is coming from:
> 
>         result = cdev->ops->set_cur_state(cdev, state);
> 
> which returns '1' as showed by the traces:
> 
> drivers/thermal/thermal_sysfs.c:735:cur_state_store result = 1, count = 3
> 
> And because of the return condition above:
> 
> 	return result ? result : count;
> 
>  the function is returning result, so '1', which is interpreted by the
> sysfs as "I wrote one character', so it recalls the function with the
> two remaining characters, etc ...
> 
> The problem is from the governor AFAICT, which governor is it?

I went through the code and I believe the problem is coming from:

cpufreq_set_cur_state
 -> freq_qos_update_request
   -> freq_qos_apply
       -> pm_qos_update_target =>

" * This function returns 1 if the aggregated constraint value has changed,"

	freq_qos_apply() does:

		ret = pm_qos_update_target()
	...
	return ret;


At the first glance, it is related to commit 77751a466ebd1 (Nov 2019).
Willy Wolff March 21, 2020, 9:43 a.m. UTC | #3
Indeed, many thanks for you help.

On Fri, Mar 20, 2020 at 09:17:47pm +0100, Daniel Lezcano wrote:
> Adding Rafael,
> 
> On 20/03/2020 18:43, Daniel Lezcano wrote:
> > On 20/03/2020 16:10, Willy Wolff wrote:
> >> Hi all,
> >> Hope you're doing well with the situation.
> >>
> >> I fill that my board get a cold too...
> >>
> >> When I write to a sysfs node, I have a weird behaviour about the function that is called behind.
> >> This bug appears on an arm32 odroid-xu3, and only after v5.4, v5.3.18 behave correctly.
> >>
> >>
> >> Here my modification to see what's going on:
> >>
> >> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> >> index aa99edb4dff7..a437ae3f4b9f 100644
> >> --- a/drivers/thermal/thermal_sysfs.c
> >> +++ b/drivers/thermal/thermal_sysfs.c
> >> @@ -706,11 +706,22 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> > 
> > [ ... ]
> > 
> >>         return result ? result : count;
> >>  }
> >>  
> >> +
> > 
> > [ ... ]
> > 
> >>
> >> As you can see, the function is recalled with one character less, until the "buffer is empty".
> >> I don't understand why. Can anybody help me for this thing?
> >> Many thanks in advance.
> > 
> > [ ... ]
> > 
> > Very likely the problem is coming from:
> > 
> >         result = cdev->ops->set_cur_state(cdev, state);
> > 
> > which returns '1' as showed by the traces:
> > 
> > drivers/thermal/thermal_sysfs.c:735:cur_state_store result = 1, count = 3
> > 
> > And because of the return condition above:
> > 
> > 	return result ? result : count;
> > 
> >  the function is returning result, so '1', which is interpreted by the
> > sysfs as "I wrote one character', so it recalls the function with the
> > two remaining characters, etc ...
> > 
> > The problem is from the governor AFAICT, which governor is it?
> 
> I went through the code and I believe the problem is coming from:
> 
> cpufreq_set_cur_state
>  -> freq_qos_update_request
>    -> freq_qos_apply
>        -> pm_qos_update_target =>
> 
> " * This function returns 1 if the aggregated constraint value has changed,"
> 
> 	freq_qos_apply() does:
> 
> 		ret = pm_qos_update_target()
> 	...
> 	return ret;
> 
> 
> At the first glance, it is related to commit 77751a466ebd1 (Nov 2019).
> 
> 
> 
> 
> -- 
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

Patch
diff mbox series

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index aa99edb4dff7..a437ae3f4b9f 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -706,11 +706,22 @@  cur_state_store(struct device *dev, struct device_attribute *attr,
        unsigned long state;
        int result;
 
-       if (sscanf(buf, "%ld\n", &state) != 1)
+       dump_stack();
+
+       printk(KERN_INFO "%s:%d:%s buf = \"%s\", count = %zu\n",
+              __FILE__, __LINE__, __func__, buf, count);
+
+       if (sscanf(buf, "%ld\n", &state) != 1) {
+               printk(KERN_INFO "%s:%d:%s\n",
+                      __FILE__, __LINE__, __func__);
                return -EINVAL;
+       }
 
-       if ((long)state < 0)
+       if ((long)state < 0) {
+               printk(KERN_INFO "%s:%d:%s\n",
+                      __FILE__, __LINE__, __func__);
                return -EINVAL;
+       }
 
        mutex_lock(&cdev->lock);
 
@@ -719,9 +730,14 @@  cur_state_store(struct device *dev, struct device_attribute *attr,
                thermal_cooling_device_stats_update(cdev, state);
 
        mutex_unlock(&cdev->lock);
+
+       printk(KERN_INFO "%s:%d:%s result = %d, count = %zu\n",
+              __FILE__, __LINE__, __func__, result, count);
+
        return result ? result : count;
 }
 
+
 static struct device_attribute
 dev_attr_cdev_type = __ATTR(type, 0444, cdev_type_show, NULL);
 static DEVICE_ATTR_RO(max_state);