[0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem

Message ID 20200127151731.8640-1-lukasz.luba@arm.com (mailing list archive)

Lukasz Luba Jan. 27, 2020, 3:17 p.m. UTC
From: Lukasz Luba <lukasz.luba@arm.com>

Hi all,

This patch continues my previous work on fixing the DEVFREQ monitoring
subsystem [1]. The issue is with DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
under the hood. It works normally when the system is busy, but it will not
bring a CPU out of idle to serve the DEVFREQ monitoring requests.

This is especially important on SMP systems with many CPUs, where load
balancing tries to keep some CPUs idle. The next service request might not be
triggered if the CPU went idle in the meantime.

DELAYED_WORK is triggered even on an idle CPU. This allows the DEVFREQ
monitoring to be called at reliable intervals. Some drivers might use internal
counters to monitor their load; when the DEVFREQ work is not triggered in a
predictable way, these counters might overflow, leaving the device in an
undefined state.
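
The difference between the two timer types can be illustrated with a small
user-space model. This is only a sketch under simplifying assumptions, not
kernel code: it assumes a single CPU owns the timer, that the idle windows are
known in advance, and that a deferrable timer expiring inside an idle window
runs only when that window ends. The names 'idle_window', 'fire_time' and
'max_monitor_gap' are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One idle window of the CPU that owns the timer, in milliseconds. */
struct idle_window {
	unsigned long start_ms;
	unsigned long end_ms;	/* CPU wakes up for an unrelated reason here */
};

/*
 * Return the time at which a timer that expires at 'expiry_ms' really runs.
 * A deferrable timer does not wake an idle CPU, so it is postponed to the
 * end of the idle window; a normal (delayed work) timer fires on time.
 */
static unsigned long fire_time(unsigned long expiry_ms, bool deferrable,
			       const struct idle_window *idle, size_t n)
{
	if (!deferrable)
		return expiry_ms;

	for (size_t i = 0; i < n; i++) {
		if (expiry_ms >= idle[i].start_ms && expiry_ms < idle[i].end_ms)
			return idle[i].end_ms;
	}
	return expiry_ms;
}

/*
 * Simulate periodic monitoring that re-arms itself 'polling_ms' after each
 * run, and report the longest gap between two consecutive runs.
 */
unsigned long max_monitor_gap(unsigned long polling_ms,
			      unsigned long horizon_ms, bool deferrable,
			      const struct idle_window *idle, size_t n)
{
	unsigned long last = 0, max_gap = 0;

	assert(polling_ms > 0);

	for (;;) {
		unsigned long now = fire_time(last + polling_ms, deferrable,
					      idle, n);

		if (now > horizon_ms)
			break;
		if (now - last > max_gap)
			max_gap = now - last;
		last = now;
	}
	return max_gap;
}
```

With a 50 ms polling interval and one idle window from 120 ms to 900 ms, the
model reports a longest gap of 50 ms for the normal timer and 800 ms for the
deferrable one. A device counter that is sized for roughly one polling period
could overflow during such a gap.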

To observe the difference, the trace output can be used, e.g.

echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
# your test starts here, e.g. 'sleep 5' or 'dd' or 'gfxbench'
echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
cat /sys/kernel/debug/tracing/trace

When some devfreq drivers are registered, you should see the
'devfreq_monitor' traces triggered at reliable intervals.
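
To judge how regular the intervals are, the timestamps of the monitor events
can be compared. The following user-space sketch computes the longest interval
between consecutive events. It assumes the usual text layout of the trace
file, where the timestamp in seconds directly precedes ": devfreq_monitor:";
the helper names 'monitor_timestamp' and 'max_interval' are hypothetical, and
the parser would need adjusting if the trace options change the line layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the timestamp (seconds) from one trace line that carries a
 * devfreq_monitor event. ASSUMED line layout:
 *   "<task>-<pid> [cpu] flags  1234.567890: devfreq_monitor: ..."
 */
bool monitor_timestamp(const char *line, double *ts)
{
	const char *ev, *p;
	char *end;

	assert(line && ts);

	ev = strstr(line, ": devfreq_monitor:");
	if (!ev)
		return false;

	/* Walk back from the ':' to the start of the timestamp token. */
	p = ev;
	while (p > line && p[-1] != ' ')
		p--;

	*ts = strtod(p, &end);
	/* The token must be a non-empty number that ends at the ':'. */
	return end == ev && end != p;
}

/* Longest interval (seconds) between consecutive devfreq_monitor events. */
double max_interval(const char *const *lines, size_t n)
{
	double prev = 0.0, cur, max = 0.0;
	bool have_prev = false;

	for (size_t i = 0; i < n; i++) {
		if (!monitor_timestamp(lines[i], &cur))
			continue;	/* other events are ignored */
		if (have_prev && cur - prev > max)
			max = cur - prev;
		prev = cur;
		have_prev = true;
	}
	return max;
}
```

If the longest interval is many times the configured polling interval, the
monitoring work was most likely postponed while the CPU was idle.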

The patch set is based on Chanwoo's devfreq repository and branch
'devfreq-next' [2].

Regards,
Lukasz Luba

[1] https://lkml.org/lkml/2019/2/12/1179
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next


Lukasz Luba (1):
  drivers: devfreq: add DELAYED_WORK to monitoring subsystem

 drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
 drivers/devfreq/devfreq.c |  6 +++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

Comments

Lukasz Luba Jan. 30, 2020, 11:47 a.m. UTC | #1
Hi Chanwoo, MyungJoo,

Gentle ping. The issue is not only in devfreq itself;
it also affects thermal. The devfreq cooling relies on
busy_time and total_time updated by the devfreq monitoring
(in simple_ondemand).
Thermal uses DELAYED_WORK and is more reliable, but it uses stale
data from devfreq_dev_status. It is especially visible when
you have a cgroup spanning one cluster. Android uses cgroups
heavily. You can easily create this setup using 'taskset',
run some benchmarks and observe the 'devfreq_monitor' traces and
timestamps, e.g. on your exynos-bus.

The patch is really non-invasive and simple. It can be a good starting
point for testing and proposing other solutions.

Regards,
Lukasz

Chanwoo Choi Jan. 31, 2020, 12:42 a.m. UTC | #3
Hi Lukasz,

On 1/30/20 8:47 PM, Lukasz Luba wrote:
> Hi Chanwoo, MyungJoo,
> 
> Gentle ping. The issue is not only in the devfreq itself,
> but also it affects thermal. The devfreq cooling rely on
> busy_time and total_time updated by the devfreq monitoring
> (in simple_ondemand).
> Thermal uses DELAYED_WORK and is more reliable, but uses stale
> data from devfreq_dev_stats. It is especially visible when
> you have cgroup spanning one cluster. Android uses cgroups
> heavily. You can make easily this setup using 'taskset',
> run some benchmarks and observe 'devfreq_monitor' traces and
> timestamps, i.e. for your exynos-bus.
> 
> The patch is really non-invasive and simple. It can be a good starting
> point for testing and proposing other solutions.

Sorry for the late reply. I'm preparing an RFC patch with my approach
to support this requirement, as follows:

As you know, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
redundant power consumption by preventing unneeded wakeups
from the idle state when there are no interrupts or runnable threads.

Finally, I agree with the requirement of delayed_work for the devfreq subsystem.
But I would like to support both deferrable_work and delayed_work
in the devfreq subsystem. It is better to let the user select either
deferrable_work or delayed_work, like Kamil's suggestion [1].
[1] https://lore.kernel.org/patchwork/patch/1164317/
- [2/4] PM / devfreq: add possibility for delayed work

But I want to change the timer type for a devfreq device
using the simple_ondemand governor via sysfs, as follows:

Example:

1.
enum work_timer_type {
	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,	/* may be postponed while the CPU is idle */
	DEVFREQ_WORK_TIMER_DELAYED,		/* fires on time, waking an idle CPU */
};

struct devfreq_simple_ondemand_data {
	unsigned int upthreshold;
	unsigned int downdifferential;
	enum work_timer_type timer_type;	/* default chosen by the driver */
};

The developer of a devfreq device driver can choose
the default work timer type by initializing the 'timer_type' field of
'struct devfreq_simple_ondemand_data'.

2. Change the work timer type at runtime
- Change the work timer type from 'deferrable' to 'delayed'
$ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
$ cat /sys/class/devfreq/devfreq0/work_timer_type
delayed

- Change the work timer type from 'delayed' to 'deferrable'
$ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
$ cat /sys/class/devfreq/devfreq0/work_timer_type
deferrable
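
The mapping between the strings written to 'work_timer_type' and the enum
could look roughly like the user-space sketch below. It is not the RFC code:
it assumes the two enumerators are meant to have distinct values, and
'DEVFREQ_WORK_TIMER_NUM', 'timer_type_name' and 'timer_type_parse' are
hypothetical names added for the illustration. A real sysfs handler in the
kernel would more likely compare with sysfs_streq() and return -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The proposed timer types, assuming distinct values are intended. */
enum work_timer_type {
	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
	DEVFREQ_WORK_TIMER_DELAYED,
	DEVFREQ_WORK_TIMER_NUM,	/* hypothetical: number of valid types */
};

static const char *const timer_name[DEVFREQ_WORK_TIMER_NUM] = {
	[DEVFREQ_WORK_TIMER_DEFERRABLE] = "deferrable",
	[DEVFREQ_WORK_TIMER_DELAYED]    = "delayed",
};

/* What a 'show' handler would print; NULL for an invalid value. */
const char *timer_type_name(enum work_timer_type type)
{
	if ((unsigned int)type >= DEVFREQ_WORK_TIMER_NUM)
		return NULL;
	return timer_name[type];
}

/*
 * What a 'store' handler would do with the written buffer.
 * 'echo' appends a newline, so one trailing '\n' is tolerated.
 * Returns 0 on success, -1 if the string is not a known timer type.
 */
int timer_type_parse(const char *buf, enum work_timer_type *out)
{
	size_t len;

	assert(buf && out);

	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	for (int i = 0; i < DEVFREQ_WORK_TIMER_NUM; i++) {
		if (strlen(timer_name[i]) == len &&
		    strncmp(buf, timer_name[i], len) == 0) {
			*out = (enum work_timer_type)i;
			return 0;
		}
	}
	return -1;
}
```

Keeping the names in one table means the value read back always matches a
string that can be written, as in the echo/cat pairs above.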

I'm developing the RFC patch and then I'll send it as soon as possible.

Chanwoo Choi Jan. 31, 2020, 12:47 a.m. UTC | #4
On 1/31/20 9:42 AM, Chanwoo Choi wrote:
> Hi Lukasz,
> 
> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>> Hi Chanwoo, MyungJoo,
>>
>> Gentle ping. The issue is not only in the devfreq itself,
>> but also it affects thermal. The devfreq cooling rely on
>> busy_time and total_time updated by the devfreq monitoring
>> (in simple_ondemand).
>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>> data from devfreq_dev_stats. It is especially visible when
>> you have cgroup spanning one cluster. Android uses cgroups
>> heavily. You can make easily this setup using 'taskset',
>> run some benchmarks and observe 'devfreq_monitor' traces and
>> timestamps, i.e. for your exynos-bus.
>>
>> The patch is really non-invasive and simple. It can be a good starting
>> point for testing and proposing other solutions.
> 
> Sorry for late reply. I'm preparing the RFC patch about my approach
> to support this requirement as following:
> 
> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
> the redundant of power-consumption by preventing the unneeded wakeup
> from idle state if there are no any interrupts and runnable threads.
> 
> Finally, I agree with the requirement of delayed_work for the devfreq subsystem.
> But, I would like to support both deferrable_work and delayed_work
> on devfreq subsystem. It is better to select either deferrable_work
> or delayed_work by user like Kamil's suggestion[1].
> [1] https://lore.kernel.org/patchwork/patch/1164317/
> - [2/4] PM / devfreq: add possibility for delayed work
> 
> But, I want to change the timer type for devfreq device
> using simple_ondemand governor via sysfs as following:
> 
> Example:
> 
> 1.
> enum work_timer_type {
> 	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
> 	DEVFREQ_WORK_TIMER_DELAYED,
> };
> 
> struct devfreq_simple_ondemand_data {
> 	unsigned int upthreshold;
> 	unsigned int downdifferential;
> 	enum work_timer_type timer_type;
> };
> 
> The developer of devfreq device driver can choose
> the default work time type by initializing the 'timer_type of 
> struct devfreq_simple_ondemand_data'.
> 
> 2. Change the work timer type at the runtime
> - Change the work timer type from 'deferrable' to 'delayed'
> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
> $ cat /sys/class/devfreq/devfreq0/work_timer_type
> delayed
> 
> - Change the work timer type from 'delayed' to 'deferrable'
> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
> $ cat /sys/class/devfreq/devfreq0/work_timer_type
> deferrable
> 

In addition, only show the '/sys/class/devfreq/devfreq0/work_timer_type'
sysfs attribute if the devfreq device uses the simple_ondemand governor,
because this 'work_timer_type' sysfs attribute depends only on the
simple_ondemand governor and is useful only with it.

So the 'work_timer_type' sysfs attribute will be handled
in drivers/devfreq/governor_simpleondemand.c.

After I post my suggestion, we can discuss it.


> I'm developing the RFC patch and then I'll send it as soon as possible.
> 
Lukasz Luba Jan. 31, 2020, 9:38 a.m. UTC | #5
Hi Chanwoo,

On 1/31/20 12:47 AM, Chanwoo Choi wrote:
> On 1/31/20 9:42 AM, Chanwoo Choi wrote:
>> Hi Lukasz,
>>
>> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>>> Hi Chanwoo, MyungJoo,
>>>
>>> Gentle ping. The issue is not only in the devfreq itself,
>>> but also it affects thermal. The devfreq cooling rely on
>>> busy_time and total_time updated by the devfreq monitoring
>>> (in simple_ondemand).
>>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>>> data from devfreq_dev_stats. It is especially visible when
>>> you have cgroup spanning one cluster. Android uses cgroups
>>> heavily. You can make easily this setup using 'taskset',
>>> run some benchmarks and observe 'devfreq_monitor' traces and
>>> timestamps, i.e. for your exynos-bus.
>>>
>>> The patch is really non-invasive and simple. It can be a good starting
>>> point for testing and proposing other solutions.
>>
>> Sorry for late reply. I'm preparing the RFC patch about my approach
>> to support this requirement as following:
>>
>> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
>> the redundant of power-consumption by preventing the unneeded wakeup
>> from idle state if there are no any interrupts and runnable threads.
>>
>> Finally, I agree with the requirement of delayed_work for the devfreq subsystem.
>> But, I would like to support both deferrable_work and delayed_work
>> on devfreq subsystem. It is better to select either deferrable_work
>> or delayed_work by user like Kamil's suggestion[1].
>> [1] https://lore.kernel.org/patchwork/patch/1164317/
>> - [2/4] PM / devfreq: add possibility for delayed work
>>
>> But, I want to change the timer type for devfreq device
>> using simple_ondemand governor via sysfs as following:
>>
>> Example:
>>
>> 1.
>> enum work_timer_type {
>> 	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
>> 	DEVFREQ_WORK_TIMER_DELAYED,
>> };
>>
>> struct devfreq_simple_ondemand_data {
>> 	unsigned int upthreshold;
>> 	unsigned int downdifferential;
>> 	enum work_timer_type timer_type;
>> };
>>
>> The developer of devfreq device driver can choose
>> the default work time type by initializing the 'timer_type of
>> struct devfreq_simple_ondemand_data'.
>>
>> 2. Change the work timer type at the runtime
>> - Change the work timer type from 'deferrable' to 'delayed'
>> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>> delayed
>>
>> - Change the work timer type from 'delayed' to 'deferrable'
>> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>> deferrable
>>
> 
> And
> Only show '/sys/class/devfreq/devfreq0/work_timer_type' sysfs attribute,
> if devfreq device uses the simple_ondemand. Because this 'work_timer_type'
> sysfs attribute only depends on simple_ondemand governor and are useful.
> 
> So, 'work_timer_type' sysfs attribute will be handled
> at drivers/devfreq/governor_simpleondemand.c.
> 
> After posting my suggestion, we can discuss it.
> 
> 
>> I'm developing the RFC patch and then I'll send it as soon as possible.

Good, thank you for the explanation. At first glance the design
looks OK; we can discuss it a bit more in your RFC series.
I would recommend not making it conditional on the simple_ondemand governor;
just add a comment that for e.g. the performance or passive governors it
makes less sense to use this setting. There might be some other governors
loaded as modules which could benefit from it, or in Android, e.g.
https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-coral-4.14-android10/drivers/devfreq/governor_msm_adreno_tz.c

It would be good if it could land in mainline before v5.8-v5.9.

Regards,
Lukasz
Chanwoo Choi Feb. 3, 2020, 1:10 a.m. UTC | #6
On 1/31/20 6:38 PM, Lukasz Luba wrote:
> Hi Chanwoo,
> 
> On 1/31/20 12:47 AM, Chanwoo Choi wrote:
>> On 1/31/20 9:42 AM, Chanwoo Choi wrote:
>>> Hi Lukasz,
>>>
>>> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>>>> Hi Chanwoo, MyungJoo,
>>>>
>>>> Gentle ping. The issue is not only in the devfreq itself,
>>>> but also it affects thermal. The devfreq cooling rely on
>>>> busy_time and total_time updated by the devfreq monitoring
>>>> (in simple_ondemand).
>>>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>>>> data from devfreq_dev_stats. It is especially visible when
>>>> you have cgroup spanning one cluster. Android uses cgroups
>>>> heavily. You can make easily this setup using 'taskset',
>>>> run some benchmarks and observe 'devfreq_monitor' traces and
>>>> timestamps, i.e. for your exynos-bus.
>>>>
>>>> The patch is really non-invasive and simple. It can be a good starting
>>>> point for testing and proposing other solutions.
>>>
>>> Sorry for late reply. I'm preparing the RFC patch about my approach
>>> to support this requirement as following:
>>>
>>> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
>>> the redundant of power-consumption by preventing the unneeded wakeup
>>> from idle state if there are no any interrupts and runnable threads.
>>>
>>> Finally, I agree with the requirement of delayed_work for the devfreq subsystem.
>>> But, I would like to support both deferrable_work and delayed_work
>>> on devfreq subsystem. It is better to select either deferrable_work
>>> or delayed_work by user like Kamil's suggestion[1].
>>> [1] https://lore.kernel.org/patchwork/patch/1164317/
>>> - [2/4] PM / devfreq: add possibility for delayed work
>>>
>>> But, I want to change the timer type for devfreq device
>>> using simple_ondemand governor via sysfs as following:
>>>
>>> Example:
>>>
>>> 1.
>>> enum work_timer_type {
>>>     DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
>>>     DEVFREQ_WORK_TIMER_DELAYED,
>>> };
>>>
>>> struct devfreq_simple_ondemand_data {
>>>     unsigned int upthreshold;
>>>     unsigned int downdifferential;
>>>     enum work_timer_type timer_type;
>>> };
>>>
>>> The developer of devfreq device driver can choose
>>> the default work time type by initializing the 'timer_type of
>>> struct devfreq_simple_ondemand_data'.
>>>
>>> 2. Change the work timer type at the runtime
>>> - Change the work timer type from 'deferrable' to 'delayed'
>>> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
>>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>>> delayed
>>>
>>> - Change the work timer type from 'delayed' to 'deferrable'
>>> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
>>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>>> deferrable
>>>
>>
>> And
>> Only show '/sys/class/devfreq/devfreq0/work_timer_type' sysfs attribute,
>> if devfreq device uses the simple_ondemand. Because this 'work_timer_type'
>> sysfs attribute only depends on simple_ondemand governor and are useful.
>>
>> So, 'work_timer_type' sysfs attribute will be handled
>> at drivers/devfreq/governor_simpleondemand.c.
>>
>> After posting my suggestion, we can discuss it.
>>
>>
>>> I'm developing the RFC patch and then I'll send it as soon as possible.
> 
> Good, thank you for the explanation. For the first glance the design
> looks OK, we can discuss it a bit more in you RFC series.
> I would recommend to not make it conditional on simple_ondemand governor
> just add a comment that for i.e. performance or passive governors it has
> less sense to use this setting. There might be some other governors
> loaded as modules, which could benefit from it, or in Android e.g.
> https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-coral-4.14-android10/drivers/devfreq/governor_msm_adreno_tz.c

OK. Instead, I'll add a flag for governors. The governor flag
indicates which sysfs entries, like polling_interval and work_timer_type,
the governor uses.

If a governor wants to use specific sysfs attributes,
it just sets the flag when the governor is defined.
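
A per-governor attribute flag could be modelled as a simple bitmask, as in the
sketch below. This is only an illustration of the idea, not the planned
implementation: the flag names, 'struct governor_desc' and
'governor_shows_attr' are hypothetical, and the flags assigned to each
governor in the table are examples rather than a statement of what the RFC
will do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits: one per optional sysfs attribute. */
#define GOV_ATTR_POLLING_INTERVAL	(1u << 0)
#define GOV_ATTR_WORK_TIMER_TYPE	(1u << 1)

/* Minimal stand-in for a governor description (not the kernel's struct). */
struct governor_desc {
	const char *name;
	unsigned int attrs;	/* OR of GOV_ATTR_* the governor wants exposed */
};

/* Example table; the flag choices here are purely illustrative. */
static const struct governor_desc governors[] = {
	{ "simple_ondemand", GOV_ATTR_POLLING_INTERVAL | GOV_ATTR_WORK_TIMER_TYPE },
	{ "performance",     0 },
};

/* Decide whether a sysfs attribute should be visible for this governor. */
bool governor_shows_attr(const struct governor_desc *gov, unsigned int attr)
{
	assert(gov);
	return (gov->attrs & attr) != 0;
}
```

With this layout, a governor loaded as a module can opt in to
'work_timer_type' by setting the bit in its own definition, without the core
having to know the governor by name.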

Thanks.

> 
> It would be good if it can land in mainline before v5.8-v5.9.
> 
> Regards,
> Lukasz
> 
> 
> 
>