[RFC,0/2] cpufreq_ext: Introduce cpufreq ext governor

Message ID 20240927101342.3240263-1-zouyipeng@huawei.com (mailing list archive)

Message

Yipeng Zou Sept. 27, 2024, 10:13 a.m. UTC
Hi everyone,

I am currently working on a patch for a CPU frequency governor based on
BPF, which can use BPF to customize and implement various frequency
scaling strategies.

If you have any feedback or suggestions, please do let me know.

Motivation
----------

1. Customization

Existing cpufreq governors in the kernel are designed for general
scenarios, which may not always be optimal for specific or specialized
workloads.

The userspace governor allows direct control over cpufreq, but users
often require guidance from the kernel to achieve the desired frequency.

Cpufreq_ext aims to address this by providing a customizable framework that
can be tailored to the unique needs of different systems and applications.

While cpufreq governors can be implemented within a kernel module,
maintaining a ko tailored for specific scenarios can be challenging.
The complexity and overhead associated with kernel modules make it
difficult to quickly adapt and deploy custom frequency scaling strategies.

Cpufreq_ext leverages BPF to offer a more lightweight and flexible approach
to implementing customized strategies, allowing for easier maintenance and
deployment.

2. Integration with sched_ext:

sched_ext is a scheduler class whose behavior can be defined by a set of
BPF programs - the BPF scheduler.

More information about sched_ext is available in [1]:

	[1] https://www.kernel.org/doc/html/next/scheduler/sched-ext.html

The interaction between CPU frequency scaling and task scheduling is
critical for performance.

cpufreq_ext can work with sched_ext to ensure that both scheduling
decisions and frequency adjustments are made in a coordinated manner,
optimizing system responsiveness and power consumption.

Overview
--------

cpufreq_ext is a BPF-based cpufreq governor: the frequency scaling
strategy is implemented in a BPF program.

cpufreq_ext attaches to cpufreq policies like any other cpufreq governor.

		   --------------------------
		  |        BPF governor      |
		   --------------------------
			       |
			       v
			  BPF Register
			       |
			       v
	    --------------------------------------
	   |             CPUFreq ext              |
	    --------------------------------------
	      ^                ^               ^
	      |                |               |
	   ---------       ---------       ---------
	  | policy0 | ... | policy1 | ... | policyn |
	   ---------       ---------       ---------

Several function hooks can be registered with cpufreq_ext through BPF
struct_ops.

The first patch defines a dbs_governor that works like the other
governors.

The second patch provides a sample showing how to use it: it implements
a typical cpufreq governor that switches to the maximum frequency when a
VIP task is running on the target CPU.

Detail
------

cpufreq_ext uses bpf_struct_ops to register several function hooks.

	struct cpufreq_governor_ext_ops {
		...
	}

cpufreq_governor_ext_ops defines all the functions that a BPF program can
implement.

To add a new hook, you only need to define it in this struct.

At the moment, only the basic functions are defined:

1. unsigned long (*get_next_freq)(struct cpufreq_policy *policy)

	Decide how to adjust the CPU frequency here.
	The return value is the frequency that the CPU will be set to.

2. unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy)

	Decide how to adjust the sampling rate here.
	The return value is the sampling rate that the governor will
	use from now on.

3. unsigned int (*init)(void)

	BPF governor init callback; returning 0 means success.

4. void (*exit)(void)

	BPF governor exit callback.

5. char name[CPUFREQ_EXT_NAME_LEN]

	BPF governor name.
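
As a rough illustration of how these members fit together, the following
userspace C sketch mocks the ops table and a "VIP task" strategy like the
one the sample patch is described as implementing. Only the five member
names and signatures come from the list above. The two-field
struct cpufreq_policy is a stand-in for the kernel structure, and the value
of CPUFREQ_EXT_NAME_LEN, the vip_running flag, the sampling interval and the
vip_* functions are all assumptions made for the example, not code from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; the cover letter does not state the real length. */
#define CPUFREQ_EXT_NAME_LEN 16

/* Userspace stand-in for the kernel's struct cpufreq_policy (kHz). */
struct cpufreq_policy {
	unsigned int min;
	unsigned int max;
};

/* Members as listed in the cover letter. */
struct cpufreq_governor_ext_ops {
	unsigned long (*get_next_freq)(struct cpufreq_policy *policy);
	unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy);
	unsigned int (*init)(void);
	void (*exit)(void);
	char name[CPUFREQ_EXT_NAME_LEN];
};

/* Hypothetical flag; a real BPF program would keep this in a map. */
static int vip_running;

/* Run at the maximum frequency while a VIP task is on the CPU. */
static unsigned long vip_get_next_freq(struct cpufreq_policy *policy)
{
	assert(policy != NULL);
	return vip_running ? policy->max : policy->min;
}

/* Arbitrary interval for the sketch (dbs governors use microseconds). */
static unsigned int vip_get_sampling_rate(struct cpufreq_policy *policy)
{
	(void)policy;
	return 10000;
}

/* Returning 0 signals success, as described for the init callback. */
static unsigned int vip_init(void)
{
	vip_running = 0;
	return 0;
}

static void vip_exit(void)
{
	vip_running = 0;
}

static struct cpufreq_governor_ext_ops vip_ops = {
	.get_next_freq = vip_get_next_freq,
	.get_sampling_rate = vip_get_sampling_rate,
	.init = vip_init,
	.exit = vip_exit,
	.name = "vip",
};
```

With a policy of { .min = 800000, .max = 2400000 }, calling
vip_ops.get_next_freq() returns 800000 until vip_running is set, and
2400000 afterwards.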

cpufreq_ext also adds a sysfs interface that reports the governor status.

1. ext/stat attribute:

	Shows the current BPF governor status.

	# cat /sys/devices/system/cpu/cpufreq/ext/stat
	Stat: CPUFREQ_EXT_INIT
	BPF governor: performance

There are a number of constraints on cpufreq_ext:

1. Only one ext governor can be registered at a time.

2. By default, it operates as a performance governor when no BPF
   governor is registered.

3. The cpufreq_ext governor must be selected before loading a BPF
   governor; otherwise, the installation of the BPF governor will fail.
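
The three constraints can be modeled with a small userspace C sketch. This
is only an illustration of the rules as stated above: the ext_registry
structure, the function names and the specific error codes (-EINVAL,
-EBUSY) are assumptions, and the patch may represent this state quite
differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical registry state; names are not taken from the patch. */
struct ext_registry {
	int governor_selected;    /* cpufreq_ext chosen as policy governor */
	const char *bpf_governor; /* NULL when no BPF governor is registered */
};

/*
 * Constraint 3: registration fails unless cpufreq_ext is selected.
 * Constraint 1: only one BPF governor may be registered at a time.
 */
static int ext_register(struct ext_registry *reg, const char *name)
{
	assert(reg != NULL && name != NULL);
	if (!reg->governor_selected)
		return -EINVAL;
	if (reg->bpf_governor)
		return -EBUSY;
	reg->bpf_governor = name;
	return 0;
}

static void ext_unregister(struct ext_registry *reg)
{
	assert(reg != NULL);
	reg->bpf_governor = NULL;
}

/*
 * Constraint 2: with no BPF governor registered, behave like the
 * performance governor and pick the maximum frequency.
 */
static unsigned long ext_next_freq(const struct ext_registry *reg,
				   unsigned long bpf_freq,
				   unsigned long max_freq)
{
	assert(reg != NULL);
	return reg->bpf_governor ? bpf_freq : max_freq;
}
```

In this model, a second ext_register() call fails until ext_unregister()
has run, and ext_next_freq() returns max_freq whenever nothing is
registered.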

TODO
----

The current patch is a starting point, and future work will focus on
expanding its capabilities.

I plan to leverage the BPF ecosystem to introduce innovative features,
such as real-time adjustments and optimizations based on system-wide
observations and analytics.

And I am looking forward to any insights, critiques, or suggestions you
may have.

Yipeng Zou (2):
  cpufreq_ext: Introduce cpufreq ext governor
  cpufreq_ext: Add bpf sample

 drivers/cpufreq/Kconfig        |  23 ++
 drivers/cpufreq/Makefile       |   1 +
 drivers/cpufreq/cpufreq_ext.c  | 525 +++++++++++++++++++++++++++++++++
 samples/bpf/.gitignore         |   1 +
 samples/bpf/Makefile           |   8 +-
 samples/bpf/cpufreq_ext.bpf.c  | 113 +++++++
 samples/bpf/cpufreq_ext_user.c |  48 +++
 7 files changed, 718 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cpufreq/cpufreq_ext.c
 create mode 100644 samples/bpf/cpufreq_ext.bpf.c
 create mode 100644 samples/bpf/cpufreq_ext_user.c

Comments

Alexei Starovoitov Sept. 29, 2024, 4:56 p.m. UTC | #1
On Fri, Sep 27, 2024 at 3:03 AM Yipeng Zou <zouyipeng@huawei.com> wrote:
>
> Hi everyone,
>
> I am currently working on a patch for a CPU frequency governor based on
> BPF, which can use BPF to customize and implement various frequency
> scaling strategies.
>
> If you have any feedback or suggestions, please do let me know.
>
> Motivation
> ----------
>
> 1. Customization
>
> Existing cpufreq governors in the kernel are designed for general
> scenarios, which may not always be optimal for specific or specialized
> workloads.
>
> The userspace governor allows direct control over cpufreq, but users
> often require guidance from the kernel to achieve the desired frequency.
>
> Cpufreq_ext aims to address this by providing a customizable framework that
> can be tailored to the unique needs of different systems and applications.
>
> While cpufreq governors can be implemented within a kernel module,
> maintaining a ko tailored for specific scenarios can be challenging.
> The complexity and overhead associated with kernel modules make it
> difficult to quickly adapt and deploy custom frequency scaling strategies.
>
> Cpufreq_ext leverages BPF to offer a more lightweight and flexible approach
> to implementing customized strategies, allowing for easier maintenance and
> deployment.
>
> 2. Integration with sched_ext:
>
> sched_ext is a scheduler class whose behavior can be defined by a set of
> BPF programs - the BPF scheduler.
>
> Look for more about sched_ext in [1]:
>
>         [1] https://www.kernel.org/doc/html/next/scheduler/sched-ext.html
>
> The interaction between CPU frequency scaling and task scheduling is
> critical for performance.
>
> cpufreq_ext can work with sched_ext to ensure that both scheduling
> decisions and frequency adjustments are made in a coordinated manner,
> optimizing system responsiveness and power consumption.

I think sched-ext already has a mechanism to influence cpufreq.
How is this different ?

Pls cc sched-ext folks in the future.

> Overview
> --------
>
> The cpufreq ext is a BPF based cpufreq governor, we can customize
> cpufreq governor in BPF program.
>
> CPUFreq ext works as common cpufreq governor with cpufreq policy.
>
>                    --------------------------
>                   |        BPF governor      |
>                    --------------------------
>                                |
>                                v
>                           BPF Register
>                                |
>                                v
>             --------------------------------------
>            |             CPUFreq ext              |
>             --------------------------------------
>               ^                ^               ^
>               |                |               |
>            ---------       ---------       ---------
>           | policy0 | ... | policy1 | ... | policyn |
>            ---------       ---------       ---------
>
> We can register serval function hooks to cpufreq ext by BPF Struct OPS.
>
> The first patch define a dbs_governor, and it's works like other
> governor.
>
> The second patch gives a sample how to use it, implement one
> typical cpufreq governor, switch to max cpufreq when VIP task
> is running on target cpu.
>
> Detail
> ------
>
> The cpufreq ext use bpf_struct_ops to register serval function hooks.
>
>         struct cpufreq_governor_ext_ops {
>                 ...
>         }
>
> Cpufreq_governor_ext_ops defines all the functions that BPF programs can
> implement customly.
>
> If you need to add a custom function, you only need to define it in this
> struct.
>
> At the moment we have defined the basic functions.
>
> 1. unsigned long (*get_next_freq)(struct cpufreq_policy *policy)
>
>         Make decision how to adjust cpufreq here.
>         The return value represents the CPU frequency that will be
>         updated.
>
> 2. unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy)
>
>         Make decision how to adjust sampling_rate here.
>         The return value represents the governor samplint rate that
>         will be updated.
>
> 3. unsigned int (*init)(void)
>
>         BPF governor init callback, return 0 means success.
>
> 4. void (*exit)(void)
>
>         BPF governor exit callback.
>
> 5. char name[CPUFREQ_EXT_NAME_LEN]
>
>         BPF governor name.
>
> The cpufreq_ext also add sysfs interface which refer to governor status.
>
> 1. ext/stat attribute:
>
>         Access to current BPF governor status.
>
>         # cat /sys/devices/system/cpu/cpufreq/ext/stat
>         Stat: CPUFREQ_EXT_INIT
>         BPF governor: performance
>
> There are number of constraints on the cpufreq_ext:
>
> 1. Only one ext governor can be registered at a time.
>
> 2. By default, it operates as a performance governor when no BPF
>    governor is registered.
>
> 3. The cpufreq_ext governor must be selected before loading a BPF
>    governor; otherwise, the installation of the BPF governor will fail.
>
> TODO
> ----
>
> The current patch is a starting point, and future work will focus on
> expanding its capabilities.
>
> I plan to leverage the BPF ecosystem to introduce innovative features,
> such as real-time adjustments and optimizations based on system-wide
> observations and analytics.
>
> And I am looking forward to any insights, critiques, or suggestions you
> may have.
>
> Yipeng Zou (2):
>   cpufreq_ext: Introduce cpufreq ext governor
>   cpufreq_ext: Add bpf sample
>
>  drivers/cpufreq/Kconfig        |  23 ++
>  drivers/cpufreq/Makefile       |   1 +
>  drivers/cpufreq/cpufreq_ext.c  | 525 +++++++++++++++++++++++++++++++++
>  samples/bpf/.gitignore         |   1 +
>  samples/bpf/Makefile           |   8 +-
>  samples/bpf/cpufreq_ext.bpf.c  | 113 +++++++
>  samples/bpf/cpufreq_ext_user.c |  48 +++
>  7 files changed, 718 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cpufreq/cpufreq_ext.c
>  create mode 100644 samples/bpf/cpufreq_ext.bpf.c
>  create mode 100644 samples/bpf/cpufreq_ext_user.c
>
> --
> 2.34.1
>
Daniel Hodges Sept. 29, 2024, 11:23 p.m. UTC | #2
On Fri, Sep 27, 2024 at 06:13:40PM +0800, Yipeng Zou wrote:
> Hi everyone,
> 
> I am currently working on a patch for a CPU frequency governor based on
> BPF, which can use BPF to customize and implement various frequency
> scaling strategies.
> 
> If you have any feedback or suggestions, please do let me know.
> 
> Motivation
> ----------
> 
> 1. Customization
> 
> Existing cpufreq governors in the kernel are designed for general
> scenarios, which may not always be optimal for specific or specialized
> workloads.
> 
> The userspace governor allows direct control over cpufreq, but users
> often require guidance from the kernel to achieve the desired frequency.
> 
> Cpufreq_ext aims to address this by providing a customizable framework that
> can be tailored to the unique needs of different systems and applications.
> 
> While cpufreq governors can be implemented within a kernel module,
> maintaining a ko tailored for specific scenarios can be challenging.
> The complexity and overhead associated with kernel modules make it
> difficult to quickly adapt and deploy custom frequency scaling strategies.
> 
> Cpufreq_ext leverages BPF to offer a more lightweight and flexible approach
> to implementing customized strategies, allowing for easier maintenance and
> deployment.
> 
> 2. Integration with sched_ext:
> 
> sched_ext is a scheduler class whose behavior can be defined by a set of
> BPF programs - the BPF scheduler.
> 
> Look for more about sched_ext in [1]:
> 
> 	[1] https://www.kernel.org/doc/html/next/scheduler/sched-ext.html
> 
> The interaction between CPU frequency scaling and task scheduling is
> critical for performance.
> 
> cpufreq_ext can work with sched_ext to ensure that both scheduling
> decisions and frequency adjustments are made in a coordinated manner,
> optimizing system responsiveness and power consumption.

Hi Yipeng, I prototyped something really similar earlier this year and
the conclusion I came to was that a governor might not be the right
abstraction for struct_ops. One issue is that, depending on the frequency
driver being used, it may have its own governor implementation included
(ex: intel_pstate). For sched_ext there is already a kfunc
(scx_bpf_cpuperf_set) which calls into cpufreq_update_util and that
has been working well so far.

> Overview
> --------
> 
> The cpufreq ext is a BPF based cpufreq governor, we can customize
> cpufreq governor in BPF program.
> 
> CPUFreq ext works as common cpufreq governor with cpufreq policy.
> 
> 		   --------------------------
> 		  |        BPF governor      |
> 		   --------------------------
> 			       |
> 			       v
> 			  BPF Register
> 			       |
> 			       v
> 	    --------------------------------------
> 	   |             CPUFreq ext              |
> 	    --------------------------------------
> 	      ^                ^               ^
> 	      |                |               |
> 	   ---------       ---------       ---------
> 	  | policy0 | ... | policy1 | ... | policyn |
> 	   ---------       ---------       ---------
> 
> We can register serval function hooks to cpufreq ext by BPF Struct OPS.
> 
> The first patch define a dbs_governor, and it's works like other
> governor.
> 
> The second patch gives a sample how to use it, implement one
> typical cpufreq governor, switch to max cpufreq when VIP task
> is running on target cpu.
> 
> Detail
> ------
> 
> The cpufreq ext use bpf_struct_ops to register serval function hooks.
> 
> 	struct cpufreq_governor_ext_ops {
> 		...
> 	}
> 
> Cpufreq_governor_ext_ops defines all the functions that BPF programs can
> implement customly.
> 
> If you need to add a custom function, you only need to define it in this
> struct.
> 
> At the moment we have defined the basic functions.
> 
> 1. unsigned long (*get_next_freq)(struct cpufreq_policy *policy)
> 
> 	Make decision how to adjust cpufreq here.
> 	The return value represents the CPU frequency that will be
> 	updated.
> 
> 2. unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy)
> 
> 	Make decision how to adjust sampling_rate here.
> 	The return value represents the governor samplint rate that
> 	will be updated.
> 

Why does the governor need a sampling rate? Could this be done with a
bpf timer instead?

> 3. unsigned int (*init)(void)
> 
> 	BPF governor init callback, return 0 means success.
> 
> 4. void (*exit)(void)
> 
> 	BPF governor exit callback.
> 
> 5. char name[CPUFREQ_EXT_NAME_LEN]
> 
> 	BPF governor name.
> 
I'm guessing it would be useful to have the governor dispatch on almost
all the governor methods. IIRC I had something like:

	int	(*start)(struct cpufreq_policy *policy);
	void	(*stop)(struct cpufreq_policy *policy);
	void	(*limits)(struct cpufreq_policy *policy);
	int	(*store_setspeed)(struct cpufreq_policy *policy,
				  unsigned int freq);

> The cpufreq_ext also add sysfs interface which refer to governor status.
> 
> 1. ext/stat attribute:
> 
> 	Access to current BPF governor status.
> 
> 	# cat /sys/devices/system/cpu/cpufreq/ext/stat
> 	Stat: CPUFREQ_EXT_INIT
> 	BPF governor: performance
> 
> There are number of constraints on the cpufreq_ext:
> 
> 1. Only one ext governor can be registered at a time.
> 
> 2. By default, it operates as a performance governor when no BPF
>    governor is registered.
> 
> 3. The cpufreq_ext governor must be selected before loading a BPF
>    governor; otherwise, the installation of the BPF governor will fail.
> 
> TODO
> ----
> 
> The current patch is a starting point, and future work will focus on
> expanding its capabilities.
> 
> I plan to leverage the BPF ecosystem to introduce innovative features,
> such as real-time adjustments and optimizations based on system-wide
> observations and analytics.
> 
> And I am looking forward to any insights, critiques, or suggestions you
> may have.
> 
> Yipeng Zou (2):
>   cpufreq_ext: Introduce cpufreq ext governor
>   cpufreq_ext: Add bpf sample
> 
>  drivers/cpufreq/Kconfig        |  23 ++
>  drivers/cpufreq/Makefile       |   1 +
>  drivers/cpufreq/cpufreq_ext.c  | 525 +++++++++++++++++++++++++++++++++
>  samples/bpf/.gitignore         |   1 +
>  samples/bpf/Makefile           |   8 +-
>  samples/bpf/cpufreq_ext.bpf.c  | 113 +++++++
>  samples/bpf/cpufreq_ext_user.c |  48 +++
>  7 files changed, 718 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cpufreq/cpufreq_ext.c
>  create mode 100644 samples/bpf/cpufreq_ext.bpf.c
>  create mode 100644 samples/bpf/cpufreq_ext_user.c
> 
> -- 
> 2.34.1
>
Tejun Heo Sept. 30, 2024, 6:22 p.m. UTC | #3
(cc'ing Daniel Hodges and quoting the whole body)

On Sun, Sep 29, 2024 at 09:56:02AM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 27, 2024 at 3:03 AM Yipeng Zou <zouyipeng@huawei.com> wrote:
> >
> > Hi everyone,
> >
> > I am currently working on a patch for a CPU frequency governor based on
> > BPF, which can use BPF to customize and implement various frequency
> > scaling strategies.
> >
> > If you have any feedback or suggestions, please do let me know.
> >
> > Motivation
> > ----------
> >
> > 1. Customization
> >
> > Existing cpufreq governors in the kernel are designed for general
> > scenarios, which may not always be optimal for specific or specialized
> > workloads.
> >
> > The userspace governor allows direct control over cpufreq, but users
> > often require guidance from the kernel to achieve the desired frequency.
> >
> > Cpufreq_ext aims to address this by providing a customizable framework that
> > can be tailored to the unique needs of different systems and applications.
> >
> > While cpufreq governors can be implemented within a kernel module,
> > maintaining a ko tailored for specific scenarios can be challenging.
> > The complexity and overhead associated with kernel modules make it
> > difficult to quickly adapt and deploy custom frequency scaling strategies.
> >
> > Cpufreq_ext leverages BPF to offer a more lightweight and flexible approach
> > to implementing customized strategies, allowing for easier maintenance and
> > deployment.
> >
> > 2. Integration with sched_ext:
> >
> > sched_ext is a scheduler class whose behavior can be defined by a set of
> > BPF programs - the BPF scheduler.
> >
> > Look for more about sched_ext in [1]:
> >
> >         [1] https://www.kernel.org/doc/html/next/scheduler/sched-ext.html
> >
> > The interaction between CPU frequency scaling and task scheduling is
> > critical for performance.
> >
> > cpufreq_ext can work with sched_ext to ensure that both scheduling
> > decisions and frequency adjustments are made in a coordinated manner,
> > optimizing system responsiveness and power consumption.
> 
> I think sched-ext already has a mechanism to influence cpufreq.
> How is this different ?

FWIW, sched_ext's cpufreq implementation is through the schedutil governor.
All that the BPF scheduler does is provide a utilization signal to the
governor. This seems to work fine for sched_ext schedulers (this doesn't
preclude a more direct BPF governor).
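
For readers unfamiliar with that path, here is a simplified userspace C
sketch of a schedutil-style mapping from a utilization signal to a target
frequency. It is not the kernel's code: util_to_freq is a hypothetical
helper, and it assumes the signal is expressed on a 0..1024 scale (the
value of SCX_CPUPERF_ONE) and that about 25% headroom is added before
scaling. The real governor also handles rate limiting, iowait boosting and
frequency invariance.

```c
#include <assert.h>
#include <stdint.h>

/* Scale on which sched_ext expresses performance targets (assumed 1024). */
#define SCX_CPUPERF_ONE 1024u

/*
 * Simplified schedutil-style mapping: add 25% headroom to the
 * utilization, scale by the maximum frequency, and clamp the result
 * to that maximum. Frequencies are in kHz.
 */
static uint64_t util_to_freq(uint64_t util, uint64_t max_freq)
{
	uint64_t freq;

	assert(util <= SCX_CPUPERF_ONE);
	util += util >> 2;                      /* 1.25 * util */
	freq = max_freq * util / SCX_CPUPERF_ONE;
	return freq > max_freq ? max_freq : freq;
}
```

For example, a utilization of 512 (half of the scale) with a 2400000 kHz
maximum maps to 1500000 kHz, while a utilization of 1024 is clamped to
2400000 kHz.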

> Pls cc sched-ext folks in the future.

Yeah, it'd be great if you can cc Daniel, me and sched-ext@meta.com.

> > Overview
> > --------
> >
> > The cpufreq ext is a BPF based cpufreq governor, we can customize
> > cpufreq governor in BPF program.
> >
> > CPUFreq ext works as common cpufreq governor with cpufreq policy.
> >
> >                    --------------------------
> >                   |        BPF governor      |
> >                    --------------------------
> >                                |
> >                                v
> >                           BPF Register
> >                                |
> >                                v
> >             --------------------------------------
> >            |             CPUFreq ext              |
> >             --------------------------------------
> >               ^                ^               ^
> >               |                |               |
> >            ---------       ---------       ---------
> >           | policy0 | ... | policy1 | ... | policyn |
> >            ---------       ---------       ---------
> >
> > We can register serval function hooks to cpufreq ext by BPF Struct OPS.
> >
> > The first patch define a dbs_governor, and it's works like other
> > governor.
> >
> > The second patch gives a sample how to use it, implement one
> > typical cpufreq governor, switch to max cpufreq when VIP task
> > is running on target cpu.
> >
> > Detail
> > ------
> >
> > The cpufreq ext use bpf_struct_ops to register serval function hooks.
> >
> >         struct cpufreq_governor_ext_ops {
> >                 ...
> >         }
> >
> > Cpufreq_governor_ext_ops defines all the functions that BPF programs can
> > implement customly.
> >
> > If you need to add a custom function, you only need to define it in this
> > struct.
> >
> > At the moment we have defined the basic functions.
> >
> > 1. unsigned long (*get_next_freq)(struct cpufreq_policy *policy)
> >
> >         Make decision how to adjust cpufreq here.
> >         The return value represents the CPU frequency that will be
> >         updated.
> >
> > 2. unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy)
> >
> >         Make decision how to adjust sampling_rate here.
> >         The return value represents the governor samplint rate that
> >         will be updated.
> >
> > 3. unsigned int (*init)(void)
> >
> >         BPF governor init callback, return 0 means success.
> >
> > 4. void (*exit)(void)
> >
> >         BPF governor exit callback.
> >
> > 5. char name[CPUFREQ_EXT_NAME_LEN]
> >
> >         BPF governor name.
> >
> > The cpufreq_ext also add sysfs interface which refer to governor status.
> >
> > 1. ext/stat attribute:
> >
> >         Access to current BPF governor status.
> >
> >         # cat /sys/devices/system/cpu/cpufreq/ext/stat
> >         Stat: CPUFREQ_EXT_INIT
> >         BPF governor: performance
> >
> > There are number of constraints on the cpufreq_ext:
> >
> > 1. Only one ext governor can be registered at a time.
> >
> > 2. By default, it operates as a performance governor when no BPF
> >    governor is registered.
> >
> > 3. The cpufreq_ext governor must be selected before loading a BPF
> >    governor; otherwise, the installation of the BPF governor will fail.
> >
> > TODO
> > ----
> >
> > The current patch is a starting point, and future work will focus on
> > expanding its capabilities.
> >
> > I plan to leverage the BPF ecosystem to introduce innovative features,
> > such as real-time adjustments and optimizations based on system-wide
> > observations and analytics.
> >
> > And I am looking forward to any insights, critiques, or suggestions you
> > may have.
> >
> > Yipeng Zou (2):
> >   cpufreq_ext: Introduce cpufreq ext governor
> >   cpufreq_ext: Add bpf sample
> >
> >  drivers/cpufreq/Kconfig        |  23 ++
> >  drivers/cpufreq/Makefile       |   1 +
> >  drivers/cpufreq/cpufreq_ext.c  | 525 +++++++++++++++++++++++++++++++++
> >  samples/bpf/.gitignore         |   1 +
> >  samples/bpf/Makefile           |   8 +-
> >  samples/bpf/cpufreq_ext.bpf.c  | 113 +++++++
> >  samples/bpf/cpufreq_ext_user.c |  48 +++
> >  7 files changed, 718 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/cpufreq/cpufreq_ext.c
> >  create mode 100644 samples/bpf/cpufreq_ext.bpf.c
> >  create mode 100644 samples/bpf/cpufreq_ext_user.c
> >
> > --
> > 2.34.1
> >