Message ID | 20240117024823.4186-2-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf: Add bpf_iter_cpumask
On 1/16/24 6:48 PM, Yafang Shao wrote:
> Add three new kfuncs for bpf_iter_cpumask.
> - bpf_iter_cpumask_new
>   It is defined with KF_RCU_PROTECTED and KF_RCU.
>   KF_RCU_PROTECTED is defined because we must use it under the
>   protection of RCU.
>   KF_RCU is defined because the cpumask must be an RCU trusted pointer
>   such as task->cpus_ptr.

I am not sure whether we need both or not.

KF_RCU_PROTECTED means the function call needs to be within the rcu cs.
KF_RCU means the argument usage needs to be within the rcu cs.
We only need one of them (preferably KF_RCU).

> - bpf_iter_cpumask_next
> - bpf_iter_cpumask_destroy
>
> These new kfuncs facilitate the iteration of percpu data, such as
> runqueues, psi_cgroup_cpu, and more.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>
> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> index 2e73533a3811..1840e48e6142 100644
> --- a/kernel/bpf/cpumask.c
> +++ b/kernel/bpf/cpumask.c
> @@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
>  	return cpumask_weight(cpumask);
>  }
>
> +struct bpf_iter_cpumask {
> +	__u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_cpumask_kern {
> +	const struct cpumask *mask;
> +	int cpu;
> +} __aligned(8);
> +
> +/**
> + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
> + * @it: The new bpf_iter_cpumask to be created.
> + * @mask: The cpumask to be iterated over.
> + *
> + * This function initializes a new bpf_iter_cpumask structure for iterating over
> + * the specified CPU mask. It assigns the provided cpumask to the newly created
> + * bpf_iter_cpumask @it for subsequent iteration operations.
> + *
> + * On success, 0 is returned. On failure, an error is returned.
> + */
> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> +{
> +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> +		     __alignof__(struct bpf_iter_cpumask));
> +
> +	kit->mask = mask;
> +	kit->cpu = -1;
> +	return 0;
> +}

We have a problem here. Let us say bpf_iter_cpumask_new() is called inside
the rcu cs. Once the control goes out of the rcu cs, 'mask' could be freed,
right? Or you require bpf_iter_cpumask_next() to be in the same rcu cs as
bpf_iter_cpumask_new(). But such a requirement seems odd.

I think we can do things similar to bpf_iter_task_vma. You can allocate
memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of
mask. This way, you do not need to worry about potential use-after-free
issues. The memory can be freed with bpf_iter_cpumask_destroy().

> +
> +/**
> + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
> + * @it: The bpf_iter_cpumask
> + *
> + * This function retrieves a pointer to the number of the next CPU within the
> + * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
> + * cpumask. If there are no further CPUs available, it returns NULL.
> + *
> + * Returns a pointer to the number of the next CPU in the cpumask, or NULL if
> + * there are no further CPUs.
> + */
> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> +{
> +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> +	const struct cpumask *mask = kit->mask;
> +	int cpu;
> +
> +	cpu = cpumask_next(kit->cpu, mask);
> +	if (cpu >= nr_cpu_ids)
> +		return NULL;
> +
> +	kit->cpu = cpu;
> +	return &kit->cpu;
> +}
> +
> +/**
> + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
> + * @it: The bpf_iter_cpumask to be destroyed.
> + */
> +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
> +{
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  BTF_SET8_START(cpumask_kfunc_btf_ids)
> @@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
>  BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
>  BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
>  BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU_PROTECTED | KF_RCU)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
>  BTF_SET8_END(cpumask_kfunc_btf_ids)
>
>  static const struct btf_kfunc_id_set cpumask_kfunc_set = {
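A minimal sketch of the copy-based approach suggested above, modeled on
bpf_iter_task_vma: bpf_iter_cpumask_new() keeps its own copy of the mask,
so later bpf_iter_cpumask_next() calls no longer depend on the lifetime of
the caller-supplied pointer. The use of bpf_global_ma and the exact error
handling here are illustrative assumptions, not necessarily what a respin
of the patch would do:

struct bpf_iter_cpumask_kern {
	struct cpumask *mask;	/* iterator-owned copy of the caller's mask */
	int cpu;
} __aligned(8);

__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it,
				     const struct cpumask *mask)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	/* Copy @mask so the iterator stays valid after the rcu cs ends. */
	kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
	if (!kit->mask)
		return -ENOMEM;

	cpumask_copy(kit->mask, mask);
	kit->cpu = -1;
	return 0;
}

__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	/* Free the copy made in _new(); kit->mask is NULL if _new() failed. */
	if (kit->mask)
		bpf_mem_free(&bpf_global_ma, kit->mask);
}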
Hi,

On 1/19/2024 6:27 AM, Yonghong Song wrote:
> On 1/16/24 6:48 PM, Yafang Shao wrote:
>> Add three new kfuncs for bpf_iter_cpumask.
>> - bpf_iter_cpumask_new
[...]
> We have a problem here. Let us say bpf_iter_cpumask_new() is called
> inside the rcu cs. Once the control goes out of the rcu cs, 'mask'
> could be freed, right? Or you require bpf_iter_cpumask_next() to be
> in the same rcu cs as bpf_iter_cpumask_new(). But such a requirement
> seems odd.

So the case is possible when bpf_iter_cpumask_new() and
bpf_iter_cpumask_next() are used in a sleepable program and these two
kfuncs sit in two different rcu_read_lock/rcu_read_unlock code blocks,
right?

> I think we can do things similar to bpf_iter_task_vma. You can
> allocate memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to
> keep a copy of mask. This way, you do not need to worry about
> potential use-after-free issues. The memory can be freed with
> bpf_iter_cpumask_destroy().
[...]
On 1/18/24 4:51 PM, Hou Tao wrote:
> Hi,
>
> On 1/19/2024 6:27 AM, Yonghong Song wrote:
[...]
>> We have a problem here. Let us say bpf_iter_cpumask_new() is called
>> inside the rcu cs. Once the control goes out of the rcu cs, 'mask'
>> could be freed, right? Or you require bpf_iter_cpumask_next() to be
>> in the same rcu cs as bpf_iter_cpumask_new(). But such a requirement
>> seems odd.
> So the case is possible when bpf_iter_cpumask_new() and
> bpf_iter_cpumask_next() are used in a sleepable program and these two
> kfuncs sit in two different rcu_read_lock/rcu_read_unlock code blocks,
> right?

Right, or bpf_iter_cpumask_new() inside the rcu cs and
bpf_iter_cpumask_next() not.

>> I think we can do things similar to bpf_iter_task_vma. You can
>> allocate memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to
>> keep a copy of mask. This way, you do not need to worry about
>> potential use-after-free issues. The memory can be freed with
>> bpf_iter_cpumask_destroy().
[...]
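For concreteness, the scenario being confirmed here, written as a
hypothetical sleepable BPF program (the attach point is made up, and the
kfunc declarations are assumed to come from vmlinux.h/bpf_experimental.h
as in the selftests). With the pointer-only implementation from the patch,
the second rcu cs can touch freed memory:

SEC("fentry.s/kernel_clone")
int BPF_PROG(iter_allowed_cpus)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct bpf_iter_cpumask it;
	int *cpu;

	bpf_rcu_read_lock();
	bpf_iter_cpumask_new(&it, task->cpus_ptr);
	bpf_rcu_read_unlock();

	/* The program may sleep here, and task->cpus_ptr may be freed ... */

	bpf_rcu_read_lock();
	while ((cpu = bpf_iter_cpumask_next(&it)))	/* ... so this reads stale memory */
		bpf_printk("allowed cpu %d", *cpu);
	bpf_rcu_read_unlock();

	bpf_iter_cpumask_destroy(&it);
	return 0;
}

Copying the mask in bpf_iter_cpumask_new(), as sketched earlier, makes the
second block safe regardless of which rcu cs (if any) it runs in.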
On Fri, Jan 19, 2024 at 6:27 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> On 1/16/24 6:48 PM, Yafang Shao wrote:
> > Add three new kfuncs for bpf_iter_cpumask.
> > - bpf_iter_cpumask_new
> >   It is defined with KF_RCU_PROTECTED and KF_RCU.
> >   KF_RCU_PROTECTED is defined because we must use it under the
> >   protection of RCU.
> >   KF_RCU is defined because the cpumask must be an RCU trusted pointer
> >   such as task->cpus_ptr.
>
> I am not sure whether we need both or not.
>
> KF_RCU_PROTECTED means the function call needs to be within the rcu cs.
> KF_RCU means the argument usage needs to be within the rcu cs.
> We only need one of them (preferably KF_RCU).

As you explained below, KF_RCU_PROTECTED is actually for
bpf_iter_cpumask_next().

[...]

> We have a problem here. Let us say bpf_iter_cpumask_new() is called
> inside the rcu cs. Once the control goes out of the rcu cs, 'mask'
> could be freed, right? Or you require bpf_iter_cpumask_next() to be
> in the same rcu cs as bpf_iter_cpumask_new(). But such a requirement
> seems odd.
>
> I think we can do things similar to bpf_iter_task_vma. You can allocate
> memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of
> mask. This way, you do not need to worry about potential use-after-free
> issues. The memory can be freed with bpf_iter_cpumask_destroy().

Good suggestion. That seems better.

[...]

--
Regards
Yafang
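Taken together with the copy-based _new() sketched earlier, the flag set
would presumably reduce to something like the following (an assumption
about the eventual layout, not the merged patch): KF_RCU alone makes the
verifier ensure @mask is valid while _new() copies it, and
KF_RCU_PROTECTED becomes unnecessary because _next() only reads the
iterator-owned copy:

BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)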
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 2e73533a3811..1840e48e6142 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
 	return cpumask_weight(cpumask);
 }
 
+struct bpf_iter_cpumask {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_cpumask_kern {
+	const struct cpumask *mask;
+	int cpu;
+} __aligned(8);
+
+/**
+ * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
+ * @it: The new bpf_iter_cpumask to be created.
+ * @mask: The cpumask to be iterated over.
+ *
+ * This function initializes a new bpf_iter_cpumask structure for iterating over
+ * the specified CPU mask. It assigns the provided cpumask to the newly created
+ * bpf_iter_cpumask @it for subsequent iteration operations.
+ *
+ * On success, 0 is returned. On failure, an error is returned.
+ */
+__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
+{
+	struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
+		     __alignof__(struct bpf_iter_cpumask));
+
+	kit->mask = mask;
+	kit->cpu = -1;
+	return 0;
+}
+
+/**
+ * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask
+ *
+ * This function retrieves a pointer to the number of the next CPU within the
+ * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
+ * cpumask. If there are no further CPUs available, it returns NULL.
+ *
+ * Returns a pointer to the number of the next CPU in the cpumask, or NULL if
+ * there are no further CPUs.
+ */
+__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
+{
+	struct bpf_iter_cpumask_kern *kit = (void *)it;
+	const struct cpumask *mask = kit->mask;
+	int cpu;
+
+	cpu = cpumask_next(kit->cpu, mask);
+	if (cpu >= nr_cpu_ids)
+		return NULL;
+
+	kit->cpu = cpu;
+	return &kit->cpu;
+}
+
+/**
+ * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask to be destroyed.
+ */
+__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_SET8_START(cpumask_kfunc_btf_ids)
@@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU_PROTECTED | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
 BTF_SET8_END(cpumask_kfunc_btf_ids)
 
 static const struct btf_kfunc_id_set cpumask_kfunc_set = {
Add three new kfuncs for bpf_iter_cpumask.
- bpf_iter_cpumask_new
  It is defined with KF_RCU_PROTECTED and KF_RCU.
  KF_RCU_PROTECTED is defined because we must use it under the
  protection of RCU.
  KF_RCU is defined because the cpumask must be an RCU trusted pointer
  such as task->cpus_ptr.
- bpf_iter_cpumask_next
- bpf_iter_cpumask_destroy

These new kfuncs facilitate the iteration of percpu data, such as
runqueues, psi_cgroup_cpu, and more.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
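For context, a sketch of the kind of percpu iteration the changelog has in
mind, using the bpf_for_each() convenience macro from the selftests'
bpf_experimental.h; the attach point and the summed runqueue field are
illustrative assumptions:

extern const struct rq runqueues __ksym;

SEC("fentry.s/kernel_clone")
int BPF_PROG(sum_nr_running)
{
	struct task_struct *task = bpf_get_current_task_btf();
	u32 total = 0;
	int *cpu;

	bpf_rcu_read_lock();
	/* Walk each CPU the task may run on and sample its runqueue. */
	bpf_for_each(cpumask, cpu, task->cpus_ptr) {
		struct rq *rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, *cpu);

		if (rq)
			total += rq->nr_running;
	}
	bpf_rcu_read_unlock();

	bpf_printk("nr_running over allowed cpus: %u", total);
	return 0;
}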