Message ID | 20220302020412.128772-1-zhuyan34@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,sysctl-next] bpf: move bpf sysctls from kernel/sysctl.c to bpf module |
On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> We're moving sysctls out of kernel/sysctl.c as it's a mess. We
> already moved all filesystem sysctls out. And with time the goal is
> to move all sysctls out to their own subsystem/actual user.
>
> kernel/sysctl.c has grown to an insane mess and it's easy to run
> into conflicts with it. The effort to move them out is part of this.
>
> Signed-off-by: Yan Zhu <zhuyan34@huawei.com>

Daniel, let me know if this makes more sense now, and if so I can
offer to take it through sysctl-next to avoid conflicts as more sysctl
knobs get moved out from kernel/sysctl.c.

  Luis

On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
>> We're moving sysctls out of kernel/sysctl.c as it's a mess. We
>> already moved all filesystem sysctls out. And with time the goal is
>> to move all sysctls out to their own subsystem/actual user.
>>
>> kernel/sysctl.c has grown to an insane mess and it's easy to run
>> into conflicts with it. The effort to move them out is part of this.
>>
>> Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
>
> Daniel, let me know if this makes more sense now, and if so I can
> offer to take it through sysctl-next to avoid conflicts as more sysctl
> knobs get moved out from kernel/sysctl.c.

If this is a whole ongoing effort rather than a drive-by patch, then it's
fine with me. Btw, the patch itself should also drop the linux/bpf.h
include from kernel/sysctl.c since nothing else is using it after the
patch.

Btw, related to cleanups.. historically, we have a bunch of other knobs
for BPF under net (in net_core_table), that is:

  /proc/sys/net/core/bpf_jit_enable
  /proc/sys/net/core/bpf_jit_harden
  /proc/sys/net/core/bpf_jit_kallsyms
  /proc/sys/net/core/bpf_jit_limit

Would be nice to consolidate all under e.g. /proc/sys/kernel/bpf_* for
future going forward, and technically, they should be usable also w/o
net configured into kernel. Is there infra to point the sysctl knobs
e.g. under net/core/ to kernel/, or best way would be to have single
struct ctl_table and register for both?

Cheers,
Daniel

On Fri, Mar 04, 2022 at 12:44:48AM +0100, Daniel Borkmann wrote:
> On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> > On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> > > We're moving sysctls out of kernel/sysctl.c as it's a mess. We
> > > already moved all filesystem sysctls out. And with time the goal is
> > > to move all sysctls out to their own subsystem/actual user.
> > >
> > > kernel/sysctl.c has grown to an insane mess and it's easy to run
> > > into conflicts with it. The effort to move them out is part of this.
> > >
> > > Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
> >
> > Daniel, let me know if this makes more sense now, and if so I can
> > offer to take it through sysctl-next to avoid conflicts as more sysctl
> > knobs get moved out from kernel/sysctl.c.
>
> If this is a whole ongoing effort rather than a drive-by patch,

It is an ongoing effort, but it will take many releases before we tidy
this whole thing up.

> then it's
> fine with me.

OK great. Thanks for understanding the mess.

> Btw, the patch itself should also drop the linux/bpf.h
> include from kernel/sysctl.c since nothing else is using it after the
> patch.

I'll let Yan deal with that.

> Btw, related to cleanups.. historically, we have a bunch of other knobs
> for BPF under net (in net_core_table), that is:
>
>   /proc/sys/net/core/bpf_jit_enable
>   /proc/sys/net/core/bpf_jit_harden
>   /proc/sys/net/core/bpf_jit_kallsyms
>   /proc/sys/net/core/bpf_jit_limit
>
> Would be nice to consolidate all under e.g. /proc/sys/kernel/bpf_* for

Oh, the actual "name" / directory location is not changing. What changes
is just where in the code you declare them.

> future going forward, and technically, they should be usable also w/o
> net configured into kernel.

That's up to you; just consider whether you have scripts using these
already. You may need backward compatibility. You don't need networking
to create the net directory for sysctls either. The first sysctl
registered under a directory creates it if it does not exist yet.

> Is there infra to point the sysctl knobs
> e.g. under net/core/ to kernel/, or best way would be to have single
> struct ctl_table and register for both?

Try proc_symlink().

  Luis

On Thu, Mar 03, 2022 at 04:23:26PM -0800, Luis Chamberlain wrote:
> On Fri, Mar 04, 2022 at 12:44:48AM +0100, Daniel Borkmann wrote:
> > On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> > > On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> > > > We're moving sysctls out of kernel/sysctl.c as it's a mess. We
> > > > already moved all filesystem sysctls out. And with time the goal is
> > > > to move all sysctls out to their own subsystem/actual user.
> > > >
> > > > kernel/sysctl.c has grown to an insane mess and it's easy to run
> > > > into conflicts with it. The effort to move them out is part of this.
> > > >
> > > > Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
> > >
> > > Daniel, let me know if this makes more sense now, and if so I can
> > > offer to take it through sysctl-next to avoid conflicts as more sysctl
> > > knobs get moved out from kernel/sysctl.c.
> >
> > If this is a whole ongoing effort rather than a drive-by patch,
>
> It is an ongoing effort, but it will take many releases before we tidy
> this whole thing up.
>
> > then it's
> > fine with me.
>
> OK great. Thanks for understanding the mess.
>
> > Btw, the patch itself should also drop the linux/bpf.h
> > include from kernel/sysctl.c since nothing else is using it after the
> > patch.
>
> I'll let Yan deal with that.

Yan, feel free to resubmit based on sysctl-next [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=sysctl-next

  Luis

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35646db3d950..50f85b47d478 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4888,3 +4888,83 @@ const struct bpf_verifier_ops bpf_syscall_verifier_ops = {
 const struct bpf_prog_ops bpf_syscall_prog_ops = {
 	.test_run = bpf_prog_test_run_syscall,
 };
+
+#ifdef CONFIG_SYSCTL
+static int bpf_stats_handler(struct ctl_table *table, int write,
+			     void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct static_key *key = (struct static_key *)table->data;
+	static int saved_val;
+	int val, ret;
+	struct ctl_table tmp = {
+		.data = &val,
+		.maxlen = sizeof(val),
+		.mode = table->mode,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	};
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	mutex_lock(&bpf_stats_enabled_mutex);
+	val = saved_val;
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (write && !ret && val != saved_val) {
+		if (val)
+			static_key_slow_inc(key);
+		else
+			static_key_slow_dec(key);
+		saved_val = val;
+	}
+	mutex_unlock(&bpf_stats_enabled_mutex);
+	return ret;
+}
+
+static int bpf_unpriv_handler(struct ctl_table *table, int write,
+			      void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret, unpriv_enable = *(int *)table->data;
+	bool locked_state = unpriv_enable == 1;
+	struct ctl_table tmp = *table;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	tmp.data = &unpriv_enable;
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (write && !ret) {
+		if (locked_state && unpriv_enable != 1)
+			return -EPERM;
+		*(int *)table->data = unpriv_enable;
+	}
+	return ret;
+}
+
+static struct ctl_table bpf_syscall_table[] = {
+	{
+		.procname = "unprivileged_bpf_disabled",
+		.data = &sysctl_unprivileged_bpf_disabled,
+		.maxlen = sizeof(sysctl_unprivileged_bpf_disabled),
+		.mode = 0644,
+		.proc_handler = bpf_unpriv_handler,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_TWO,
+	},
+	{
+		.procname = "bpf_stats_enabled",
+		.data = &bpf_stats_enabled_key.key,
+		.maxlen = sizeof(bpf_stats_enabled_key),
+		.mode = 0644,
+		.proc_handler = bpf_stats_handler,
+	},
+	{ }
+};
+
+static int __init bpf_syscall_sysctl_init(void)
+{
+	register_sysctl_init("kernel", bpf_syscall_table);
+	return 0;
+}
+late_initcall(bpf_syscall_sysctl_init);
+#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ae5e59396b5d..c64db3755d9c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -146,59 +146,6 @@ static const int max_extfrag_threshold = 1000;
 
 #endif /* CONFIG_SYSCTL */
 
-#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SYSCTL)
-static int bpf_stats_handler(struct ctl_table *table, int write,
-			     void *buffer, size_t *lenp, loff_t *ppos)
-{
-	struct static_key *key = (struct static_key *)table->data;
-	static int saved_val;
-	int val, ret;
-	struct ctl_table tmp = {
-		.data = &val,
-		.maxlen = sizeof(val),
-		.mode = table->mode,
-		.extra1 = SYSCTL_ZERO,
-		.extra2 = SYSCTL_ONE,
-	};
-
-	if (write && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	mutex_lock(&bpf_stats_enabled_mutex);
-	val = saved_val;
-	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-	if (write && !ret && val != saved_val) {
-		if (val)
-			static_key_slow_inc(key);
-		else
-			static_key_slow_dec(key);
-		saved_val = val;
-	}
-	mutex_unlock(&bpf_stats_enabled_mutex);
-	return ret;
-}
-
-static int bpf_unpriv_handler(struct ctl_table *table, int write,
-			      void *buffer, size_t *lenp, loff_t *ppos)
-{
-	int ret, unpriv_enable = *(int *)table->data;
-	bool locked_state = unpriv_enable == 1;
-	struct ctl_table tmp = *table;
-
-	if (write && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	tmp.data = &unpriv_enable;
-	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-	if (write && !ret) {
-		if (locked_state && unpriv_enable != 1)
-			return -EPERM;
-		*(int *)table->data = unpriv_enable;
-	}
-	return ret;
-}
-#endif /* CONFIG_BPF_SYSCALL && CONFIG_SYSCTL */
-
 /*
  * /proc/sys support
  */
@@ -2188,24 +2135,6 @@ static struct ctl_table kern_table[] = {
 		.extra2 = SYSCTL_ONE,
 	},
 #endif
-#ifdef CONFIG_BPF_SYSCALL
-	{
-		.procname = "unprivileged_bpf_disabled",
-		.data = &sysctl_unprivileged_bpf_disabled,
-		.maxlen = sizeof(sysctl_unprivileged_bpf_disabled),
-		.mode = 0644,
-		.proc_handler = bpf_unpriv_handler,
-		.extra1 = SYSCTL_ZERO,
-		.extra2 = SYSCTL_TWO,
-	},
-	{
-		.procname = "bpf_stats_enabled",
-		.data = &bpf_stats_enabled_key.key,
-		.maxlen = sizeof(bpf_stats_enabled_key),
-		.mode = 0644,
-		.proc_handler = bpf_stats_handler,
-	},
-#endif
 #if defined(CONFIG_TREE_RCU)
 	{
 		.procname = "panic_on_rcu_stall",

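
For readers skimming the diff: the write path of bpf_unpriv_handler() behaves like a one-way latch. Once unprivileged_bpf_disabled is 1, any write of a different value is refused. The following is only a rough userspace sketch of that decision logic, not kernel code. The names `struct unpriv_knob` and `unpriv_knob_write()` are hypothetical, the `has_cap_sys_admin` flag stands in for capable(CAP_SYS_ADMIN), and the explicit range check stands in for what proc_dointvec_minmax() does with extra1/extra2.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for sysctl_unprivileged_bpf_disabled (0, 1 or 2). */
struct unpriv_knob {
	int value;
};

/*
 * Userspace model of the write path in bpf_unpriv_handler():
 *  - writers without the capability get -EPERM,
 *  - values outside 0..2 get -EINVAL (the extra1/extra2 bounds),
 *  - once the knob is 1 it can only be rewritten as 1.
 * Returns 0 on success and leaves the knob untouched on any error.
 */
static int unpriv_knob_write(struct unpriv_knob *knob, int new_value,
			     bool has_cap_sys_admin)
{
	bool locked_state = knob->value == 1;

	if (!has_cap_sys_admin)
		return -EPERM;
	if (new_value < 0 || new_value > 2)
		return -EINVAL;
	if (locked_state && new_value != 1)
		return -EPERM;

	knob->value = new_value;
	return 0;
}
```

In this model, values 0 and 2 can be switched back and forth freely, and only 1 is sticky, which matches the `locked_state` check in the handler above.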
We're moving sysctls out of kernel/sysctl.c as it's a mess. We
already moved all filesystem sysctls out. And with time the goal is
to move all sysctls out to their own subsystem/actual user.

kernel/sysctl.c has grown to an insane mess and it's easy to run
into conflicts with it. The effort to move them out is part of this.

Signed-off-by: Yan Zhu <zhuyan34@huawei.com>

---
v1->v2:
  1. Added patch branch identifier sysctl-next.
  2. Re-described the reason for the patch submission.

v2->v3:
  Re-described the reason for the patch submission.

---
 kernel/bpf/syscall.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c      | 71 ----------------------------------------------
 2 files changed, 80 insertions(+), 71 deletions(-)