diff mbox series

[v3,sysctl-next] bpf: move bpf sysctls from kernel/sysctl.c to bpf module

Message ID 20220302020412.128772-1-zhuyan34@huawei.com (mailing list archive)
State New, archived
Series [v3,sysctl-next] bpf: move bpf sysctls from kernel/sysctl.c to bpf module

Commit Message

Yan Zhu March 2, 2022, 2:04 a.m. UTC
We're moving sysctls out of kernel/sysctl.c because it's a mess. We
already moved all filesystem sysctls out, and over time the goal is
to move every sysctl out to its own subsystem/actual user.

kernel/sysctl.c has grown into an insane mess and it's easy to run
into conflicts with it. This patch is part of the effort to move them out.

Signed-off-by: Yan Zhu <zhuyan34@huawei.com>

---
v1->v2:
  1.Added patch branch identifier sysctl-next.
  2.Re-describe the reason for the patch submission.

v2->v3:
  Re-describe the reason for the patch submission.
---
 kernel/bpf/syscall.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c      | 71 ----------------------------------------------
 2 files changed, 80 insertions(+), 71 deletions(-)

Comments

Luis Chamberlain March 2, 2022, 8:39 p.m. UTC | #1
On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> We're moving sysctls out of kernel/sysctl.c because it's a mess. We
> already moved all filesystem sysctls out, and over time the goal is
> to move every sysctl out to its own subsystem/actual user.
> 
> kernel/sysctl.c has grown into an insane mess and it's easy to run
> into conflicts with it. This patch is part of the effort to move them out.
> 
> Signed-off-by: Yan Zhu <zhuyan34@huawei.com>

Daniel, let me know if this makes more sense now, and if so I can
offer to take it through sysctl-next to avoid conflicts as more sysctl
knobs get moved out of kernel/sysctl.c.

  Luis
Daniel Borkmann March 3, 2022, 11:44 p.m. UTC | #2
On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
>> We're moving sysctls out of kernel/sysctl.c because it's a mess. We
>> already moved all filesystem sysctls out, and over time the goal is
>> to move every sysctl out to its own subsystem/actual user.
>>
>> kernel/sysctl.c has grown into an insane mess and it's easy to run
>> into conflicts with it. This patch is part of the effort to move them out.
>>
>> Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
> 
> Daniel, let me know if this makes more sense now, and if so I can
> offer to take it through sysctl-next to avoid conflicts as more sysctl
> knobs get moved out of kernel/sysctl.c.

If this is a whole ongoing effort rather than drive-by patch, then it's
fine with me. Btw, the patch itself should also drop the linux/bpf.h
include from kernel/sysctl.c since nothing else is using it after the
patch.

Btw, related to cleanups.. historically, we have a bunch of other knobs
for BPF under net (in net_core_table), that is:

   /proc/sys/net/core/bpf_jit_enable
   /proc/sys/net/core/bpf_jit_harden
   /proc/sys/net/core/bpf_jit_kallsyms
   /proc/sys/net/core/bpf_jit_limit

Would be nice to consolidate all under e.g. /proc/sys/kernel/bpf_*
going forward, and technically, they should also be usable w/o net
configured into the kernel. Is there infra to point the sysctl knobs
e.g. under net/core/ to kernel/, or would the best way be to have a
single struct ctl_table and register it for both?

Cheers,
Daniel
Luis Chamberlain March 4, 2022, 12:23 a.m. UTC | #3
On Fri, Mar 04, 2022 at 12:44:48AM +0100, Daniel Borkmann wrote:
> On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> > On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> > > We're moving sysctls out of kernel/sysctl.c because it's a mess. We
> > > already moved all filesystem sysctls out, and over time the goal is
> > > to move every sysctl out to its own subsystem/actual user.
> > > 
> > > kernel/sysctl.c has grown into an insane mess and it's easy to run
> > > into conflicts with it. This patch is part of the effort to move them out.
> > > 
> > > Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
> > 
> > Daniel, let me know if this makes more sense now, and if so I can
> > offer to take it through sysctl-next to avoid conflicts as more sysctl
> > knobs get moved out of kernel/sysctl.c.
> 
> If this is a whole ongoing effort rather than drive-by patch,

It is an ongoing effort, but it will take many releases before we tidy
this whole thing up.

> then it's
> fine with me. 

OK great. Thanks for understanding the mess.

> Btw, the patch itself should also drop the linux/bpf.h
> include from kernel/sysctl.c since nothing else is using it after the
> patch.

I'll let Yan deal with that.

> Btw, related to cleanups.. historically, we have a bunch of other knobs
> for BPF under net (in net_core_table), that is:
> 
>   /proc/sys/net/core/bpf_jit_enable
>   /proc/sys/net/core/bpf_jit_harden
>   /proc/sys/net/core/bpf_jit_kallsyms
>   /proc/sys/net/core/bpf_jit_limit
> 
> Would be nice to consolidate all under e.g. /proc/sys/kernel/bpf_* for

Oh the actual "name" / directory location is not changing.
What changes is just where in code you declare them.

> future going forward, and technically, they should be usable also w/o
> net configured into kernel.

That's up to you; just consider whether you have scripts using these
already. You may need backward compatibility. You don't need networking
to create the net directory for sysctls either. Whichever sysctl
registers under a directory first creates it if it doesn't already exist.
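
To make the backward-compatibility concern concrete, here is a minimal
user-space sketch of how a script or tool could cope if a knob were ever
exposed under a second location. It is only an illustration: the helper
read_first_sysctl() and the path /proc/sys/kernel/bpf_jit_enable are
assumptions made for the example (nothing in this thread or patch creates
that path), while /proc/sys/net/core/bpf_jit_enable is the existing
location listed above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Try each path in order and parse the first readable one as a decimal
 * integer. Returns the index of the path that was used, or -1 if none
 * of them could be opened and parsed.
 */
static int read_first_sysctl(const char *const paths[], int npaths, long *out)
{
	assert(paths && out && npaths >= 0);

	for (int i = 0; i < npaths; i++) {
		FILE *f = fopen(paths[i], "r");
		long val;
		int ok;

		if (!f)
			continue;	/* knob absent here, try the next location */
		ok = (fscanf(f, "%ld", &val) == 1);
		fclose(f);
		if (ok) {
			*out = val;
			return i;
		}
	}
	return -1;
}

/* Hypothetical consolidated location first, then the existing one. */
static const char *const bpf_jit_enable_paths[] = {
	"/proc/sys/kernel/bpf_jit_enable",	/* assumption: not created by this patch */
	"/proc/sys/net/core/bpf_jit_enable",	/* location named in this thread */
};

/* Example caller: report the value and which location supplied it. */
static void example_usage(void)
{
	long v;
	int idx = read_first_sysctl(bpf_jit_enable_paths, 2, &v);

	if (idx >= 0)
		printf("bpf_jit_enable=%ld (from %s)\n", v, bpf_jit_enable_paths[idx]);
	else
		printf("bpf_jit_enable is not readable on this system\n");
}
```

Trying the newer location first and falling back to the older one keeps
such a tool working on kernels from before and after any relocation.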

> Is there infra to point the sysctl knobs
> e.g. under net/core/ to kernel/, or best way would be to have single
> struct ctl_table and register for both?

Try proc_symlink().

  Luis
Luis Chamberlain April 6, 2022, 10:41 p.m. UTC | #4
On Thu, Mar 03, 2022 at 04:23:26PM -0800, Luis Chamberlain wrote:
> On Fri, Mar 04, 2022 at 12:44:48AM +0100, Daniel Borkmann wrote:
> > On 3/2/22 9:39 PM, Luis Chamberlain wrote:
> > > On Wed, Mar 02, 2022 at 10:04:12AM +0800, Yan Zhu wrote:
> > > > We're moving sysctls out of kernel/sysctl.c because it's a mess. We
> > > > already moved all filesystem sysctls out, and over time the goal is
> > > > to move every sysctl out to its own subsystem/actual user.
> > > > 
> > > > kernel/sysctl.c has grown into an insane mess and it's easy to run
> > > > into conflicts with it. This patch is part of the effort to move them out.
> > > > 
> > > > Signed-off-by: Yan Zhu <zhuyan34@huawei.com>
> > > 
> > > Daniel, let me know if this makes more sense now, and if so I can
> > > offer to take it through sysctl-next to avoid conflicts as more sysctl
> > > knobs get moved out of kernel/sysctl.c.
> > 
> > If this is a whole ongoing effort rather than drive-by patch,
> 
> It is an ongoing effort, but it will take many releases before we tidy
> this whole thing up.
> 
> > then it's
> > fine with me. 
> 
> OK great. Thanks for understanding the mess.
> 
> > Btw, the patch itself should also drop the linux/bpf.h
> > include from kernel/sysctl.c since nothing else is using it after the
> > patch.
> 
> I'll let Yan deal with that.

Yan, feel free to resubmit based on sysctl-next [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=sysctl-next

  Luis
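
For readers who want to see what the two handlers moved by the patch
below actually enforce, here is a small user-space model of their write
paths. It is a sketch under stated assumptions, not kernel code: the
names model_unpriv_write(), model_stats_write() and struct stats_model
are invented for illustration, the CAP_SYS_ADMIN check and the mutex are
left out, and a plain counter stands in for the static key.

```c
#include <assert.h>
#include <errno.h>

/* User-space model only; none of these names are kernel API. */

/*
 * Model of a write to kernel.unprivileged_bpf_disabled.
 * The table bounds the value to 0..2 (SYSCTL_ZERO..SYSCTL_TWO). Once the
 * knob holds 1 it is locked: any attempt to change it fails with -EPERM.
 */
static int model_unpriv_write(int *knob, int new_val)
{
	int locked = (*knob == 1);

	if (new_val < 0 || new_val > 2)
		return -EINVAL;		/* range check comes first, as in the handler */
	if (locked && new_val != 1)
		return -EPERM;
	*knob = new_val;
	return 0;
}

struct stats_model {
	int saved_val;	/* last value written, like the handler's static saved_val */
	int key_count;	/* stand-in for the static key's enable count */
};

/*
 * Model of a write to kernel.bpf_stats_enabled.
 * The value is bounded to 0..1, and the counter is only touched when the
 * written value differs from the previously saved one.
 */
static int model_stats_write(struct stats_model *m, int new_val)
{
	if (new_val < 0 || new_val > 1)
		return -EINVAL;
	if (new_val != m->saved_val) {
		if (new_val)
			m->key_count++;
		else
			m->key_count--;
		m->saved_val = new_val;
	}
	return 0;
}

/* Walk through the one-way lock: 0 and 2 are reversible, 1 is not. */
static void model_selfcheck(void)
{
	int knob = 0;

	assert(model_unpriv_write(&knob, 2) == 0);
	assert(model_unpriv_write(&knob, 0) == 0);
	assert(model_unpriv_write(&knob, 1) == 0);
	assert(model_unpriv_write(&knob, 0) == -EPERM);
	assert(knob == 1);
}
```

The model makes the asymmetry visible: values 0 and 2 can be changed
freely, while 1 can only be rewritten as 1. Writing the same
bpf_stats_enabled value twice leaves the counter unchanged.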

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35646db3d950..50f85b47d478 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4888,3 +4888,83 @@  const struct bpf_verifier_ops bpf_syscall_verifier_ops = {
 const struct bpf_prog_ops bpf_syscall_prog_ops = {
 	.test_run = bpf_prog_test_run_syscall,
 };
+
+#ifdef CONFIG_SYSCTL
+static int bpf_stats_handler(struct ctl_table *table, int write,
+			     void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct static_key *key = (struct static_key *)table->data;
+	static int saved_val;
+	int val, ret;
+	struct ctl_table tmp = {
+		.data   = &val,
+		.maxlen = sizeof(val),
+		.mode   = table->mode,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	};
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	mutex_lock(&bpf_stats_enabled_mutex);
+	val = saved_val;
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (write && !ret && val != saved_val) {
+		if (val)
+			static_key_slow_inc(key);
+		else
+			static_key_slow_dec(key);
+		saved_val = val;
+	}
+	mutex_unlock(&bpf_stats_enabled_mutex);
+	return ret;
+}
+
+static int bpf_unpriv_handler(struct ctl_table *table, int write,
+			      void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret, unpriv_enable = *(int *)table->data;
+	bool locked_state = unpriv_enable == 1;
+	struct ctl_table tmp = *table;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	tmp.data = &unpriv_enable;
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (write && !ret) {
+		if (locked_state && unpriv_enable != 1)
+			return -EPERM;
+		*(int *)table->data = unpriv_enable;
+	}
+	return ret;
+}
+
+static struct ctl_table bpf_syscall_table[] = {
+	{
+		.procname	= "unprivileged_bpf_disabled",
+		.data		= &sysctl_unprivileged_bpf_disabled,
+		.maxlen		= sizeof(sysctl_unprivileged_bpf_disabled),
+		.mode		= 0644,
+		.proc_handler	= bpf_unpriv_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{
+		.procname	= "bpf_stats_enabled",
+		.data		= &bpf_stats_enabled_key.key,
+		.maxlen		= sizeof(bpf_stats_enabled_key),
+		.mode		= 0644,
+		.proc_handler	= bpf_stats_handler,
+	},
+	{ }
+};
+
+static int __init bpf_syscall_sysctl_init(void)
+{
+	register_sysctl_init("kernel", bpf_syscall_table);
+	return 0;
+}
+late_initcall(bpf_syscall_sysctl_init);
+#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ae5e59396b5d..c64db3755d9c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -146,59 +146,6 @@  static const int max_extfrag_threshold = 1000;
 
 #endif /* CONFIG_SYSCTL */
 
-#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SYSCTL)
-static int bpf_stats_handler(struct ctl_table *table, int write,
-			     void *buffer, size_t *lenp, loff_t *ppos)
-{
-	struct static_key *key = (struct static_key *)table->data;
-	static int saved_val;
-	int val, ret;
-	struct ctl_table tmp = {
-		.data   = &val,
-		.maxlen = sizeof(val),
-		.mode   = table->mode,
-		.extra1 = SYSCTL_ZERO,
-		.extra2 = SYSCTL_ONE,
-	};
-
-	if (write && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	mutex_lock(&bpf_stats_enabled_mutex);
-	val = saved_val;
-	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-	if (write && !ret && val != saved_val) {
-		if (val)
-			static_key_slow_inc(key);
-		else
-			static_key_slow_dec(key);
-		saved_val = val;
-	}
-	mutex_unlock(&bpf_stats_enabled_mutex);
-	return ret;
-}
-
-static int bpf_unpriv_handler(struct ctl_table *table, int write,
-			      void *buffer, size_t *lenp, loff_t *ppos)
-{
-	int ret, unpriv_enable = *(int *)table->data;
-	bool locked_state = unpriv_enable == 1;
-	struct ctl_table tmp = *table;
-
-	if (write && !capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	tmp.data = &unpriv_enable;
-	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
-	if (write && !ret) {
-		if (locked_state && unpriv_enable != 1)
-			return -EPERM;
-		*(int *)table->data = unpriv_enable;
-	}
-	return ret;
-}
-#endif /* CONFIG_BPF_SYSCALL && CONFIG_SYSCTL */
-
 /*
  * /proc/sys support
  */
@@ -2188,24 +2135,6 @@  static struct ctl_table kern_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif
-#ifdef CONFIG_BPF_SYSCALL
-	{
-		.procname	= "unprivileged_bpf_disabled",
-		.data		= &sysctl_unprivileged_bpf_disabled,
-		.maxlen		= sizeof(sysctl_unprivileged_bpf_disabled),
-		.mode		= 0644,
-		.proc_handler	= bpf_unpriv_handler,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_TWO,
-	},
-	{
-		.procname	= "bpf_stats_enabled",
-		.data		= &bpf_stats_enabled_key.key,
-		.maxlen		= sizeof(bpf_stats_enabled_key),
-		.mode		= 0644,
-		.proc_handler	= bpf_stats_handler,
-	},
-#endif
 #if defined(CONFIG_TREE_RCU)
 	{
 		.procname	= "panic_on_rcu_stall",