diff mbox series

linux-next: manual merge of the vfs tree with the parisc-hd tree

Message ID 20200511111123.68ccbaa3@canb.auug.org.au (mailing list archive)
State Awaiting Upstream
Headers show
Series linux-next: manual merge of the vfs tree with the parisc-hd tree | expand

Commit Message

Stephen Rothwell May 11, 2020, 1:11 a.m. UTC
Hi all,

Today's linux-next merge of the vfs tree got a conflict in:

  kernel/sysctl.c

between commit:

  b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")

from the parisc-hd tree and commit:

  f461d2dcd511 ("sysctl: avoid forward declarations")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

Comments

Xiaoming Ni May 11, 2020, 1:55 a.m. UTC | #1
On 2020/5/11 9:11, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the vfs tree got a conflict in:
> 
>    kernel/sysctl.c
> 
> between commit:
> 
>    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
> 
> from the parisc-hd tree and commit:
> 
>    f461d2dcd511 ("sysctl: avoid forward declarations")
> 
> from the vfs tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 


Kernel/sysctl.c contains more than 190 interface files, and there are a 
large number of config macro controls. When modifying the sysctl 
interface directly in kernel/sysctl.c , conflicts are very easy to occur.

At the same time, the register_sysctl_table() provided by the system can 
easily add the sysctl interface, and there is no conflict of 
kernel/sysctl.c .

Should we add instructions in the patch guide (coding-style.rst 
submitting-patches.rst):
Preferentially use register_sysctl_table() to add a new sysctl 
interface, centralize feature codes, and avoid directly modifying 
kernel/sysctl.c ?

In addition, is it necessary to transfer the architecture-related sysctl 
interface to arch/xxx/kernel/sysctl.c ?

Thanks
Xiaoming Ni
Luis Chamberlain May 12, 2020, 12:33 a.m. UTC | #2
On Mon, May 11, 2020 at 09:55:16AM +0800, Xiaoming Ni wrote:
> On 2020/5/11 9:11, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the vfs tree got a conflict in:
> > 
> >    kernel/sysctl.c
> > 
> > between commit:
> > 
> >    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
> > 
> > from the parisc-hd tree and commit:
> > 
> >    f461d2dcd511 ("sysctl: avoid forward declarations")
> > 
> > from the vfs tree.
> > 
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
> > 
> 
> 
> Kernel/sysctl.c contains more than 190 interface files, and there are a
> large number of config macro controls. When modifying the sysctl interface
> directly in kernel/sysctl.c , conflicts are very easy to occur.
> 
> At the same time, the register_sysctl_table() provided by the system can
> easily add the sysctl interface, and there is no conflict of kernel/sysctl.c
> .
> 
> Should we add instructions in the patch guide (coding-style.rst
> submitting-patches.rst):
> Preferentially use register_sysctl_table() to add a new sysctl interface,
> centralize feature codes, and avoid directly modifying kernel/sysctl.c ?

Yes, however I don't think folks know how to do this well. So I think we
just have to do at least start ourselves, and then reflect some of this
in the docs.  The reason that this can be not easy is that we need to
ensure that at an init level we haven't busted dependencies on setting
this. We also just don't have docs on how to do this well.

> In addition, is it necessary to transfer the architecture-related sysctl
> interface to arch/xxx/kernel/sysctl.c ?

Well here's an initial attempt to start with fs stuff in a very
conservative way. What do folks think?

 fs/proc/Makefile          |  1 +
 fs/proc/fs_sysctl_table.c | 97 +++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c           | 48 -------------------
 3 files changed, 98 insertions(+), 48 deletions(-)
 create mode 100644 fs/proc/fs_sysctl_table.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index bd08616ed8ba..8bf419b2ac7d 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -28,6 +28,7 @@ proc-y	+= namespaces.o
 proc-y	+= self.o
 proc-y	+= thread_self.o
 proc-$(CONFIG_PROC_SYSCTL)	+= proc_sysctl.o
+proc-$(CONFIG_SYSCTL)		+= fs_sysctl_table.o
 proc-$(CONFIG_NET)		+= proc_net.o
 proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
diff --git a/fs/proc/fs_sysctl_table.c b/fs/proc/fs_sysctl_table.c
new file mode 100644
index 000000000000..f56a49989872
--- /dev/null
+++ b/fs/proc/fs_sysctl_table.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * /proc/sys/fs sysctl table
+ */
+#include <linux/init.h>
+#include <linux/sysctl.h>
+#include <linux/poll.h>
+#include <linux/proc_fs.h>
+#include <linux/printk.h>
+#include <linux/security.h>
+#include <linux/sched.h>
+#include <linux/cred.h>
+#include <linux/namei.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/bpf-cgroup.h>
+#include <linux/mount.h>
+#include <linux/dnotify.h>
+#include <linux/pipe_fs_i.h>
+#include <linux/aio.h>
+#include <linux/inotify.h>
+#include <linux/kmemleak.h>
+#include <linux/binfmts.h>
+
+static unsigned long zero_ul;
+static unsigned long long_max = LONG_MAX;
+
+static struct ctl_table fs_table[] = {
+	{
+		.procname	= "inode-nr",
+		.data		= &inodes_stat,
+		.maxlen		= 2*sizeof(long),
+		.mode		= 0444,
+		.proc_handler	= proc_nr_inodes,
+	},
+	{
+		.procname	= "inode-state",
+		.data		= &inodes_stat,
+		.maxlen		= 7*sizeof(long),
+		.mode		= 0444,
+		.proc_handler	= proc_nr_inodes,
+	},
+	{
+		.procname	= "file-nr",
+		.data		= &files_stat,
+		.maxlen		= sizeof(files_stat),
+		.mode		= 0444,
+		.proc_handler	= proc_nr_files,
+	},
+	{
+		.procname	= "file-max",
+		.data		= &files_stat.max_files,
+		.maxlen		= sizeof(files_stat.max_files),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= &zero_ul,
+		.extra2		= &long_max,
+	},
+	{
+		.procname	= "nr_open",
+		.data		= &sysctl_nr_open,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &sysctl_nr_open_min,
+		.extra2		= &sysctl_nr_open_max,
+	},
+	{
+		.procname	= "dentry-state",
+		.data		= &dentry_stat,
+		.maxlen		= 6*sizeof(long),
+		.mode		= 0444,
+		.proc_handler	= proc_nr_dentry,
+	},
+	{ }
+};
+
+static struct ctl_table fs_base_table[] = {
+	{
+		.procname	= "fs",
+		.mode		= 0555,
+		.child		= fs_table,
+	},
+	{ }
+};
+
+static int __init fs_procsys_init(void)
+{
+	struct ctl_table_header *hdr;
+
+	hdr = register_sysctl_table(fs_base_table);
+	kmemleak_not_leak(hdr);
+
+	return 0;
+}
+
+early_initcall(fs_procsys_init);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3b0cecf57e79..6669d6118974 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -114,9 +114,7 @@ static int sixty = 60;
 static int __maybe_unused neg_one = -1;
 static int __maybe_unused two = 2;
 static int __maybe_unused four = 4;
-static unsigned long zero_ul;
 static unsigned long one_ul = 1;
-static unsigned long long_max = LONG_MAX;
 static int one_hundred = 100;
 static int one_thousand = 1000;
 #ifdef CONFIG_PRINTK
@@ -3087,52 +3085,6 @@ static struct ctl_table vm_table[] = {
 };
 
 static struct ctl_table fs_table[] = {
-	{
-		.procname	= "inode-nr",
-		.data		= &inodes_stat,
-		.maxlen		= 2*sizeof(long),
-		.mode		= 0444,
-		.proc_handler	= proc_nr_inodes,
-	},
-	{
-		.procname	= "inode-state",
-		.data		= &inodes_stat,
-		.maxlen		= 7*sizeof(long),
-		.mode		= 0444,
-		.proc_handler	= proc_nr_inodes,
-	},
-	{
-		.procname	= "file-nr",
-		.data		= &files_stat,
-		.maxlen		= sizeof(files_stat),
-		.mode		= 0444,
-		.proc_handler	= proc_nr_files,
-	},
-	{
-		.procname	= "file-max",
-		.data		= &files_stat.max_files,
-		.maxlen		= sizeof(files_stat.max_files),
-		.mode		= 0644,
-		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &zero_ul,
-		.extra2		= &long_max,
-	},
-	{
-		.procname	= "nr_open",
-		.data		= &sysctl_nr_open,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &sysctl_nr_open_min,
-		.extra2		= &sysctl_nr_open_max,
-	},
-	{
-		.procname	= "dentry-state",
-		.data		= &dentry_stat,
-		.maxlen		= 6*sizeof(long),
-		.mode		= 0444,
-		.proc_handler	= proc_nr_dentry,
-	},
 	{
 		.procname	= "overflowuid",
 		.data		= &fs_overflowuid,
Kees Cook May 12, 2020, 5:22 a.m. UTC | #3
On Tue, May 12, 2020 at 12:33:05AM +0000, Luis Chamberlain wrote:
> On Mon, May 11, 2020 at 09:55:16AM +0800, Xiaoming Ni wrote:
> > On 2020/5/11 9:11, Stephen Rothwell wrote:
> > > Hi all,
> > > 
> > > Today's linux-next merge of the vfs tree got a conflict in:
> > > 
> > >    kernel/sysctl.c
> > > 
> > > between commit:
> > > 
> > >    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
> > > 
> > > from the parisc-hd tree and commit:
> > > 
> > >    f461d2dcd511 ("sysctl: avoid forward declarations")
> > > 
> > > from the vfs tree.
> > > 
> > > I fixed it up (see below) and can carry the fix as necessary. This
> > > is now fixed as far as linux-next is concerned, but any non trivial
> > > conflicts should be mentioned to your upstream maintainer when your tree
> > > is submitted for merging.  You may also want to consider cooperating
> > > with the maintainer of the conflicting tree to minimise any particularly
> > > complex conflicts.
> > > 
> > 
> > 
> > Kernel/sysctl.c contains more than 190 interface files, and there are a
> > large number of config macro controls. When modifying the sysctl interface
> > directly in kernel/sysctl.c , conflicts are very easy to occur.
> > 
> > At the same time, the register_sysctl_table() provided by the system can
> > easily add the sysctl interface, and there is no conflict of kernel/sysctl.c
> > .
> > 
> > Should we add instructions in the patch guide (coding-style.rst
> > submitting-patches.rst):
> > Preferentially use register_sysctl_table() to add a new sysctl interface,
> > centralize feature codes, and avoid directly modifying kernel/sysctl.c ?
> 
> Yes, however I don't think folks know how to do this well. So I think we
> just have to do at least start ourselves, and then reflect some of this
> in the docs.  The reason that this can be not easy is that we need to
> ensure that at an init level we haven't busted dependencies on setting
> this. We also just don't have docs on how to do this well.
> 
> > In addition, is it necessary to transfer the architecture-related sysctl
> > interface to arch/xxx/kernel/sysctl.c ?
> 
> Well here's an initial attempt to start with fs stuff in a very
> conservative way. What do folks think?
> 
> [...]
> +static unsigned long zero_ul;
> +static unsigned long long_max = LONG_MAX;

I think it'd be nice to keep these in one place for others to reuse,
though that means making them non-static. (And now that I look at them,
I thought they were supposed to be const?)
Luis Chamberlain May 12, 2020, 5:44 a.m. UTC | #4
On Mon, May 11, 2020 at 10:22:04PM -0700, Kees Cook wrote:
> On Tue, May 12, 2020 at 12:33:05AM +0000, Luis Chamberlain wrote:
> > On Mon, May 11, 2020 at 09:55:16AM +0800, Xiaoming Ni wrote:
> > > On 2020/5/11 9:11, Stephen Rothwell wrote:
> > > > Hi all,
> > > > 
> > > > Today's linux-next merge of the vfs tree got a conflict in:
> > > > 
> > > >    kernel/sysctl.c
> > > > 
> > > > between commit:
> > > > 
> > > >    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
> > > > 
> > > > from the parisc-hd tree and commit:
> > > > 
> > > >    f461d2dcd511 ("sysctl: avoid forward declarations")
> > > > 
> > > > from the vfs tree.
> > > > 
> > > > I fixed it up (see below) and can carry the fix as necessary. This
> > > > is now fixed as far as linux-next is concerned, but any non trivial
> > > > conflicts should be mentioned to your upstream maintainer when your tree
> > > > is submitted for merging.  You may also want to consider cooperating
> > > > with the maintainer of the conflicting tree to minimise any particularly
> > > > complex conflicts.
> > > > 
> > > 
> > > 
> > > Kernel/sysctl.c contains more than 190 interface files, and there are a
> > > large number of config macro controls. When modifying the sysctl interface
> > > directly in kernel/sysctl.c , conflicts are very easy to occur.
> > > 
> > > At the same time, the register_sysctl_table() provided by the system can
> > > easily add the sysctl interface, and there is no conflict of kernel/sysctl.c
> > > .
> > > 
> > > Should we add instructions in the patch guide (coding-style.rst
> > > submitting-patches.rst):
> > > Preferentially use register_sysctl_table() to add a new sysctl interface,
> > > centralize feature codes, and avoid directly modifying kernel/sysctl.c ?
> > 
> > Yes, however I don't think folks know how to do this well. So I think we
> > just have to do at least start ourselves, and then reflect some of this
> > in the docs.  The reason that this can be not easy is that we need to
> > ensure that at an init level we haven't busted dependencies on setting
> > this. We also just don't have docs on how to do this well.
> > 
> > > In addition, is it necessary to transfer the architecture-related sysctl
> > > interface to arch/xxx/kernel/sysctl.c ?
> > 
> > Well here's an initial attempt to start with fs stuff in a very
> > conservative way. What do folks think?
> > 
> > [...]
> > +static unsigned long zero_ul;
> > +static unsigned long long_max = LONG_MAX;
> 
> I think it'd be nice to keep these in one place for others to reuse,
> though that means making them non-static. (And now that I look at them,
> I thought they were supposed to be const?)

So much spring cleaning to do. I can add the const and share it.
It seems odd to stuff this into a sysctl.h, types.h doesn't seem
right... I can't think of something proper, so I'll just move them
to sysctl.h for now.

Any thought on the approach though? I mean, I realize that this will
require more of the subsystem specific folks to look at the code and
review, but if this seems fair, I'll get the ball rolling.

  Luis
Eric W. Biederman May 12, 2020, 11:52 a.m. UTC | #5
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Mon, May 11, 2020 at 09:55:16AM +0800, Xiaoming Ni wrote:
>> On 2020/5/11 9:11, Stephen Rothwell wrote:
>> > Hi all,
>> > 
>> > Today's linux-next merge of the vfs tree got a conflict in:
>> > 
>> >    kernel/sysctl.c
>> > 
>> > between commit:
>> > 
>> >    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
>> > 
>> > from the parisc-hd tree and commit:
>> > 
>> >    f461d2dcd511 ("sysctl: avoid forward declarations")
>> > 
>> > from the vfs tree.
>> > 
>> > I fixed it up (see below) and can carry the fix as necessary. This
>> > is now fixed as far as linux-next is concerned, but any non trivial
>> > conflicts should be mentioned to your upstream maintainer when your tree
>> > is submitted for merging.  You may also want to consider cooperating
>> > with the maintainer of the conflicting tree to minimise any particularly
>> > complex conflicts.
>> > 
>> 
>> 
>> Kernel/sysctl.c contains more than 190 interface files, and there are a
>> large number of config macro controls. When modifying the sysctl interface
>> directly in kernel/sysctl.c , conflicts are very easy to occur.
>> 
>> At the same time, the register_sysctl_table() provided by the system can
>> easily add the sysctl interface, and there is no conflict of kernel/sysctl.c
>> .
>> 
>> Should we add instructions in the patch guide (coding-style.rst
>> submitting-patches.rst):
>> Preferentially use register_sysctl_table() to add a new sysctl interface,
>> centralize feature codes, and avoid directly modifying kernel/sysctl.c ?
>
> Yes, however I don't think folks know how to do this well. So I think we
> just have to do at least start ourselves, and then reflect some of this
> in the docs.  The reason that this can be not easy is that we need to
> ensure that at an init level we haven't busted dependencies on setting
> this. We also just don't have docs on how to do this well.
>
>> In addition, is it necessary to transfer the architecture-related sysctl
>> interface to arch/xxx/kernel/sysctl.c ?


>
> Well here's an initial attempt to start with fs stuff in a very
> conservative way. What do folks think?

I don't see how any of that deals with the current conflict in -next.

You are putting the fs sysctls in the wrong place.  The should live
in fs/ not in fs/proc/.  Otherwise you are pretty much repeating
the problem the problem of poorly located code in another location.


>  fs/proc/Makefile          |  1 +
>  fs/proc/fs_sysctl_table.c | 97 +++++++++++++++++++++++++++++++++++++++
>  kernel/sysctl.c           | 48 -------------------
>  3 files changed, 98 insertions(+), 48 deletions(-)
>  create mode 100644 fs/proc/fs_sysctl_table.c
>
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index bd08616ed8ba..8bf419b2ac7d 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -28,6 +28,7 @@ proc-y	+= namespaces.o
>  proc-y	+= self.o
>  proc-y	+= thread_self.o
>  proc-$(CONFIG_PROC_SYSCTL)	+= proc_sysctl.o
> +proc-$(CONFIG_SYSCTL)		+= fs_sysctl_table.o
>  proc-$(CONFIG_NET)		+= proc_net.o
>  proc-$(CONFIG_PROC_KCORE)	+= kcore.o
>  proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
> diff --git a/fs/proc/fs_sysctl_table.c b/fs/proc/fs_sysctl_table.c
> new file mode 100644
> index 000000000000..f56a49989872
> --- /dev/null
> +++ b/fs/proc/fs_sysctl_table.c
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * /proc/sys/fs sysctl table
> + */
> +#include <linux/init.h>
> +#include <linux/sysctl.h>
> +#include <linux/poll.h>
> +#include <linux/proc_fs.h>
> +#include <linux/printk.h>
> +#include <linux/security.h>
> +#include <linux/sched.h>
> +#include <linux/cred.h>
> +#include <linux/namei.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/bpf-cgroup.h>
> +#include <linux/mount.h>
> +#include <linux/dnotify.h>
> +#include <linux/pipe_fs_i.h>
> +#include <linux/aio.h>
> +#include <linux/inotify.h>
> +#include <linux/kmemleak.h>
> +#include <linux/binfmts.h>
> +
> +static unsigned long zero_ul;
> +static unsigned long long_max = LONG_MAX;
> +
> +static struct ctl_table fs_table[] = {
> +	{
> +		.procname	= "inode-nr",
> +		.data		= &inodes_stat,
> +		.maxlen		= 2*sizeof(long),
> +		.mode		= 0444,
> +		.proc_handler	= proc_nr_inodes,
> +	},
> +	{
> +		.procname	= "inode-state",
> +		.data		= &inodes_stat,
> +		.maxlen		= 7*sizeof(long),
> +		.mode		= 0444,
> +		.proc_handler	= proc_nr_inodes,
> +	},
> +	{
> +		.procname	= "file-nr",
> +		.data		= &files_stat,
> +		.maxlen		= sizeof(files_stat),
> +		.mode		= 0444,
> +		.proc_handler	= proc_nr_files,
> +	},
> +	{
> +		.procname	= "file-max",
> +		.data		= &files_stat.max_files,
> +		.maxlen		= sizeof(files_stat.max_files),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_minmax,
> +		.extra1		= &zero_ul,
> +		.extra2		= &long_max,
> +	},
> +	{
> +		.procname	= "nr_open",
> +		.data		= &sysctl_nr_open,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &sysctl_nr_open_min,
> +		.extra2		= &sysctl_nr_open_max,
> +	},
> +	{
> +		.procname	= "dentry-state",
> +		.data		= &dentry_stat,
> +		.maxlen		= 6*sizeof(long),
> +		.mode		= 0444,
> +		.proc_handler	= proc_nr_dentry,
> +	},
> +	{ }
> +};
> +
> +static struct ctl_table fs_base_table[] = {
> +	{
> +		.procname	= "fs",
> +		.mode		= 0555,
> +		.child		= fs_table,
> +	},
> +	{ }
> +};
  ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
  
> > +static int __init fs_procsys_init(void)
> +{
> +	struct ctl_table_header *hdr;
> +
> +	hdr = register_sysctl_table(fs_base_table);
              ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
	AKA
        hdr = register_sysctl("fs", fs_table);
> +	kmemleak_not_leak(hdr);
> +
> +	return 0;
> +}
> +
> +early_initcall(fs_procsys_init);
Luis Chamberlain May 12, 2020, 5:24 p.m. UTC | #6
On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain <mcgrof@kernel.org> writes:
> 
> > On Mon, May 11, 2020 at 09:55:16AM +0800, Xiaoming Ni wrote:
> >> On 2020/5/11 9:11, Stephen Rothwell wrote:
> >> > Hi all,
> >> > 
> >> > Today's linux-next merge of the vfs tree got a conflict in:
> >> > 
> >> >    kernel/sysctl.c
> >> > 
> >> > between commit:
> >> > 
> >> >    b6522fa409cf ("parisc: add sysctl file interface panic_on_stackoverflow")
> >> > 
> >> > from the parisc-hd tree and commit:
> >> > 
> >> >    f461d2dcd511 ("sysctl: avoid forward declarations")
> >> > 
> >> > from the vfs tree.
> >> > 
> >> > I fixed it up (see below) and can carry the fix as necessary. This
> >> > is now fixed as far as linux-next is concerned, but any non trivial
> >> > conflicts should be mentioned to your upstream maintainer when your tree
> >> > is submitted for merging.  You may also want to consider cooperating
> >> > with the maintainer of the conflicting tree to minimise any particularly
> >> > complex conflicts.
> >> > 
> >> 
> >> 
> >> Kernel/sysctl.c contains more than 190 interface files, and there are a
> >> large number of config macro controls. When modifying the sysctl interface
> >> directly in kernel/sysctl.c , conflicts are very easy to occur.
> >> 
> >> At the same time, the register_sysctl_table() provided by the system can
> >> easily add the sysctl interface, and there is no conflict of kernel/sysctl.c
> >> .
> >> 
> >> Should we add instructions in the patch guide (coding-style.rst
> >> submitting-patches.rst):
> >> Preferentially use register_sysctl_table() to add a new sysctl interface,
> >> centralize feature codes, and avoid directly modifying kernel/sysctl.c ?
> >
> > Yes, however I don't think folks know how to do this well. So I think we
> > just have to do at least start ourselves, and then reflect some of this
> > in the docs.  The reason that this can be not easy is that we need to
> > ensure that at an init level we haven't busted dependencies on setting
> > this. We also just don't have docs on how to do this well.
> >
> >> In addition, is it necessary to transfer the architecture-related sysctl
> >> interface to arch/xxx/kernel/sysctl.c ?
> 
> 
> >
> > Well here's an initial attempt to start with fs stuff in a very
> > conservative way. What do folks think?
> 
> I don't see how any of that deals with the current conflict in -next.

The point is to cleanup the kitchen sink full of knobs everyone from
different subsystem has put in place for random things so to reduce
the amount of edits on the file, so to then avoid the possibility
of merge conflicts.

> You are putting the fs sysctls in the wrong place.  The should live
> in fs/ not in fs/proc/.

That's an easy fix, sure, I'll do that.

> Otherwise you are pretty much repeating
> the problem the problem of poorly located code in another location.

Sure, alright, well I'll chug on with trying to clean up the kitchen
sink. We can decide where we put items during review.

> >  fs/proc/Makefile          |  1 +
> >  fs/proc/fs_sysctl_table.c | 97 +++++++++++++++++++++++++++++++++++++++
> >  kernel/sysctl.c           | 48 -------------------
> >  3 files changed, 98 insertions(+), 48 deletions(-)
> >  create mode 100644 fs/proc/fs_sysctl_table.c
> >
> > diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> > index bd08616ed8ba..8bf419b2ac7d 100644
> > --- a/fs/proc/Makefile
> > +++ b/fs/proc/Makefile
> > @@ -28,6 +28,7 @@ proc-y	+= namespaces.o
> >  proc-y	+= self.o
> >  proc-y	+= thread_self.o
> >  proc-$(CONFIG_PROC_SYSCTL)	+= proc_sysctl.o
> > +proc-$(CONFIG_SYSCTL)		+= fs_sysctl_table.o
> >  proc-$(CONFIG_NET)		+= proc_net.o
> >  proc-$(CONFIG_PROC_KCORE)	+= kcore.o
> >  proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
> > diff --git a/fs/proc/fs_sysctl_table.c b/fs/proc/fs_sysctl_table.c
> > new file mode 100644
> > index 000000000000..f56a49989872
> > --- /dev/null
> > +++ b/fs/proc/fs_sysctl_table.c
> > @@ -0,0 +1,97 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * /proc/sys/fs sysctl table
> > + */
> > +#include <linux/init.h>
> > +#include <linux/sysctl.h>
> > +#include <linux/poll.h>
> > +#include <linux/proc_fs.h>
> > +#include <linux/printk.h>
> > +#include <linux/security.h>
> > +#include <linux/sched.h>
> > +#include <linux/cred.h>
> > +#include <linux/namei.h>
> > +#include <linux/mm.h>
> > +#include <linux/module.h>
> > +#include <linux/bpf-cgroup.h>
> > +#include <linux/mount.h>
> > +#include <linux/dnotify.h>
> > +#include <linux/pipe_fs_i.h>
> > +#include <linux/aio.h>
> > +#include <linux/inotify.h>
> > +#include <linux/kmemleak.h>
> > +#include <linux/binfmts.h>
> > +
> > +static unsigned long zero_ul;
> > +static unsigned long long_max = LONG_MAX;
> > +
> > +static struct ctl_table fs_table[] = {
> > +	{
> > +		.procname	= "inode-nr",
> > +		.data		= &inodes_stat,
> > +		.maxlen		= 2*sizeof(long),
> > +		.mode		= 0444,
> > +		.proc_handler	= proc_nr_inodes,
> > +	},
> > +	{
> > +		.procname	= "inode-state",
> > +		.data		= &inodes_stat,
> > +		.maxlen		= 7*sizeof(long),
> > +		.mode		= 0444,
> > +		.proc_handler	= proc_nr_inodes,
> > +	},
> > +	{
> > +		.procname	= "file-nr",
> > +		.data		= &files_stat,
> > +		.maxlen		= sizeof(files_stat),
> > +		.mode		= 0444,
> > +		.proc_handler	= proc_nr_files,
> > +	},
> > +	{
> > +		.procname	= "file-max",
> > +		.data		= &files_stat.max_files,
> > +		.maxlen		= sizeof(files_stat.max_files),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_doulongvec_minmax,
> > +		.extra1		= &zero_ul,
> > +		.extra2		= &long_max,
> > +	},
> > +	{
> > +		.procname	= "nr_open",
> > +		.data		= &sysctl_nr_open,
> > +		.maxlen		= sizeof(unsigned int),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec_minmax,
> > +		.extra1		= &sysctl_nr_open_min,
> > +		.extra2		= &sysctl_nr_open_max,
> > +	},
> > +	{
> > +		.procname	= "dentry-state",
> > +		.data		= &dentry_stat,
> > +		.maxlen		= 6*sizeof(long),
> > +		.mode		= 0444,
> > +		.proc_handler	= proc_nr_dentry,
> > +	},
> > +	{ }
> > +};
> > +
> > +static struct ctl_table fs_base_table[] = {
> > +	{
> > +		.procname	= "fs",
> > +		.mode		= 0555,
> > +		.child		= fs_table,
> > +	},
> > +	{ }
> > +};
>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> > > +static int __init fs_procsys_init(void)
> > +{
> > +	struct ctl_table_header *hdr;
> > +
> > +	hdr = register_sysctl_table(fs_base_table);
>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
> 	AKA
>         hdr = register_sysctl("fs", fs_table);

Ah, much cleaner thanks!

  Luis
Eric W. Biederman May 12, 2020, 5:40 p.m. UTC | #7
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>> Luis Chamberlain <mcgrof@kernel.org> writes:
>> 
>> > +static struct ctl_table fs_base_table[] = {
>> > +	{
>> > +		.procname	= "fs",
>> > +		.mode		= 0555,
>> > +		.child		= fs_table,
>> > +	},
>> > +	{ }
>> > +};
>>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>> > > +static int __init fs_procsys_init(void)
>> > +{
>> > +	struct ctl_table_header *hdr;
>> > +
>> > +	hdr = register_sysctl_table(fs_base_table);
>>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
>> 	AKA
>>         hdr = register_sysctl("fs", fs_table);
>
> Ah, much cleaner thanks!

It is my hope you we can get rid of register_sysctl_table one of these
days.  It was the original interface but today it is just a
compatibility wrapper.

I unfortunately ran out of steam last time before I finished converting
everything over.

Eric
Luis Chamberlain May 12, 2020, 10:03 p.m. UTC | #8
On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
> Luis Chamberlain <mcgrof@kernel.org> writes:
> 
> > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> 
> >> > +static struct ctl_table fs_base_table[] = {
> >> > +	{
> >> > +		.procname	= "fs",
> >> > +		.mode		= 0555,
> >> > +		.child		= fs_table,
> >> > +	},
> >> > +	{ }
> >> > +};
> >>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> >> > > +static int __init fs_procsys_init(void)
> >> > +{
> >> > +	struct ctl_table_header *hdr;
> >> > +
> >> > +	hdr = register_sysctl_table(fs_base_table);
> >>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
> >> 	AKA
> >>         hdr = register_sysctl("fs", fs_table);
> >
> > Ah, much cleaner thanks!
> 
> It is my hope you we can get rid of register_sysctl_table one of these
> days.  It was the original interface but today it is just a
> compatibility wrapper.
> 
> I unfortunately ran out of steam last time before I finished converting
> everything over.

Let's give it one more go. I'll start with the fs stuff.

  Luis
Xiaoming Ni May 13, 2020, 4:04 a.m. UTC | #9
On 2020/5/13 6:03, Luis Chamberlain wrote:
> On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>
>>> On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>>>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>>>
>>>>> +static struct ctl_table fs_base_table[] = {
>>>>> +	{
>>>>> +		.procname	= "fs",
>>>>> +		.mode		= 0555,
>>>>> +		.child		= fs_table,
>>>>> +	},
>>>>> +	{ }
>>>>> +};
>>>>    ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>>>>>> +static int __init fs_procsys_init(void)
>>>>> +{
>>>>> +	struct ctl_table_header *hdr;
>>>>> +
>>>>> +	hdr = register_sysctl_table(fs_base_table);
>>>>                ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
>>>> 	AKA
>>>>          hdr = register_sysctl("fs", fs_table);
>>>
>>> Ah, much cleaner thanks!
>>
>> It is my hope you we can get rid of register_sysctl_table one of these
>> days.  It was the original interface but today it is just a
>> compatibility wrapper.
>>
>> I unfortunately ran out of steam last time before I finished converting
>> everything over.
> 
> Let's give it one more go. I'll start with the fs stuff.
> 
>    Luis
> 
> .
> 

If we register each feature in its own feature code file using 
register() to register the sysctl interface. To avoid merge conflicts 
when different features modify sysctl.c at the same time.
that is, try to Avoid mixing code with multiple features in the same 
code file.

For example, the multiple file interfaces defined in sysctl.c by the 
hung_task feature can  be moved to hung_task.c.

Perhaps later, without centralized sysctl.c ?
Is this better?

Thanks
Xiaoming Ni

---
  include/linux/sched/sysctl.h |  8 +----
  kernel/hung_task.c           | 78 
+++++++++++++++++++++++++++++++++++++++++++-
  kernel/sysctl.c              | 50 ----------------------------
  3 files changed, 78 insertions(+), 58 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215..bb4e0d3 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -7,14 +7,8 @@
  struct ctl_table;

  #ifdef CONFIG_DETECT_HUNG_TASK
-extern int	     sysctl_hung_task_check_count;
-extern unsigned int  sysctl_hung_task_panic;
+/* used for block/ */
  extern unsigned long sysctl_hung_task_timeout_secs;
-extern unsigned long sysctl_hung_task_check_interval_secs;
-extern int sysctl_hung_task_warnings;
-extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int 
write,
-					 void __user *buffer,
-					 size_t *lenp, loff_t *ppos);
  #else
  /* Avoid need for ifdefs elsewhere in the code */
  enum { sysctl_hung_task_timeout_secs = 0 };
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 14a625c..53589f2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -20,10 +20,10 @@
  #include <linux/utsname.h>
  #include <linux/sched/signal.h>
  #include <linux/sched/debug.h>
+#include <linux/kmemleak.h>
  #include <linux/sched/sysctl.h>

  #include <trace/events/sched.h>
-
  /*
   * The number of tasks checked:
   */
@@ -296,8 +296,84 @@ static int watchdog(void *dummy)
  	return 0;
  }

+/*
+ * This is needed for proc_doulongvec_minmax of 
sysctl_hung_task_timeout_secs
+ * and hung_task_check_interval_secs
+ */
+static unsigned long hung_task_timeout_max = (LONG_MAX / HZ);
+static int __maybe_unused neg_one = -1;
+static struct ctl_table hung_task_sysctls[] = {
+	{
+		.procname	= "hung_task_panic",
+		.data		= &sysctl_hung_task_panic,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "hung_task_check_count",
+		.data		= &sysctl_hung_task_check_count,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "hung_task_timeout_secs",
+		.data		= &sysctl_hung_task_timeout_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
+	{
+		.procname	= "hung_task_check_interval_secs",
+		.data		= &sysctl_hung_task_check_interval_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
+	{
+		.procname	= "hung_task_warnings",
+		.data		= &sysctl_hung_task_warnings,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &neg_one,
+	},
+	{}
+};
+
+/* get /proc/sys/kernel root */
+static struct ctl_table sysctls_root[] = {
+	{
+		.procname       = "kernel",
+		.mode           = 0555,
+		.child          = hung_task_sysctls,
+	},
+	{}
+};
+
+static int __init hung_task_sysctl_init(void)
+{
+	struct ctl_table_header *srt = register_sysctl_table(sysctls_root);
+
+	if (!srt)
+		return -ENOMEM;
+	kmemleak_not_leak(srt);
+	return 0;
+}
+
  static int __init hung_task_init(void)
  {
+	int ret = hung_task_sysctl_init();
+
+	if (ret != 0)
+		return ret;
+
  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);

  	/* Disable hung task detector on suspend */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8..45a1153 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -149,13 +149,6 @@
  static int ngroups_max = NGROUPS_MAX;
  static const int cap_last_cap = CAP_LAST_CAP;

-/*
- * This is needed for proc_doulongvec_minmax of 
sysctl_hung_task_timeout_secs
- * and hung_task_check_interval_secs
- */
-#ifdef CONFIG_DETECT_HUNG_TASK
-static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
-#endif

  #ifdef CONFIG_INOTIFY_USER
  #include <linux/inotify.h>
@@ -1085,49 +1078,6 @@ static int sysrq_sysctl_handler(struct ctl_table 
*table, int write,
  		.proc_handler	= proc_dointvec,
  	},
  #endif
-#ifdef CONFIG_DETECT_HUNG_TASK
-	{
-		.procname	= "hung_task_panic",
-		.data		= &sysctl_hung_task_panic,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
-	},
-	{
-		.procname	= "hung_task_check_count",
-		.data		= &sysctl_hung_task_check_count,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= SYSCTL_ZERO,
-	},
-	{
-		.procname	= "hung_task_timeout_secs",
-		.data		= &sysctl_hung_task_timeout_secs,
-		.maxlen		= sizeof(unsigned long),
-		.mode		= 0644,
-		.proc_handler	= proc_dohung_task_timeout_secs,
-		.extra2		= &hung_task_timeout_max,
-	},
-	{
-		.procname	= "hung_task_check_interval_secs",
-		.data		= &sysctl_hung_task_check_interval_secs,
-		.maxlen		= sizeof(unsigned long),
-		.mode		= 0644,
-		.proc_handler	= proc_dohung_task_timeout_secs,
-		.extra2		= &hung_task_timeout_max,
-	},
-	{
-		.procname	= "hung_task_warnings",
-		.data		= &sysctl_hung_task_warnings,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &neg_one,
-	},
-#endif
  #ifdef CONFIG_RT_MUTEXES
  	{
  		.procname	= "max_lock_depth",
Luis Chamberlain May 13, 2020, 12:50 p.m. UTC | #10
On Wed, May 13, 2020 at 12:04:02PM +0800, Xiaoming Ni wrote:
> On 2020/5/13 6:03, Luis Chamberlain wrote:
> > On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
> > > Luis Chamberlain <mcgrof@kernel.org> writes:
> > > 
> > > > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> > > > > Luis Chamberlain <mcgrof@kernel.org> writes:
> > > > > 
> > > > > > +static struct ctl_table fs_base_table[] = {
> > > > > > +	{
> > > > > > +		.procname	= "fs",
> > > > > > +		.mode		= 0555,
> > > > > > +		.child		= fs_table,
> > > > > > +	},
> > > > > > +	{ }
> > > > > > +};
> > > > >    ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> > > > > > > +static int __init fs_procsys_init(void)
> > > > > > +{
> > > > > > +	struct ctl_table_header *hdr;
> > > > > > +
> > > > > > +	hdr = register_sysctl_table(fs_base_table);
> > > > >                ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
> > > > > 	AKA
> > > > >          hdr = register_sysctl("fs", fs_table);
> > > > 
> > > > Ah, much cleaner thanks!
> > > 
> > > It is my hope you we can get rid of register_sysctl_table one of these
> > > days.  It was the original interface but today it is just a
> > > compatibility wrapper.
> > > 
> > > I unfortunately ran out of steam last time before I finished converting
> > > everything over.
> > 
> > Let's give it one more go. I'll start with the fs stuff.
> > 
> >    Luis
> > 
> > .
> > 
> 
> If we register each feature in its own feature code file using register() to
> register the sysctl interface. To avoid merge conflicts when different
> features modify sysctl.c at the same time.
> that is, try to Avoid mixing code with multiple features in the same code
> file.
> 
> For example, the multiple file interfaces defined in sysctl.c by the
> hung_task feature can  be moved to hung_task.c.
> 
> Perhaps later, without centralized sysctl.c ?
> Is this better?
> 
> Thanks
> Xiaoming Ni
> 
> ---
>  include/linux/sched/sysctl.h |  8 +----
>  kernel/hung_task.c           | 78
> +++++++++++++++++++++++++++++++++++++++++++-
>  kernel/sysctl.c              | 50 ----------------------------
>  3 files changed, 78 insertions(+), 58 deletions(-)
> 
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index d4f6215..bb4e0d3 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -7,14 +7,8 @@
>  struct ctl_table;
> 
>  #ifdef CONFIG_DETECT_HUNG_TASK
> -extern int	     sysctl_hung_task_check_count;
> -extern unsigned int  sysctl_hung_task_panic;
> +/* used for block/ */
>  extern unsigned long sysctl_hung_task_timeout_secs;
> -extern unsigned long sysctl_hung_task_check_interval_secs;
> -extern int sysctl_hung_task_warnings;
> -extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int
> write,
> -					 void __user *buffer,
> -					 size_t *lenp, loff_t *ppos);
>  #else
>  /* Avoid need for ifdefs elsewhere in the code */
>  enum { sysctl_hung_task_timeout_secs = 0 };
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 14a625c..53589f2 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -20,10 +20,10 @@
>  #include <linux/utsname.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/debug.h>
> +#include <linux/kmemleak.h>
>  #include <linux/sched/sysctl.h>
> 
>  #include <trace/events/sched.h>
> -
>  /*
>   * The number of tasks checked:
>   */
> @@ -296,8 +296,84 @@ static int watchdog(void *dummy)
>  	return 0;
>  }
> 
> +/*
> + * This is needed for proc_doulongvec_minmax of
> sysctl_hung_task_timeout_secs
> + * and hung_task_check_interval_secs
> + */
> +static unsigned long hung_task_timeout_max = (LONG_MAX / HZ);

This is not generic so it can stay in this file.

> +static int __maybe_unused neg_one = -1;

This is generic so we can share it, I suggest we just rename this
for now to sysctl_neg_one, export it to a symbol namespace,
EXPORT_SYMBOL_NS_GPL(sysctl_neg_one, SYSCTL) and then import it with
MODULE_IMPORT_NS(SYSCTL)


> +static struct ctl_table hung_task_sysctls[] = {

We want to wrap this around with CONFIG_SYSCTL, so a cleaner solution
is something like this:

diff --git a/kernel/Makefile b/kernel/Makefile
index a42ac3a58994..689718351754 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -88,7 +88,9 @@ obj-$(CONFIG_KCOV) += kcov.o
 obj-$(CONFIG_KPROBES) += kprobes.o
 obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
-obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
+obj-$(CONFIG_DETECT_HUNG_TASK) += hung_tasks.o
+hung_tasks-y := hung_task.o
+hung_tasks-$(CONFIG_SYSCTL) += hung_task_sysctl.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o

> +/* get /proc/sys/kernel root */
> +static struct ctl_table sysctls_root[] = {
> +	{
> +		.procname       = "kernel",
> +		.mode           = 0555,
> +		.child          = hung_task_sysctls,
> +	},
> +	{}
> +};
> +

And as per Eric, this is not needed, we can simplify this more, as noted
below.

> +static int __init hung_task_sysctl_init(void)
> +{
> +	struct ctl_table_header *srt = register_sysctl_table(sysctls_root);

You want instead something like::

        struct ctl_table_header *srt;

	srt = register_sysctl("kernel", hung_task_sysctls);
> +
> +	if (!srt)
> +		return -ENOMEM;
> +	kmemleak_not_leak(srt);
> +	return 0;
> +}
> +

>  static int __init hung_task_init(void)
>  {
> +	int ret = hung_task_sysctl_init();
> +
> +	if (ret != 0)
> +		return ret;
> +

And just #ifdef this around CONFIG_SYSCTL.

  Luis
Eric W. Biederman May 13, 2020, 1:42 p.m. UTC | #11
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
>> Luis Chamberlain <mcgrof@kernel.org> writes:
>> 
>> > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>> >> Luis Chamberlain <mcgrof@kernel.org> writes:
>> >> 
>> >> > +static struct ctl_table fs_base_table[] = {
>> >> > +	{
>> >> > +		.procname	= "fs",
>> >> > +		.mode		= 0555,
>> >> > +		.child		= fs_table,
>> >> > +	},
>> >> > +	{ }
>> >> > +};
>> >>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>> >> > > +static int __init fs_procsys_init(void)
>> >> > +{
>> >> > +	struct ctl_table_header *hdr;
>> >> > +
>> >> > +	hdr = register_sysctl_table(fs_base_table);
>> >>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
>> >> 	AKA
>> >>         hdr = register_sysctl("fs", fs_table);
>> >
>> > Ah, much cleaner thanks!
>> 
>> It is my hope you we can get rid of register_sysctl_table one of these
>> days.  It was the original interface but today it is just a
>> compatibility wrapper.
>> 
>> I unfortunately ran out of steam last time before I finished converting
>> everything over.
>
> Let's give it one more go. I'll start with the fs stuff.

Just to be clear moving the tables out of kernel/sysctl.c is a related
but slightly different problem.

Today it looks like there are 35 calls of register_sysctl_table
and 9 calls of register_sysctl_paths.

Among them is lib/sysctl_test.c and check-sysctl-docs.

Meanwhile I can only find 5 calls to register_sysctl in the tree
so it looks like I didn't get very far converting things over.

Eric
Luis Chamberlain May 13, 2020, 2:14 p.m. UTC | #12
On Wed, May 13, 2020 at 08:42:30AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain <mcgrof@kernel.org> writes:
> 
> > On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> 
> >> > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> >> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> >> 
> >> >> > +static struct ctl_table fs_base_table[] = {
> >> >> > +	{
> >> >> > +		.procname	= "fs",
> >> >> > +		.mode		= 0555,
> >> >> > +		.child		= fs_table,
> >> >> > +	},
> >> >> > +	{ }
> >> >> > +};
> >> >>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> >> >> > > +static int __init fs_procsys_init(void)
> >> >> > +{
> >> >> > +	struct ctl_table_header *hdr;
> >> >> > +
> >> >> > +	hdr = register_sysctl_table(fs_base_table);
> >> >>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
> >> >> 	AKA
> >> >>         hdr = register_sysctl("fs", fs_table);
> >> >
> >> > Ah, much cleaner thanks!
> >> 
> >> It is my hope you we can get rid of register_sysctl_table one of these
> >> days.  It was the original interface but today it is just a
> >> compatibility wrapper.
> >> 
> >> I unfortunately ran out of steam last time before I finished converting
> >> everything over.
> >
> > Let's give it one more go. I'll start with the fs stuff.
> 
> Just to be clear moving the tables out of kernel/sysctl.c is a related
> but slightly different problem.

Sure, but also before we go on this crusade, how about we add a few
helpers:

register_sysctl_kernel()
register_sysctl_vm()
register_sysctl_fs()
register_sysctl_debug()
register_sysctl_dev()

That should make it easier to look for these, and shorter. We *know*
this is a common path, given the size of the existing table.

> Today it looks like there are 35 calls of register_sysctl_table
> and 9 calls of register_sysctl_paths.
> 
> Among them is lib/sysctl_test.c and check-sysctl-docs.
> 
> Meanwhile I can only find 5 calls to register_sysctl in the tree
> so it looks like I didn't get very far converting things over.

While we're on the spring cleaning topic, I've tried to put what I can
think of for TODO items here, anything else? Feel free to edit, its a
wiki after all.

https://kernelnewbies.org/KernelProjects/proc

Feel free to add wishlist items.

  Luis
Eric W. Biederman May 13, 2020, 2:44 p.m. UTC | #13
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Wed, May 13, 2020 at 08:42:30AM -0500, Eric W. Biederman wrote:
>> Luis Chamberlain <mcgrof@kernel.org> writes:
>> 
>> > On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
>> >> Luis Chamberlain <mcgrof@kernel.org> writes:
>> >> 
>> >> > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>> >> >> Luis Chamberlain <mcgrof@kernel.org> writes:
>> >> >> 
>> >> >> > +static struct ctl_table fs_base_table[] = {
>> >> >> > +	{
>> >> >> > +		.procname	= "fs",
>> >> >> > +		.mode		= 0555,
>> >> >> > +		.child		= fs_table,
>> >> >> > +	},
>> >> >> > +	{ }
>> >> >> > +};
>> >> >>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>> >> >> > > +static int __init fs_procsys_init(void)
>> >> >> > +{
>> >> >> > +	struct ctl_table_header *hdr;
>> >> >> > +
>> >> >> > +	hdr = register_sysctl_table(fs_base_table);
>> >> >>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
>> >> >> 	AKA
>> >> >>         hdr = register_sysctl("fs", fs_table);
>> >> >
>> >> > Ah, much cleaner thanks!
>> >> 
>> >> It is my hope you we can get rid of register_sysctl_table one of these
>> >> days.  It was the original interface but today it is just a
>> >> compatibility wrapper.
>> >> 
>> >> I unfortunately ran out of steam last time before I finished converting
>> >> everything over.
>> >
>> > Let's give it one more go. I'll start with the fs stuff.
>> 
>> Just to be clear moving the tables out of kernel/sysctl.c is a related
>> but slightly different problem.
>
> Sure, but also before we go on this crusade, how about we add a few
> helpers:
>
> register_sysctl_kernel()
> register_sysctl_vm()
> register_sysctl_fs()
> register_sysctl_debug()
> register_sysctl_dev()

Hmm.

  register_sysctl("kernel")

> That should make it easier to look for these, and shorter. We *know*
> this is a common path, given the size of the existing table.

I don't really care but one character shorter doesn't look like it
really helps.  Not really for grepping and not maintenance as we get a
bunch of trivial one line implementations.

Eric
Luis Chamberlain May 13, 2020, 3:02 p.m. UTC | #14
On Wed, May 13, 2020 at 09:44:40AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain <mcgrof@kernel.org> writes:
> 
> > On Wed, May 13, 2020 at 08:42:30AM -0500, Eric W. Biederman wrote:
> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> 
> >> > On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
> >> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> >> 
> >> >> > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> >> >> >> Luis Chamberlain <mcgrof@kernel.org> writes:
> >> >> >> 
> >> >> >> > +static struct ctl_table fs_base_table[] = {
> >> >> >> > +	{
> >> >> >> > +		.procname	= "fs",
> >> >> >> > +		.mode		= 0555,
> >> >> >> > +		.child		= fs_table,
> >> >> >> > +	},
> >> >> >> > +	{ }
> >> >> >> > +};
> >> >> >>   ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> >> >> >> > > +static int __init fs_procsys_init(void)
> >> >> >> > +{
> >> >> >> > +	struct ctl_table_header *hdr;
> >> >> >> > +
> >> >> >> > +	hdr = register_sysctl_table(fs_base_table);
> >> >> >>               ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
> >> >> >> 	AKA
> >> >> >>         hdr = register_sysctl("fs", fs_table);
> >> >> >
> >> >> > Ah, much cleaner thanks!
> >> >> 
> >> >> It is my hope you we can get rid of register_sysctl_table one of these
> >> >> days.  It was the original interface but today it is just a
> >> >> compatibility wrapper.
> >> >> 
> >> >> I unfortunately ran out of steam last time before I finished converting
> >> >> everything over.
> >> >
> >> > Let's give it one more go. I'll start with the fs stuff.
> >> 
> >> Just to be clear moving the tables out of kernel/sysctl.c is a related
> >> but slightly different problem.
> >
> > Sure, but also before we go on this crusade, how about we add a few
> > helpers:
> >
> > register_sysctl_kernel()
> > register_sysctl_vm()
> > register_sysctl_fs()
> > register_sysctl_debug()
> > register_sysctl_dev()
> 
> Hmm.
> 
>   register_sysctl("kernel")
> 
> > That should make it easier to look for these, and shorter. We *know*
> > this is a common path, given the size of the existing table.
> 
> I don't really care but one character shorter doesn't look like it
> really helps.  Not really for grepping and not maintenance as we get a
> bunch of trivial one line implementations.

Alright, let's skip the helpers for now.

  Luis
Xiaoming Ni May 14, 2020, 6:05 a.m. UTC | #15
On 2020/5/13 20:50, Luis Chamberlain wrote:
> On Wed, May 13, 2020 at 12:04:02PM +0800, Xiaoming Ni wrote:
>> On 2020/5/13 6:03, Luis Chamberlain wrote:
>>> On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
>>>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>>>
>>>>> On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>>>>>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>>>>>
>>>>>>> +static struct ctl_table fs_base_table[] = {
>>>>>>> +	{
>>>>>>> +		.procname	= "fs",
>>>>>>> +		.mode		= 0555,
>>>>>>> +		.child		= fs_table,
>>>>>>> +	},
>>>>>>> +	{ }
>>>>>>> +};
>>>>>>     ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>>>>>>>> +static int __init fs_procsys_init(void)
>>>>>>> +{
>>>>>>> +	struct ctl_table_header *hdr;
>>>>>>> +
>>>>>>> +	hdr = register_sysctl_table(fs_base_table);
>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl instead.
>>>>>> 	AKA
>>>>>>           hdr = register_sysctl("fs", fs_table);
>>>>>
>>>>> Ah, much cleaner thanks!
>>>>
>>>> It is my hope you we can get rid of register_sysctl_table one of these
>>>> days.  It was the original interface but today it is just a
>>>> compatibility wrapper.
>>>>
>>>> I unfortunately ran out of steam last time before I finished converting
>>>> everything over.
>>>
>>> Let's give it one more go. I'll start with the fs stuff.
>>>
>>>     Luis
>>>
>>> .
>>>
>>
>> If we register each feature in its own feature code file using register() to
>> register the sysctl interface. To avoid merge conflicts when different
>> features modify sysctl.c at the same time.
>> that is, try to Avoid mixing code with multiple features in the same code
>> file.
>>
>> For example, the multiple file interfaces defined in sysctl.c by the
>> hung_task feature can  be moved to hung_task.c.
>>
>> Perhaps later, without centralized sysctl.c ?
>> Is this better?
>>
>> Thanks
>> Xiaoming Ni
>>
>> ---
>>   include/linux/sched/sysctl.h |  8 +----
>>   kernel/hung_task.c           | 78
>> +++++++++++++++++++++++++++++++++++++++++++-
>>   kernel/sysctl.c              | 50 ----------------------------
>>   3 files changed, 78 insertions(+), 58 deletions(-)
>>
>> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
>> index d4f6215..bb4e0d3 100644
>> --- a/include/linux/sched/sysctl.h
>> +++ b/include/linux/sched/sysctl.h
>> @@ -7,14 +7,8 @@
>>   struct ctl_table;
>>
>>   #ifdef CONFIG_DETECT_HUNG_TASK
>> -extern int	     sysctl_hung_task_check_count;
>> -extern unsigned int  sysctl_hung_task_panic;
>> +/* used for block/ */
>>   extern unsigned long sysctl_hung_task_timeout_secs;
>> -extern unsigned long sysctl_hung_task_check_interval_secs;
>> -extern int sysctl_hung_task_warnings;
>> -extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int
>> write,
>> -					 void __user *buffer,
>> -					 size_t *lenp, loff_t *ppos);
>>   #else
>>   /* Avoid need for ifdefs elsewhere in the code */
>>   enum { sysctl_hung_task_timeout_secs = 0 };
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 14a625c..53589f2 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -20,10 +20,10 @@
>>   #include <linux/utsname.h>
>>   #include <linux/sched/signal.h>
>>   #include <linux/sched/debug.h>
>> +#include <linux/kmemleak.h>
>>   #include <linux/sched/sysctl.h>
>>
>>   #include <trace/events/sched.h>
>> -
>>   /*
>>    * The number of tasks checked:
>>    */
>> @@ -296,8 +296,84 @@ static int watchdog(void *dummy)
>>   	return 0;
>>   }
>>
>> +/*
>> + * This is needed for proc_doulongvec_minmax of
>> sysctl_hung_task_timeout_secs
>> + * and hung_task_check_interval_secs
>> + */
>> +static unsigned long hung_task_timeout_max = (LONG_MAX / HZ);
> 
> This is not generic so it can stay in this file.
> 
>> +static int __maybe_unused neg_one = -1;
> 
> This is generic so we can share it, I suggest we just rename this
> for now to sysctl_neg_one, export it to a symbol namespace,
> EXPORT_SYMBOL_NS_GPL(sysctl_neg_one, SYSCTL) and then import it with
> MODULE_IMPORT_NS(SYSCTL)
> 
> 
>> +static struct ctl_table hung_task_sysctls[] = {
> 
> We want to wrap this around with CONFIG_SYSCTL, so a cleaner solution
> is something like this:
> 
> diff --git a/kernel/Makefile b/kernel/Makefile
> index a42ac3a58994..689718351754 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -88,7 +88,9 @@ obj-$(CONFIG_KCOV) += kcov.o
>   obj-$(CONFIG_KPROBES) += kprobes.o
>   obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
>   obj-$(CONFIG_KGDB) += debug/
> -obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
> +obj-$(CONFIG_DETECT_HUNG_TASK) += hung_tasks.o
> +hung_tasks-y := hung_task.o
> +hung_tasks-$(CONFIG_SYSCTL) += hung_task_sysctl.o
>   obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
>   obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>   obj-$(CONFIG_SECCOMP) += seccomp.o
> 
>> +/* get /proc/sys/kernel root */
>> +static struct ctl_table sysctls_root[] = {
>> +	{
>> +		.procname       = "kernel",
>> +		.mode           = 0555,
>> +		.child          = hung_task_sysctls,
>> +	},
>> +	{}
>> +};
>> +
> 
> And as per Eric, this is not needed, we can simplify this more, as noted
> below.
> 
>> +static int __init hung_task_sysctl_init(void)
>> +{
>> +	struct ctl_table_header *srt = register_sysctl_table(sysctls_root);
> 
> You want instead something like::
> 
>          struct ctl_table_header *srt;
> 
> 	srt = register_sysctl("kernel", hung_task_sysctls);
>> +
>> +	if (!srt)
>> +		return -ENOMEM;
>> +	kmemleak_not_leak(srt);
>> +	return 0;
>> +}
>> +
> 
>>   static int __init hung_task_init(void)
>>   {
>> +	int ret = hung_task_sysctl_init();
>> +
>> +	if (ret != 0)
>> +		return ret;
>> +
> 
> And just #ifdef this around CONFIG_SYSCTL.
> 
>    Luis
> 
> .
> 

Thank you for your guidance, I will send the patch later

Xiaoming Ni
Xiaoming Ni May 14, 2020, 4:17 p.m. UTC | #16
On 2020/5/14 14:05, Xiaoming Ni wrote:
> On 2020/5/13 20:50, Luis Chamberlain wrote:
>> On Wed, May 13, 2020 at 12:04:02PM +0800, Xiaoming Ni wrote:
>>> On 2020/5/13 6:03, Luis Chamberlain wrote:
>>>> On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
>>>>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>>>>
>>>>>> On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
>>>>>>> Luis Chamberlain <mcgrof@kernel.org> writes:
>>>>>>>
>>>>>>>> +static struct ctl_table fs_base_table[] = {
>>>>>>>> +    {
>>>>>>>> +        .procname    = "fs",
>>>>>>>> +        .mode        = 0555,
>>>>>>>> +        .child        = fs_table,
>>>>>>>> +    },
>>>>>>>> +    { }
>>>>>>>> +};
>>>>>>>     ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
>>>>>>>>> +static int __init fs_procsys_init(void)
>>>>>>>> +{
>>>>>>>> +    struct ctl_table_header *hdr;
>>>>>>>> +
>>>>>>>> +    hdr = register_sysctl_table(fs_base_table);
>>>>>>>                 ^^^^^^^^^^^^^^^^^^^^^ Please use register_sysctl 
>>>>>>> instead.
>>>>>>>     AKA
>>>>>>>           hdr = register_sysctl("fs", fs_table);
>>>>>>
>>>>>> Ah, much cleaner thanks!
>>>>>
>>>>> It is my hope you we can get rid of register_sysctl_table one of these
>>>>> days.  It was the original interface but today it is just a
>>>>> compatibility wrapper.
>>>>>
>>>>> I unfortunately ran out of steam last time before I finished 
>>>>> converting
>>>>> everything over.
>>>>
>>>> Let's give it one more go. I'll start with the fs stuff.
>>>>
>>>>     Luis
>>>>
>>>> .
>>>>
>>>
>>> If we register each feature in its own feature code file using 
>>> register() to
>>> register the sysctl interface. To avoid merge conflicts when different
>>> features modify sysctl.c at the same time.
>>> that is, try to Avoid mixing code with multiple features in the same 
>>> code
>>> file.
>>>
>>> For example, the multiple file interfaces defined in sysctl.c by the
>>> hung_task feature can  be moved to hung_task.c.
>>>
>>> Perhaps later, without centralized sysctl.c ?
>>> Is this better?
>>>
>>> Thanks
>>> Xiaoming Ni
>>>
>>> ---
>>>   include/linux/sched/sysctl.h |  8 +----
>>>   kernel/hung_task.c           | 78
>>> +++++++++++++++++++++++++++++++++++++++++++-
>>>   kernel/sysctl.c              | 50 ----------------------------
>>>   3 files changed, 78 insertions(+), 58 deletions(-)
>>>
>>> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
>>> index d4f6215..bb4e0d3 100644
>>> --- a/include/linux/sched/sysctl.h
>>> +++ b/include/linux/sched/sysctl.h
>>> @@ -7,14 +7,8 @@
>>>   struct ctl_table;
>>>
>>>   #ifdef CONFIG_DETECT_HUNG_TASK
>>> -extern int         sysctl_hung_task_check_count;
>>> -extern unsigned int  sysctl_hung_task_panic;
>>> +/* used for block/ */
>>>   extern unsigned long sysctl_hung_task_timeout_secs;
>>> -extern unsigned long sysctl_hung_task_check_interval_secs;
>>> -extern int sysctl_hung_task_warnings;
>>> -extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int
>>> write,
>>> -                     void __user *buffer,
>>> -                     size_t *lenp, loff_t *ppos);
>>>   #else
>>>   /* Avoid need for ifdefs elsewhere in the code */
>>>   enum { sysctl_hung_task_timeout_secs = 0 };
>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>> index 14a625c..53589f2 100644
>>> --- a/kernel/hung_task.c
>>> +++ b/kernel/hung_task.c
>>> @@ -20,10 +20,10 @@
>>>   #include <linux/utsname.h>
>>>   #include <linux/sched/signal.h>
>>>   #include <linux/sched/debug.h>
>>> +#include <linux/kmemleak.h>
>>>   #include <linux/sched/sysctl.h>
>>>
>>>   #include <trace/events/sched.h>
>>> -
>>>   /*
>>>    * The number of tasks checked:
>>>    */
>>> @@ -296,8 +296,84 @@ static int watchdog(void *dummy)
>>>       return 0;
>>>   }
>>>
>>> +/*
>>> + * This is needed for proc_doulongvec_minmax of
>>> sysctl_hung_task_timeout_secs
>>> + * and hung_task_check_interval_secs
>>> + */
>>> +static unsigned long hung_task_timeout_max = (LONG_MAX / HZ);
>>
>> This is not generic so it can stay in this file.
>>
>>> +static int __maybe_unused neg_one = -1;
>>
>> This is generic so we can share it, I suggest we just rename this
>> for now to sysctl_neg_one, export it to a symbol namespace,
>> EXPORT_SYMBOL_NS_GPL(sysctl_neg_one, SYSCTL) and then import it with
>> MODULE_IMPORT_NS(SYSCTL)

When I made the patch, I found that only sysctl_writes_strict and 
hung_task_warnings use the neg_one variable, so is it necessary to merge 
and generate the SYSCTL_NEG_ONE variable?

In addition, the SYSCTL symbol namespace has not been created yet. Do I 
just need to add a new member -1 to the sysctl_vals array?

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b6f5d45..acae1fa 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -23,7 +23,7 @@
  static const struct inode_operations proc_sys_dir_operations;

  /* shared constants to be used in various sysctls */
-const int sysctl_vals[] = { 0, 1, INT_MAX };
+const int sysctl_vals[] = { 0, 1, INT_MAX, -1 };
  EXPORT_SYMBOL(sysctl_vals);

  /* Support for permanently empty directories */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 02fa844..6d741d6 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -41,6 +41,7 @@
  #define SYSCTL_ZERO    ((void *)&sysctl_vals[0])
  #define SYSCTL_ONE     ((void *)&sysctl_vals[1])
  #define SYSCTL_INT_MAX ((void *)&sysctl_vals[2])
+#define SYSCTL_NEG_ONE       ((void *)&sysctl_vals[3])

  extern const int sysctl_vals[];

Thanks
Xiaoming Ni


>>
>>
>>> +static struct ctl_table hung_task_sysctls[] = {
>>
>> We want to wrap this around with CONFIG_SYSCTL, so a cleaner solution
>> is something like this:
>>
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index a42ac3a58994..689718351754 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -88,7 +88,9 @@ obj-$(CONFIG_KCOV) += kcov.o
>>   obj-$(CONFIG_KPROBES) += kprobes.o
>>   obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
>>   obj-$(CONFIG_KGDB) += debug/
>> -obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>> +obj-$(CONFIG_DETECT_HUNG_TASK) += hung_tasks.o
>> +hung_tasks-y := hung_task.o
>> +hung_tasks-$(CONFIG_SYSCTL) += hung_task_sysctl.o
>>   obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
>>   obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>>   obj-$(CONFIG_SECCOMP) += seccomp.o
>>
>>> +/* get /proc/sys/kernel root */
>>> +static struct ctl_table sysctls_root[] = {
>>> +    {
>>> +        .procname       = "kernel",
>>> +        .mode           = 0555,
>>> +        .child          = hung_task_sysctls,
>>> +    },
>>> +    {}
>>> +};
>>> +
>>
>> And as per Eric, this is not needed, we can simplify this more, as noted
>> below.
>>
>>> +static int __init hung_task_sysctl_init(void)
>>> +{
>>> +    struct ctl_table_header *srt = register_sysctl_table(sysctls_root);
>>
>> You want instead something like::
>>
>>          struct ctl_table_header *srt;
>>
>>     srt = register_sysctl("kernel", hung_task_sysctls);
>>> +
>>> +    if (!srt)
>>> +        return -ENOMEM;
>>> +    kmemleak_not_leak(srt);
>>> +    return 0;
>>> +}
>>> +
>>
>>>   static int __init hung_task_init(void)
>>>   {
>>> +    int ret = hung_task_sysctl_init();
>>> +
>>> +    if (ret != 0)
>>> +        return ret;
>>> +
>>
>> And just #ifdef this around CONFIG_SYSCTL.
>>
>>    Luis
>>
>> .
>>
> 
> Thank you for your guidance, I will send the patch later
> 
> Xiaoming Ni
>
Luis Chamberlain May 15, 2020, 4:08 p.m. UTC | #17
On Fri, May 15, 2020 at 12:17:52AM +0800, Xiaoming Ni wrote:
> On 2020/5/14 14:05, Xiaoming Ni wrote:
> > On 2020/5/13 20:50, Luis Chamberlain wrote:
> > > On Wed, May 13, 2020 at 12:04:02PM +0800, Xiaoming Ni wrote:
> > > > On 2020/5/13 6:03, Luis Chamberlain wrote:
> > > > > On Tue, May 12, 2020 at 12:40:55PM -0500, Eric W. Biederman wrote:
> > > > > > Luis Chamberlain <mcgrof@kernel.org> writes:
> > > > > > 
> > > > > > > On Tue, May 12, 2020 at 06:52:35AM -0500, Eric W. Biederman wrote:
> > > > > > > > Luis Chamberlain <mcgrof@kernel.org> writes:
> > > > > > > > 
> > > > > > > > > +static struct ctl_table fs_base_table[] = {
> > > > > > > > > +    {
> > > > > > > > > +        .procname    = "fs",
> > > > > > > > > +        .mode        = 0555,
> > > > > > > > > +        .child        = fs_table,
> > > > > > > > > +    },
> > > > > > > > > +    { }
> > > > > > > > > +};
> > > > > > > >     ^^^^^^^^^^^^^^^^^^^^^^^^ You don't need this at all.
> > > > > > > > > > +static int __init fs_procsys_init(void)
> > > > > > > > > +{
> > > > > > > > > +    struct ctl_table_header *hdr;
> > > > > > > > > +
> > > > > > > > > +    hdr = register_sysctl_table(fs_base_table);
> > > > > > > >                 ^^^^^^^^^^^^^^^^^^^^^ Please use
> > > > > > > > register_sysctl instead.
> > > > > > > >     AKA
> > > > > > > >           hdr = register_sysctl("fs", fs_table);
> > > > > > > 
> > > > > > > Ah, much cleaner thanks!
> > > > > > 
> > > > > > It is my hope you we can get rid of register_sysctl_table one of these
> > > > > > days.  It was the original interface but today it is just a
> > > > > > compatibility wrapper.
> > > > > > 
> > > > > > I unfortunately ran out of steam last time before I
> > > > > > finished converting
> > > > > > everything over.
> > > > > 
> > > > > Let's give it one more go. I'll start with the fs stuff.
> > > > > 
> > > > >     Luis
> > > > > 
> > > > > .
> > > > > 
> > > > 
> > > > If we register each feature in its own feature code file using
> > > > register() to
> > > > register the sysctl interface. To avoid merge conflicts when different
> > > > features modify sysctl.c at the same time.
> > > > that is, try to Avoid mixing code with multiple features in the
> > > > same code
> > > > file.
> > > > 
> > > > For example, the multiple file interfaces defined in sysctl.c by the
> > > > hung_task feature can  be moved to hung_task.c.
> > > > 
> > > > Perhaps later, without centralized sysctl.c ?
> > > > Is this better?
> > > > 
> > > > Thanks
> > > > Xiaoming Ni
> > > > 
> > > > ---
> > > >   include/linux/sched/sysctl.h |  8 +----
> > > >   kernel/hung_task.c           | 78
> > > > +++++++++++++++++++++++++++++++++++++++++++-
> > > >   kernel/sysctl.c              | 50 ----------------------------
> > > >   3 files changed, 78 insertions(+), 58 deletions(-)
> > > > 
> > > > diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> > > > index d4f6215..bb4e0d3 100644
> > > > --- a/include/linux/sched/sysctl.h
> > > > +++ b/include/linux/sched/sysctl.h
> > > > @@ -7,14 +7,8 @@
> > > >   struct ctl_table;
> > > > 
> > > >   #ifdef CONFIG_DETECT_HUNG_TASK
> > > > -extern int         sysctl_hung_task_check_count;
> > > > -extern unsigned int  sysctl_hung_task_panic;
> > > > +/* used for block/ */
> > > >   extern unsigned long sysctl_hung_task_timeout_secs;
> > > > -extern unsigned long sysctl_hung_task_check_interval_secs;
> > > > -extern int sysctl_hung_task_warnings;
> > > > -extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int
> > > > write,
> > > > -                     void __user *buffer,
> > > > -                     size_t *lenp, loff_t *ppos);
> > > >   #else
> > > >   /* Avoid need for ifdefs elsewhere in the code */
> > > >   enum { sysctl_hung_task_timeout_secs = 0 };
> > > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > > index 14a625c..53589f2 100644
> > > > --- a/kernel/hung_task.c
> > > > +++ b/kernel/hung_task.c
> > > > @@ -20,10 +20,10 @@
> > > >   #include <linux/utsname.h>
> > > >   #include <linux/sched/signal.h>
> > > >   #include <linux/sched/debug.h>
> > > > +#include <linux/kmemleak.h>
> > > >   #include <linux/sched/sysctl.h>
> > > > 
> > > >   #include <trace/events/sched.h>
> > > > -
> > > >   /*
> > > >    * The number of tasks checked:
> > > >    */
> > > > @@ -296,8 +296,84 @@ static int watchdog(void *dummy)
> > > >       return 0;
> > > >   }
> > > > 
> > > > +/*
> > > > + * This is needed for proc_doulongvec_minmax of
> > > > sysctl_hung_task_timeout_secs
> > > > + * and hung_task_check_interval_secs
> > > > + */
> > > > +static unsigned long hung_task_timeout_max = (LONG_MAX / HZ);
> > > 
> > > This is not generic so it can stay in this file.
> > > 
> > > > +static int __maybe_unused neg_one = -1;
> > > 
> > > This is generic so we can share it, I suggest we just rename this
> > > for now to sysctl_neg_one, export it to a symbol namespace,
> > > EXPORT_SYMBOL_NS_GPL(sysctl_neg_one, SYSCTL) and then import it with
> > > MODULE_IMPORT_NS(SYSCTL)
> 
> When I made the patch, I found that only sysctl_writes_strict and
> hung_task_warnings use the neg_one variable, so is it necessary to merge and
> generate the SYSCTL_NEG_ONE variable?

Yes.


> In addition, the SYSCTL symbol namespace has not been created yet. Do I just
> need to add a new member -1 to the sysctl_vals array?

I had forgotten about our sysctl_vals, so disregard my request
to use EXPORT_SYMBOL_NS_GPL(sysctl_neg_one, SYSCTL) and using
MODULE_IMPORT_NS(SYSCTL). Since we are already using these and
have a prefix on the define we should be good.

> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index b6f5d45..acae1fa 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -23,7 +23,7 @@
>  static const struct inode_operations proc_sys_dir_operations;
> 
>  /* shared constants to be used in various sysctls */
> -const int sysctl_vals[] = { 0, 1, INT_MAX };
> +const int sysctl_vals[] = { 0, 1, INT_MAX, -1 };
>  EXPORT_SYMBOL(sysctl_vals);
> 
>  /* Support for permanently empty directories */
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa844..6d741d6 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -41,6 +41,7 @@
>  #define SYSCTL_ZERO    ((void *)&sysctl_vals[0])
>  #define SYSCTL_ONE     ((void *)&sysctl_vals[1])
>  #define SYSCTL_INT_MAX ((void *)&sysctl_vals[2])
> +#define SYSCTL_NEG_ONE       ((void *)&sysctl_vals[3])
> 
>  extern const int sysctl_vals[];

This looks good.

  Luis
diff mbox series

Patch

diff --cc kernel/sysctl.c
index b9ff323e1d26,e961286d0e14..000000000000
--- a/kernel/sysctl.c