diff mbox series

[1/3] kernel/sysctl: support setting sysctl parameters from kernel command line

Message ID 20200330115535.3215-2-vbabka@suse.cz (mailing list archive)
State New, archived
Headers show
Series support setting sysctl parameters from kernel command line | expand

Commit Message

Vlastimil Babka March 30, 2020, 11:55 a.m. UTC
A recently proposed patch to add vm_swappiness command line parameter in
addition to existing sysctl [1] made me wonder why we don't have a general
support for passing sysctl parameters via command line. Googling found only
somebody else wondering the same [2], but I haven't found any prior discussion
with reasons why not to do this.

Settings the vm_swappiness issue aside (the underlying issue might be solved in
a different way), quick search of kernel-parameters.txt shows there are already
some that exist as both sysctl and kernel parameter - hung_task_panic,
nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
would remove the need to add more of those one-offs and might be handy in
situations where configuration by e.g. /etc/sysctl.d/ is impractical.

Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and tries to interpret them as writes to the
corresponding sys/ files using an temporary in-kernel procfs mount. This
mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
registered sysctl tables. Errors due to e.g. invalid parameter name or value
are reported in the kernel log.

The processing is hooked right before the init process is loaded, as some
handlers might be more complicated than simple setters and might need some
subsystems to be initialized. At the moment the init process can be started and
eventually execute a process writing to /proc/sys/ then it should be also fine
to do that from the kernel.

Sysctls registered later on module load time are not set by this mechanism -
it's expected that in such scenarios, setting sysctl values from userspace is
practical enough.

[1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
[3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 .../admin-guide/kernel-parameters.txt         |   9 ++
 fs/proc/proc_sysctl.c                         | 100 ++++++++++++++++++
 include/linux/sysctl.h                        |   4 +
 init/main.c                                   |   2 +
 4 files changed, 115 insertions(+)

Comments

Vlastimil Babka March 30, 2020, 4:23 p.m. UTC | #1
On 3/30/20 1:55 PM, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
> 
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> 
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables. Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
> 
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
> 
> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.
> 
> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Boo, all the error prints should terminate with \n
Will wait for feedback before resend.
Kees Cook March 30, 2020, 5:39 p.m. UTC | #2
On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
> 
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> 
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables. Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
> 
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
> 
> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.
> 
> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  .../admin-guide/kernel-parameters.txt         |   9 ++
>  fs/proc/proc_sysctl.c                         | 100 ++++++++++++++++++
>  include/linux/sysctl.h                        |   4 +
>  init/main.c                                   |   2 +
>  4 files changed, 115 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..81ff626fc700 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>  
>  	switches=	[HW,M68k]
>  
> +	sysctl.*=	[KNL]
> +			Set a sysctl parameter, right before loading the init
> +			process, as if the value was written to the respective
> +			/proc/sys/... file. Both '.' and '/' are recognized as
> +			separators. Unrecognized parameters and invalid values
> +			are reported in the kernel log. Sysctls registered
> +			later by a loaded module cannot be set this way.
> +			Example: sysctl.vm.swappiness=40
> +
>  	sysfs.deprecated=0|1 [KNL]
>  			Enable/disable old style sysfs layout for old udev
>  			on older distributions. When this option is enabled
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c75bb4632ed1..653188c9c4c9 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -14,6 +14,7 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/mount.h>
>  #include "internal.h"
>  
>  static const struct dentry_operations proc_sys_dentry_operations;
> @@ -1725,3 +1726,102 @@ int __init proc_sys_init(void)
>  
>  	return sysctl_init();
>  }
> +
> +/* Set sysctl value passed on kernel command line. */
> +static int process_sysctl_arg(char *param, char *val,
> +			       const char *unused, void *arg)
> +{
> +	char *path;
> +	struct vfsmount *proc_mnt = *((struct vfsmount **)arg);

I would just make this:

	struct vfsmount **proc_mnt = (struct vfsmount **)arg;


> +	struct file_system_type *proc_fs_type;
> +	struct file *file;
> +	int len;
> +	int err;
> +	loff_t pos = 0;
> +	ssize_t wret;
> +
> +	if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
> +		return 0;
> +
> +	param += sizeof("sysctl") - 1;
> +
> +	if (param[0] != '/' && param[0] != '.')
> +		return 0;
> +
> +	param++;
> +
> +	if (!proc_mnt) {

	if (!*proc_mnt) {

I would also add a comment here to explain that this is doing an
on-demand mount so that it doesn't have to mount proc if there are not
sysctl parameters.

> +		proc_fs_type = get_fs_type("proc");
> +		if (!proc_fs_type) {
> +			pr_err("Failed to find procfs to set sysctl from command line");
> +			return 0;
> +		}
> +		proc_mnt = kern_mount(proc_fs_type);

		*proc_mnt = kern_mount(proc_fs_type);

> +		put_filesystem(proc_fs_type);
> +		if (IS_ERR(proc_mnt)) {

		if (IS_ERR(*proc_mnt)) {

> +			pr_err("Failed to mount procfs to set sysctl from command line");
> +			return 0;
> +		}
> +		*((struct vfsmount **)arg) = proc_mnt;

Then drop this line.

> +	}
> +
> +	path = kasprintf(GFP_KERNEL, "sys/%s", param);
> +	if (!path)
> +		panic("%s: Failed to allocate path for %s\n", __func__, param);
> +	strreplace(path, '.', '/');
> +
> +	file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);

	file = file_open_root((*proc_mnt)->mnt_root, *proc_mnt, path, O_WRONLY, 0);

> +	if (IS_ERR(file)) {
> +		err = PTR_ERR(file);
> +		if (err == -ENOENT)
> +			pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
> +				param, val);
> +		else if (err == -EACCES)
> +			pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
> +				param, val);
> +		else
> +			pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
> +				file, param, val);
> +		goto out;
> +	}
> +	len = strlen(val);
> +	wret = kernel_write(file, val, len, &pos);
> +	if (wret < 0) {
> +		err = wret;
> +		if (err == -EINVAL)
> +			pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
> +				param, val);
> +		else
> +			pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
> +				ERR_PTR(err), param, val);
> +	} else if (wret != len) {
> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> +			wret, len, path, param, val);
> +	}
> +
> +	err = filp_close(file, NULL);
> +	if (err)
> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> +			ERR_PTR(err), param, val);
> +out:
> +	kfree(path);
> +	return 0;
> +}
> +
> +void do_sysctl_args(void)
> +{
> +	char *command_line;
> +	struct vfsmount *proc_mnt = NULL;
> +
> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> +	if (!command_line)
> +		panic("%s: Failed to allocate copy of command line\n", __func__);
> +
> +	parse_args("Setting sysctl args", command_line,
> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> +
> +	if (proc_mnt)
> +		kern_unmount(proc_mnt);
> +
> +	kfree(command_line);
> +}
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa84493f23..5f3f2a00d75f 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
>  void unregister_sysctl_table(struct ctl_table_header * table);
>  
>  extern int sysctl_init(void);
> +void do_sysctl_args(void);
>  
>  extern struct ctl_table sysctl_mount_point[];
>  
> @@ -236,6 +237,9 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
>  {
>  }
>  
> +void do_sysctl_args(void)

As with the others in the no-op case:

static inline void do_sysctl_args(void)

> +{
> +}
>  #endif /* CONFIG_SYSCTL */
>  
>  int sysctl_max_threads(struct ctl_table *table, int write,
> diff --git a/init/main.c b/init/main.c
> index ee4947af823f..a91ea166a731 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1367,6 +1367,8 @@ static int __ref kernel_init(void *unused)
>  
>  	rcu_end_inkernel_boot();
>  
> +	do_sysctl_args();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> -- 
> 2.25.1
> 

Otherwise, yes, looks good!
Kees Cook March 30, 2020, 5:40 p.m. UTC | #3
On Mon, Mar 30, 2020 at 06:23:57PM +0200, Vlastimil Babka wrote:
> Boo, all the error prints should terminate with \n
> Will wait for feedback before resend.

eek, yes, good catch. :)
Luis Chamberlain March 30, 2020, 10:44 p.m. UTC | #4
Sorry to be late to the apocalypse review party for this, feedback below.

On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
> 
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> 
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables.

"even though we don't handle modular sysctls" might be safer to add.

> Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
> 
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.

This is wonderful when we think about existing sysctls which have
corresponding silly boot params that do the same thing. However, shoving
a boot param capability down every possible built-in sysctl brings
forward support considerations we should take serious, as this would
add a new user interface and we'll have to support it.

Simply put, not all sysctls should be born to be boot params. I suggest
we white-list which ones we can process, so that only sysctls we *do*
review and agree are good candidates get allowed to also be boot params.
Calling a proc hanlder early might seem functional, but if the subsystem
defers evaluation of a setting later, then any boot param set would be
lifted anyway. I think each syscl would need to be reviewed for this to
be supported in a way that doesn't create odd unexpected system settings
which we later have to support forever.

Should we not do this, we'll have to live with the consequences of
supporting the full swoop of sysctls are boot params, whatever
consequences those may be.

> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.

I'm just not sure if its worth supporting these, for modules we have
module params, but those with more creative userspace might have a
better idea as to why we'd want to support this. I just can't see it
yet.

> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  .../admin-guide/kernel-parameters.txt         |   9 ++
>  fs/proc/proc_sysctl.c                         | 100 ++++++++++++++++++
>  include/linux/sysctl.h                        |   4 +
>  init/main.c                                   |   2 +
>  4 files changed, 115 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..81ff626fc700 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>  
>  	switches=	[HW,M68k]
>  
> +	sysctl.*=	[KNL]
> +			Set a sysctl parameter, right before loading the init
> +			process, as if the value was written to the respective
> +			/proc/sys/... file. Both '.' and '/' are recognized as
> +			separators. Unrecognized parameters and invalid values
> +			are reported in the kernel log. Sysctls registered
> +			later by a loaded module cannot be set this way.
> +			Example: sysctl.vm.swappiness=40
> +
>  	sysfs.deprecated=0|1 [KNL]
>  			Enable/disable old style sysfs layout for old udev
>  			on older distributions. When this option is enabled
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c75bb4632ed1..653188c9c4c9 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -14,6 +14,7 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/mount.h>
>  #include "internal.h"
>  
>  static const struct dentry_operations proc_sys_dentry_operations;
> @@ -1725,3 +1726,102 @@ int __init proc_sys_init(void)
>  
>  	return sysctl_init();
>  }
> +
> +/* Set sysctl value passed on kernel command line. */
> +static int process_sysctl_arg(char *param, char *val,
> +			       const char *unused, void *arg)
> +{
> +	char *path;
> +	struct vfsmount *proc_mnt = *((struct vfsmount **)arg);
> +	struct file_system_type *proc_fs_type;
> +	struct file *file;
> +	int len;
> +	int err;
> +	loff_t pos = 0;
> +	ssize_t wret;
> +
> +	if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
> +		return 0;
> +
> +	param += sizeof("sysctl") - 1;
> +
> +	if (param[0] != '/' && param[0] != '.')
> +		return 0;
> +
> +	param++;
> +
> +	if (!proc_mnt) {
> +		proc_fs_type = get_fs_type("proc");
> +		if (!proc_fs_type) {
> +			pr_err("Failed to find procfs to set sysctl from command line");
> +			return 0;
> +		}
> +		proc_mnt = kern_mount(proc_fs_type);
> +		put_filesystem(proc_fs_type);
> +		if (IS_ERR(proc_mnt)) {
> +			pr_err("Failed to mount procfs to set sysctl from command line");
> +			return 0;
> +		}
> +		*((struct vfsmount **)arg) = proc_mnt;
> +	}
> +
> +	path = kasprintf(GFP_KERNEL, "sys/%s", param);
> +	if (!path)
> +		panic("%s: Failed to allocate path for %s\n", __func__, param);
> +	strreplace(path, '.', '/');
> +
> +	file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);
> +	if (IS_ERR(file)) {
> +		err = PTR_ERR(file);
> +		if (err == -ENOENT)
> +			pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
> +				param, val);
> +		else if (err == -EACCES)
> +			pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
> +				param, val);
> +		else
> +			pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
> +				file, param, val);
> +		goto out;
> +	}
> +	len = strlen(val);
> +	wret = kernel_write(file, val, len, &pos);
> +	if (wret < 0) {
> +		err = wret;
> +		if (err == -EINVAL)
> +			pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
> +				param, val);
> +		else
> +			pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
> +				ERR_PTR(err), param, val);
> +	} else if (wret != len) {
> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> +			wret, len, path, param, val);
> +	}
> +
> +	err = filp_close(file, NULL);
> +	if (err)
> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> +			ERR_PTR(err), param, val);
> +out:
> +	kfree(path);
> +	return 0;
> +}
> +
> +void do_sysctl_args(void)
> +{
> +	char *command_line;
> +	struct vfsmount *proc_mnt = NULL;
> +
> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);

can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
move __kstrncpy() to a generic kstrncpy().

> +	if (!command_line)
> +		panic("%s: Failed to allocate copy of command line\n", __func__);
> +
> +	parse_args("Setting sysctl args", command_line,
> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> +
> +	if (proc_mnt)
> +		kern_unmount(proc_mnt);
> +
> +	kfree(command_line);
> +}

Then, can we get this tested as part of lib/test_sysctl.c with its
respective tools/testing/selftests/sysctl/sysctl.sh ?

  Luis
Vlastimil Babka March 31, 2020, 7:42 a.m. UTC | #5
On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> Sorry to be late to the apocalypse review party for this, feedback below.
> 
> On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
>> A recently proposed patch to add vm_swappiness command line parameter in
>> addition to existing sysctl [1] made me wonder why we don't have a general
>> support for passing sysctl parameters via command line. Googling found only
>> somebody else wondering the same [2], but I haven't found any prior discussion
>> with reasons why not to do this.
>> 
>> Settings the vm_swappiness issue aside (the underlying issue might be solved in
>> a different way), quick search of kernel-parameters.txt shows there are already
>> some that exist as both sysctl and kernel parameter - hung_task_panic,
>> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
>> would remove the need to add more of those one-offs and might be handy in
>> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
>> 
>> Hence, this patch adds a new parse_args() pass that looks for parameters
>> prefixed by 'sysctl.' and tries to interpret them as writes to the
>> corresponding sys/ files using an temporary in-kernel procfs mount. This
>> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
>> registered sysctl tables.
> 
> "even though we don't handle modular sysctls" might be safer to add.

OK

>> Errors due to e.g. invalid parameter name or value
>> are reported in the kernel log.
>> 
>> The processing is hooked right before the init process is loaded, as some
>> handlers might be more complicated than simple setters and might need some
>> subsystems to be initialized. At the moment the init process can be started and
>> eventually execute a process writing to /proc/sys/ then it should be also fine
>> to do that from the kernel.
> 
> This is wonderful when we think about existing sysctls which have
> corresponding silly boot params that do the same thing. However, shoving
> a boot param capability down every possible built-in sysctl brings
> forward support considerations we should take serious, as this would
> add a new user interface and we'll have to support it.

Hmm, if I boot with an initramfs with init process that does mount /proc and set
some sysctl there as the very first thing, then this will be almost the same
moment as my patch does it. There is no further kernel initialization in
between. So with your logic we already do support all non-modular sysctls to be
set so early.

> Simply put, not all sysctls should be born to be boot params. I suggest
> we white-list which ones we can process, so that only sysctls we *do*
> review and agree are good candidates get allowed to also be boot params.

By above, the nuber of sysctls that will be problematic with this boot param
mechanism, but work properly when set from init process immediately, should be
near zero, and I would expect truly zero. As such, whitelist approach seems
excessive to me and it would take a lot of effort to build it, and it will be a
lottery which sysctl would work as boot param on which kernel version. Sounds
like a lot of trouble for little benefit to me.

> Calling a proc hanlder early might seem functional, but if the subsystem
> defers evaluation of a setting later, then any boot param set would be
> lifted anyway.

I'm not sure I understand, can you show me some example please?

> I think each syscl would need to be reviewed for this to
> be supported in a way that doesn't create odd unexpected system settings
> which we later have to support forever.

We would already do per the initramfs argument.

> Should we not do this, we'll have to live with the consequences of
> supporting the full swoop of sysctls are boot params, whatever
> consequences those may be.

Of course when the first user tries to set some particular sysctl as boot param
and finds and reports it doesn't work as intended, then it can be fixed or
blacklisted and it can't break anyone else?

>> Sysctls registered later on module load time are not set by this mechanism -
>> it's expected that in such scenarios, setting sysctl values from userspace is
>> practical enough.
> 
> I'm just not sure if its worth supporting these, for modules we have
> module params, but those with more creative userspace might have a
> better idea as to why we'd want to support this. I just can't see it
> yet.

Sure, I can defer that part for later now.

>> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
>> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
>> [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---

>
Michal Hocko March 31, 2020, 7:48 a.m. UTC | #6
On Tue 31-03-20 09:42:46, Vlastimil Babka wrote:
[...]
> > Should we not do this, we'll have to live with the consequences of
> > supporting the full swoop of sysctls are boot params, whatever
> > consequences those may be.
> 
> Of course when the first user tries to set some particular sysctl as boot param
> and finds and reports it doesn't work as intended, then it can be fixed or
> blacklisted and it can't break anyone else?

Absolutely agreed. I would be really careful to not overengineer this
whole thing.
Luis Chamberlain March 31, 2020, 2:31 p.m. UTC | #7
On Tue, Mar 31, 2020 at 09:42:46AM +0200, Vlastimil Babka wrote:
> On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > 
> > This is wonderful when we think about existing sysctls which have
> > corresponding silly boot params that do the same thing. However, shoving
> > a boot param capability down every possible built-in sysctl brings
> > forward support considerations we should take serious, as this would
> > add a new user interface and we'll have to support it.
> 
> Hmm, if I boot with an initramfs with init process that does mount /proc and set
> some sysctl there as the very first thing, then this will be almost the same
> moment as my patch does it. There is no further kernel initialization in
> between. So with your logic we already do support all non-modular sysctls to be
> set so early.

Yes, true. Then by all means:

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis
Kees Cook March 31, 2020, 6:26 p.m. UTC | #8
On Tue, Mar 31, 2020 at 09:48:36AM +0200, Michal Hocko wrote:
> On Tue 31-03-20 09:42:46, Vlastimil Babka wrote:
> [...]
> > > Should we not do this, we'll have to live with the consequences of
> > > supporting the full swoop of sysctls are boot params, whatever
> > > consequences those may be.
> > 
> > Of course when the first user tries to set some particular sysctl as boot param
> > and finds and reports it doesn't work as intended, then it can be fixed or
> > blacklisted and it can't break anyone else?
> 
> Absolutely agreed. I would be really careful to not overengineer this
> whole thing.

Right -- this is supposed to be _simple_, and I think that's the primary
benefit here. If we encounter problems we can fix those cases. The
careful place, I think, needs to be with converting existing boot params
to be aliases. That's when timing considerations need to be taken into
account carefully.
Vlastimil Babka April 1, 2020, 11:01 a.m. UTC | #9
On 3/31/20 12:44 AM, Luis Chamberlain wrote:
>> +	} else if (wret != len) {
>> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
>> +			wret, len, path, param, val);
>> +	}
>> +
>> +	err = filp_close(file, NULL);
>> +	if (err)
>> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
>> +			ERR_PTR(err), param, val);
>> +out:
>> +	kfree(path);
>> +	return 0;
>> +}
>> +
>> +void do_sysctl_args(void)
>> +{
>> +	char *command_line;
>> +	struct vfsmount *proc_mnt = NULL;
>> +
>> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> 
> can you use kstrndup() ? And then use kfree_const()? Yes, feel free to

I don't follow, what am I missing? Do you mean this?

size_t len = strlen(saved_command_line);
command_line = kstrndup(saved_command_line, len, GFP_KERNEL);

What would be the advantage over plain kstrdup()?
As for kfree_const(), when would command_line be .rodata? I don't see using
kstrndup() resulting in that.

> move __kstrncpy() to a generic kstrncpy().
> 
>> +	if (!command_line)
>> +		panic("%s: Failed to allocate copy of command line\n", __func__);
>> +
>> +	parse_args("Setting sysctl args", command_line,
>> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
>> +
>> +	if (proc_mnt)
>> +		kern_unmount(proc_mnt);
>> +
>> +	kfree(command_line);
>> +}
> 
> Then, can we get this tested as part of lib/test_sysctl.c with its
> respective tools/testing/selftests/sysctl/sysctl.sh ?

Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
should be build with 'y', which would be needed anyway) and expand the test
instructions so that the test kernel boot has to include it on the command line,
and then I verify it has been set? Or do you see a better way?

Thanks,
Vlastimil

>   Luis
>
Luis Chamberlain April 2, 2020, 4:04 p.m. UTC | #10
On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> >> +	} else if (wret != len) {
> >> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> >> +			wret, len, path, param, val);
> >> +	}
> >> +
> >> +	err = filp_close(file, NULL);
> >> +	if (err)
> >> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> >> +			ERR_PTR(err), param, val);
> >> +out:
> >> +	kfree(path);
> >> +	return 0;
> >> +}
> >> +
> >> +void do_sysctl_args(void)
> >> +{
> >> +	char *command_line;
> >> +	struct vfsmount *proc_mnt = NULL;
> >> +
> >> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > 
> > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> 
> I don't follow, what am I missing? Do you mean this?
> 
> size_t len = strlen(saved_command_line);
> command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> 
> What would be the advantage over plain kstrdup()?
> As for kfree_const(), when would command_line be .rodata? I don't see using
> kstrndup() resulting in that.

The const nature of using kstrdup() comes with using const for your
purpose. ie:

const char *const_command_line = saved_command_line;

The point of a kstrncpy() then is to ensure force a const throughout
your use if you know you don't need modifications.

> > move __kstrncpy() to a generic kstrncpy().
> > 
> >> +	if (!command_line)
> >> +		panic("%s: Failed to allocate copy of command line\n", __func__);
> >> +
> >> +	parse_args("Setting sysctl args", command_line,
> >> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> >> +
> >> +	if (proc_mnt)
> >> +		kern_unmount(proc_mnt);
> >> +
> >> +	kfree(command_line);
> >> +}
> > 
> > Then, can we get this tested as part of lib/test_sysctl.c with its
> > respective tools/testing/selftests/sysctl/sysctl.sh ?
> 
> Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> should be build with 'y', which would be needed anyway) and expand the test
> instructions so that the test kernel boot has to include it on the command line,
> and then I verify it has been set? Or do you see a better way?

We don't necessarily have a way to test the use boot params today.
That reveals an are which we should eventually put some focus on
in the future. In the meantime we have to deal with what we have.

So let's think about this:

You are adding a new cmdline sysctl boot param, and also a wrapper
for those old boot bootparams to also work using both new sysctl
path and old path. Testing just these both should suffice.

How about this:

For testing the new feature you are adding, can you extend the default
boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
upon boot we can verify the proc handlers for these new boot params got
kicked, and likewise some other proc handlers which also can be used
from the cmdline are *not* set. For this later set, we already have
a series of test syctls you can use. In fact, you can use the existing
syctls for both cases already I believe, its just a matter of adding
this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
and these tests would take place *first* on the script.

That would test both cases with one kernel.

You could then also add a bogus new sysctl which also expands to a silly
raw boot param to test the wrapper you are providing. That would be the
only new test syctl you would need to add.

If you can think of a way to *break* your new functionality and ensure
it doesn't break the kernel, even better.

  Luis
Kees Cook April 2, 2020, 5:23 p.m. UTC | #11
On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > >> +	} else if (wret != len) {
> > >> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > >> +			wret, len, path, param, val);
> > >> +	}
> > >> +
> > >> +	err = filp_close(file, NULL);
> > >> +	if (err)
> > >> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > >> +			ERR_PTR(err), param, val);
> > >> +out:
> > >> +	kfree(path);
> > >> +	return 0;
> > >> +}
> > >> +
> > >> +void do_sysctl_args(void)
> > >> +{
> > >> +	char *command_line;
> > >> +	struct vfsmount *proc_mnt = NULL;
> > >> +
> > >> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > > 
> > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> > 
> > I don't follow, what am I missing? Do you mean this?
> > 
> > size_t len = strlen(saved_command_line);
> > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> > 
> > What would be the advantage over plain kstrdup()?
> > As for kfree_const(), when would command_line be .rodata? I don't see using
> > kstrndup() resulting in that.
> 
> The const nature of using kstrdup() comes with using const for your
> purpose. ie:
> 
> const char *const_command_line = saved_command_line;
> 
> The point of a kstrncpy() then is to ensure force a const throughout
> your use if you know you don't need modifications.

I'm not following this suggestion. It _is_ modifying it. That's why it's
making a copy. What am I missing?

> > >> +	parse_args("Setting sysctl args", command_line,
> > >> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > >> +
> > >> +	if (proc_mnt)
> > >> +		kern_unmount(proc_mnt);
> > >> +
> > >> +	kfree(command_line);
> > >> +}
> > > 
> > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> > 
> > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > should be build with 'y', which would be needed anyway) and expand the test
> > instructions so that the test kernel boot has to include it on the command line,
> > and then I verify it has been set? Or do you see a better way?
> 
> We don't necessarily have a way to test the use boot params today.
> That reveals an are which we should eventually put some focus on
> in the future. In the meantime we have to deal with what we have.
> 
> So let's think about this:
> 
> You are adding a new cmdline sysctl boot param, and also a wrapper
> for those old boot bootparams to also work using both new sysctl
> path and old path. Testing just these both should suffice.
> 
> How about this:
> 
> For testing the new feature you are adding, can you extend the default
> boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> upon boot we can verify the proc handlers for these new boot params got
> kicked, and likewise some other proc handlers which also can be used
> from the cmdline are *not* set. For this later set, we already have
> a series of test syctls you can use. In fact, you can use the existing
> syctls for both cases already I believe, its just a matter of adding
> this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> and these tests would take place *first* on the script.

This seems... messy. I'm all for testing this, but I'd rather this not
be internally driven. This is an external interface (boot params), so
I'd rather an external driver handle that testing. We don't have a
common method to do that with the kernel, though.

> That would test both cases with one kernel.
> 
> You could then also add a bogus new sysctl which also expands to a silly
> raw boot param to test the wrapper you are providing. That would be the
> only new test syctl you would need to add.

Sure, that seems reasonable. Supporting externally driven testing makes
sense for this.
Luis Chamberlain April 2, 2020, 8:59 p.m. UTC | #12
On Thu, Apr 02, 2020 at 10:23:13AM -0700, Kees Cook wrote:
> On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> > On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > > >> +	} else if (wret != len) {
> > > >> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > > >> +			wret, len, path, param, val);
> > > >> +	}
> > > >> +
> > > >> +	err = filp_close(file, NULL);
> > > >> +	if (err)
> > > >> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > > >> +			ERR_PTR(err), param, val);
> > > >> +out:
> > > >> +	kfree(path);
> > > >> +	return 0;
> > > >> +}
> > > >> +
> > > >> +void do_sysctl_args(void)
> > > >> +{
> > > >> +	char *command_line;
> > > >> +	struct vfsmount *proc_mnt = NULL;
> > > >> +
> > > >> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > > > 
> > > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> > > 
> > > I don't follow, what am I missing? Do you mean this?
> > > 
> > > size_t len = strlen(saved_command_line);
> > > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> > > 
> > > What would be the advantage over plain kstrdup()?
> > > As for kfree_const(), when would command_line be .rodata? I don't see using
> > > kstrndup() resulting in that.
> > 
> > The const nature of using kstrdup() comes with using const for your
> > purpose. ie:
> > 
> > const char *const_command_line = saved_command_line;
> > 
> > The point of a kstrncpy() then is to ensure force a const throughout
> > your use if you know you don't need modifications.
> 
> I'm not following this suggestion. It _is_ modifying it. That's why it's
> making a copy. What am I missing?

We modify the copied bootparams to allow new sysctls to map to old boot params?

If so, then yes, this cannot be used.

> > > >> +	parse_args("Setting sysctl args", command_line,
> > > >> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > > >> +
> > > >> +	if (proc_mnt)
> > > >> +		kern_unmount(proc_mnt);
> > > >> +
> > > >> +	kfree(command_line);
> > > >> +}
> > > > 
> > > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> > > 
> > > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > > should be build with 'y', which would be needed anyway) and expand the test
> > > instructions so that the test kernel boot has to include it on the command line,
> > > and then I verify it has been set? Or do you see a better way?
> > 
> > We don't necessarily have a way to test the use boot params today.
> > That reveals an are which we should eventually put some focus on
> > in the future. In the meantime we have to deal with what we have.
> > 
> > So let's think about this:
> > 
> > You are adding a new cmdline sysctl boot param, and also a wrapper
> > for those old boot bootparams to also work using both new sysctl
> > path and old path. Testing just these both should suffice.
> > 
> > How about this:
> > 
> > For testing the new feature you are adding, can you extend the default
> > boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> > upon boot we can verify the proc handlers for these new boot params got
> > kicked, and likewise some other proc handlers which also can be used
> > from the cmdline are *not* set. For this later set, we already have
> > a series of test syctls you can use. In fact, you can use the existing
> > syctls for both cases already I believe, its just a matter of adding
> > this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> > and these tests would take place *first* on the script.
> 
> This seems... messy.

It is all we have.


> I'm all for testing this,

OK so we do want to test it.

> but I'd rather this not be internally driven.

This is the least cumbersome solution I could think of. Other things
would require things like using qemu, etc. That seems much more messsy.

> This is an external interface (boot params), so
> I'd rather an external driver handle that testing. We don't have a
> common method to do that with the kernel, though.

Right... which begs the question now -- how do we test this sort of
stuff? The above would at least get us coverage while we iron something
more generic out for boot params.

> > That would test both cases with one kernel.
> > 
> > You could then also add a bogus new sysctl which also expands to a silly
> > raw boot param to test the wrapper you are providing. That would be the
> > only new test syctl you would need to add.
> 
> Sure, that seems reasonable. Supporting externally driven testing makes
> sense for this.

But again, what exactly?

  Luis
Kees Cook April 3, 2020, 11:57 p.m. UTC | #13
On Thu, Apr 02, 2020 at 08:59:32PM +0000, Luis Chamberlain wrote:
> On Thu, Apr 02, 2020 at 10:23:13AM -0700, Kees Cook wrote:
> > On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> > > On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > > > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > > > >> +	} else if (wret != len) {
> > > > >> +		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > > > >> +			wret, len, path, param, val);
> > > > >> +	}
> > > > >> +
> > > > >> +	err = filp_close(file, NULL);
> > > > >> +	if (err)
> > > > >> +		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > > > >> +			ERR_PTR(err), param, val);
> > > > >> +out:
> > > > >> +	kfree(path);
> > > > >> +	return 0;
> > > > >> +}
> > > > >> +
> > > > >> +void do_sysctl_args(void)
> > > > >> +{
> > > > >> +	char *command_line;
> > > > >> +	struct vfsmount *proc_mnt = NULL;
> > > > >> +
> > > > >> +	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > > > > 
> > > > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> > > > 
> > > > I don't follow, what am I missing? Do you mean this?
> > > > 
> > > > size_t len = strlen(saved_command_line);
> > > > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> > > > 
> > > > What would be the advantage over plain kstrdup()?
> > > > As for kfree_const(), when would command_line be .rodata? I don't see using
> > > > kstrndup() resulting in that.
> > > 
> > > The const nature of using kstrdup() comes with using const for your
> > > purpose. ie:
> > > 
> > > const char *const_command_line = saved_command_line;
> > > 
> > > The point of a kstrncpy() then is to ensure force a const throughout
> > > your use if you know you don't need modifications.
> > 
> > I'm not following this suggestion. It _is_ modifying it. That's why it's
> > making a copy. What am I missing?
> 
> We modify the copied bootparams to allow new sysctls to map to old boot params?
> 
> If so, then yes, this cannot be used.

I feel like I've lost track of this thread. This strdup is so that the
command line can have '\0's injected while it steps through the args
(and for doing the . and / replacement). I don't know what you mean by
"map" here: this is standard parse_args() usage.

> > > > >> +	parse_args("Setting sysctl args", command_line,
> > > > >> +		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > > > >> +
> > > > >> +	if (proc_mnt)
> > > > >> +		kern_unmount(proc_mnt);
> > > > >> +
> > > > >> +	kfree(command_line);
> > > > >> +}
> > > > > 
> > > > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> > > > 
> > > > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > > > should be build with 'y', which would be needed anyway) and expand the test
> > > > instructions so that the test kernel boot has to include it on the command line,
> > > > and then I verify it has been set? Or do you see a better way?
> > > 
> > > We don't necessarily have a way to test the use boot params today.
> > > That reveals an are which we should eventually put some focus on
> > > in the future. In the meantime we have to deal with what we have.
> > > 
> > > So let's think about this:
> > > 
> > > You are adding a new cmdline sysctl boot param, and also a wrapper
> > > for those old boot bootparams to also work using both new sysctl
> > > path and old path. Testing just these both should suffice.
> > > 
> > > How about this:
> > > 
> > > For testing the new feature you are adding, can you extend the default
> > > boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> > > upon boot we can verify the proc handlers for these new boot params got
> > > kicked, and likewise some other proc handlers which also can be used
> > > from the cmdline are *not* set. For this later set, we already have
> > > a series of test syctls you can use. In fact, you can use the existing
> > > syctls for both cases already I believe, its just a matter of adding
> > > this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> > > and these tests would take place *first* on the script.
> > 
> > This seems... messy.
> 
> It is all we have.
> > I'm all for testing this,
> 
> OK so we do want to test it.
> 
> > but I'd rather this not be internally driven.
> 
> This is the least cumbersome solution I could think of. Other things
> would require things like using qemu, etc. That seems much more messsy.

Yes. Doing an internal extension isn't testing the actual code.

> 
> > This is an external interface (boot params), so
> > I'd rather an external driver handle that testing. We don't have a
> > common method to do that with the kernel, though.
> 
> Right... which begs the question now -- how do we test this sort of
> stuff? The above would at least get us coverage while we iron something
> more generic out for boot params.
> 
> > > That would test both cases with one kernel.
> > > 
> > > You could then also add a bogus new sysctl which also expands to a silly
> > > raw boot param to test the wrapper you are providing. That would be the
> > > only new test syctl you would need to add.
> > 
> > Sure, that seems reasonable. Supporting externally driven testing makes
> > sense for this.
> 
> But again, what exactly?

I don't think anything is needed for this series. It can be boot tested
manually.
Luis Chamberlain April 6, 2020, 2:08 p.m. UTC | #14
On Fri, Apr 03, 2020 at 04:57:51PM -0700, Kees Cook wrote:
> On Thu, Apr 02, 2020 at 08:59:32PM +0000, Luis Chamberlain wrote:
> > We modify the copied bootparams to allow new sysctls to map to old boot params?
> 
> This strdup is so that the
> command line can have '\0's injected while it steps through the args
> (and for doing the . and / replacement).

Please ignore the const feedback then.

> > This is the least cumbersome solution I could think of. Other things
> > would require things like using qemu, etc. That seems much more messsy.
> 
> Yes. Doing an internal extension isn't testing the actual code.

But it would.

> > > This is an external interface (boot params), so
> > > I'd rather an external driver handle that testing. We don't have a
> > > common method to do that with the kernel, though.
> > 
> > Right... which begs the question now -- how do we test this sort of
> > stuff? The above would at least get us coverage while we iron something
> > more generic out for boot params.
> > 
> > > > That would test both cases with one kernel.
> > > > 
> > > > You could then also add a bogus new sysctl which also expands to a silly
> > > > raw boot param to test the wrapper you are providing. That would be the
> > > > only new test syctl you would need to add.
> > > 
> > > Sure, that seems reasonable. Supporting externally driven testing makes
> > > sense for this.
> > 
> > But again, what exactly?
> 
> I don't think anything is needed for this series. It can be boot tested
> manually.

Why test it manually when it could be tested automatically with a new kconfig?

  Luis
Kees Cook April 6, 2020, 3:58 p.m. UTC | #15
On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> > Yes. Doing an internal extension isn't testing the actual code.
> 
> But it would.
> 
> [...]
> > I don't think anything is needed for this series. It can be boot tested
> > manually.
> 
> Why test it manually when it could be tested automatically with a new kconfig?

So, my impression is that adding code to the internals to test the
internals isn't a valid test (or at least makes it fragile) because the
test would depend on the changes to the internals (or at least depend on
non-default non-production CONFIGs).

Can you send a patch for what you think this should look like? Perhaps
I'm not correctly imagining what you're describing?

Regardless of testing, I think this series is ready for -mm.
Luis Chamberlain April 6, 2020, 5:08 p.m. UTC | #16
On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> > > Yes. Doing an internal extension isn't testing the actual code.
> > 
> > But it would.
> > 
> > [...]
> > > I don't think anything is needed for this series. It can be boot tested
> > > manually.
> > 
> > Why test it manually when it could be tested automatically with a new kconfig?
> 
> So, my impression is that adding code to the internals to test the
> internals isn't a valid test (or at least makes it fragile) because the
> test would depend on the changes to the internals (or at least depend on
> non-default non-production CONFIGs).

The *internal* aspect here is an extension to boot params under a
kconfig which would simply append to it, as if the user would have
added some more params. Since we already have test sysctl params the
only one we'd need to add on the test driver would be a dummy one which
tests the alias, on the second patch. We should have enough sysctls to
already test dummy values.

Nothing else would be needed as the sysctl test driver would just need
to test that the values expected when this is enabled is set.

> Can you send a patch for what you think this should look like? Perhaps
> I'm not correctly imagining what you're describing?

I rather get the person involved in the changes to do the testing so
as they're the ones designing the feature. If however it is not clear
what I mean I'm happy to elaborate.

Vlastimil do you get what I mean?

> Regardless of testing, I think this series is ready for -mm.

I'm happy for it to go in provided we at least devise a follow up plan
for testing. Otherwise -- like other things, it won't get done.

  Luis
Vlastimil Babka April 14, 2020, 11:25 a.m. UTC | #17
On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
>> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
>> > > Yes. Doing an internal extension isn't testing the actual code.
>> > 
>> > But it would.
>> > 
>> > [...]
>> > > I don't think anything is needed for this series. It can be boot tested
>> > > manually.
>> > 
>> > Why test it manually when it could be tested automatically with a new kconfig?
>> 
>> So, my impression is that adding code to the internals to test the
>> internals isn't a valid test (or at least makes it fragile) because the
>> test would depend on the changes to the internals (or at least depend on
>> non-default non-production CONFIGs).
> 
> The *internal* aspect here is an extension to boot params under a
> kconfig which would simply append to it, as if the user would have

So there's no such kconfig yet to apply boot parameters specified by configure,
right? That would itself be a new feature. Or could we use bootconfig? (CC Masami)

> added some more params. Since we already have test sysctl params the
> only one we'd need to add on the test driver would be a dummy one which
> tests the alias, on the second patch. We should have enough sysctls to
> already test dummy values.

Right.

> Nothing else would be needed as the sysctl test driver would just need
> to test that the values expected when this is enabled is set.
> 
>> Can you send a patch for what you think this should look like? Perhaps
>> I'm not correctly imagining what you're describing?
> 
> I rather get the person involved in the changes to do the testing so
> as they're the ones designing the feature. If however it is not clear
> what I mean I'm happy to elaborate.
> 
> Vlastimil do you get what I mean?

Hopefully :)

>> Regardless of testing, I think this series is ready for -mm.
> 
> I'm happy for it to go in provided we at least devise a follow up plan
> for testing. Otherwise -- like other things, it won't get done.

OK I'll send a v2 and we can discuss the test driver details. I don't want to
neglect testing, but also not block the new functionality, in case it means
testing means adding another new functionality.

Thanks,
Vlastimil

>   Luis
>
Masami Hiramatsu (Google) April 15, 2020, 3:23 a.m. UTC | #18
On Tue, 14 Apr 2020 13:25:07 +0200
Vlastimil Babka <vbabka@suse.cz> wrote:

> On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> > On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> >> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> >> > > Yes. Doing an internal extension isn't testing the actual code.
> >> > 
> >> > But it would.
> >> > 
> >> > [...]
> >> > > I don't think anything is needed for this series. It can be boot tested
> >> > > manually.
> >> > 
> >> > Why test it manually when it could be tested automatically with a new kconfig?
> >> 
> >> So, my impression is that adding code to the internals to test the
> >> internals isn't a valid test (or at least makes it fragile) because the
> >> test would depend on the changes to the internals (or at least depend on
> >> non-default non-production CONFIGs).
> > 
> > The *internal* aspect here is an extension to boot params under a
> > kconfig which would simply append to it, as if the user would have
> 
> So there's no such kconfig yet to apply boot parameters specified by configure,
> right? That would itself be a new feature. Or could we use bootconfig? (CC Masami)

Yes, I think you can use bootconfig to add this feature more flexibly.
I think your patch is easily modified to use bootconfig. :)

Thank you,
Luis Chamberlain April 15, 2020, 6:08 a.m. UTC | #19
On Tue, Apr 14, 2020 at 01:25:07PM +0200, Vlastimil Babka wrote:
> On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> > On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> >> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> >> > > Yes. Doing an internal extension isn't testing the actual code.
> >> > 
> >> > But it would.
> >> > 
> >> > [...]
> >> > > I don't think anything is needed for this series. It can be boot tested
> >> > > manually.
> >> > 
> >> > Why test it manually when it could be tested automatically with a new kconfig?
> >> 
> >> So, my impression is that adding code to the internals to test the
> >> internals isn't a valid test (or at least makes it fragile) because the
> >> test would depend on the changes to the internals (or at least depend on
> >> non-default non-production CONFIGs).
> > 
> > The *internal* aspect here is an extension to boot params under a
> > kconfig which would simply append to it, as if the user would have
> 
> So there's no such kconfig yet to apply boot parameters specified by configure,
> right? That would itself be a new feature.

I cannot say, there is no easy grammatical expression for this. For
kernel testing, no, I am not aware of one.

> Or could we use bootconfig? (CC Masami)

Neat, yeah, that seems like a set of helpers of which could help for
sure.

> >> Regardless of testing, I think this series is ready for -mm.
> > 
> > I'm happy for it to go in provided we at least devise a follow up plan
> > for testing. Otherwise -- like other things, it won't get done.
> 
> OK I'll send a v2 and we can discuss the test driver details. I don't want to
> neglect testing, but also not block the new functionality,

So long as the testing part gets done, I'm happy :)

> in case it means
> testing means adding another new functionality.

Yeah, I can see how some new code may need to be added for that, its a
good point.

But think about this for a second, if we *didn't* have such code added,
how could it have easily been tested before? The question is rhetorical,
as I'm alluding to that a lot of old code wasn't designed for easy unit
testing either, and I'd invite one to consider the value of the change
in philosophy on first code design when one *does* take that into
consideration. There are certain best practices that I'd wager are
probably good for us to evaluate for the long term of kernel evolution,
and I think that always advocating designing testing around new
functionality would be one.

Once and if you do add your testing, and if such testing is expanded to
test parsing the cmdline further, and to purposely break it, who knows
what other bugs bugs we'll find.

But maybe Masami already uncovered all the fun bugs.

  Luis
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d230bc..81ff626fc700 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4793,6 +4793,15 @@ 
 
 	switches=	[HW,M68k]
 
+	sysctl.*=	[KNL]
+			Set a sysctl parameter, right before loading the init
+			process, as if the value was written to the respective
+			/proc/sys/... file. Both '.' and '/' are recognized as
+			separators. Unrecognized parameters and invalid values
+			are reported in the kernel log. Sysctls registered
+			later by a loaded module cannot be set this way.
+			Example: sysctl.vm.swappiness=40
+
 	sysfs.deprecated=0|1 [KNL]
 			Enable/disable old style sysfs layout for old udev
 			on older distributions. When this option is enabled
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c75bb4632ed1..653188c9c4c9 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -14,6 +14,7 @@ 
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/mount.h>
 #include "internal.h"
 
 static const struct dentry_operations proc_sys_dentry_operations;
@@ -1725,3 +1726,102 @@  int __init proc_sys_init(void)
 
 	return sysctl_init();
 }
+
+/* Set sysctl value passed on kernel command line. */
+static int process_sysctl_arg(char *param, char *val,
+			       const char *unused, void *arg)
+{
+	char *path;
+	struct vfsmount *proc_mnt = *((struct vfsmount **)arg);
+	struct file_system_type *proc_fs_type;
+	struct file *file;
+	int len;
+	int err;
+	loff_t pos = 0;
+	ssize_t wret;
+
+	if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
+		return 0;
+
+	param += sizeof("sysctl") - 1;
+
+	if (param[0] != '/' && param[0] != '.')
+		return 0;
+
+	param++;
+
+	if (!proc_mnt) {
+		proc_fs_type = get_fs_type("proc");
+		if (!proc_fs_type) {
+			pr_err("Failed to find procfs to set sysctl from command line");
+			return 0;
+		}
+		proc_mnt = kern_mount(proc_fs_type);
+		put_filesystem(proc_fs_type);
+		if (IS_ERR(proc_mnt)) {
+			pr_err("Failed to mount procfs to set sysctl from command line");
+			return 0;
+		}
+		*((struct vfsmount **)arg) = proc_mnt;
+	}
+
+	path = kasprintf(GFP_KERNEL, "sys/%s", param);
+	if (!path)
+		panic("%s: Failed to allocate path for %s\n", __func__, param);
+	strreplace(path, '.', '/');
+
+	file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		if (err == -ENOENT)
+			pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
+				param, val);
+		else if (err == -EACCES)
+			pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
+				param, val);
+		else
+			pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
+				file, param, val);
+		goto out;
+	}
+	len = strlen(val);
+	wret = kernel_write(file, val, len, &pos);
+	if (wret < 0) {
+		err = wret;
+		if (err == -EINVAL)
+			pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
+				param, val);
+		else
+			pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
+				ERR_PTR(err), param, val);
+	} else if (wret != len) {
+		pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
+			wret, len, path, param, val);
+	}
+
+	err = filp_close(file, NULL);
+	if (err)
+		pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
+			ERR_PTR(err), param, val);
+out:
+	kfree(path);
+	return 0;
+}
+
+void do_sysctl_args(void)
+{
+	char *command_line;
+	struct vfsmount *proc_mnt = NULL;
+
+	command_line = kstrdup(saved_command_line, GFP_KERNEL);
+	if (!command_line)
+		panic("%s: Failed to allocate copy of command line\n", __func__);
+
+	parse_args("Setting sysctl args", command_line,
+		   NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
+
+	if (proc_mnt)
+		kern_unmount(proc_mnt);
+
+	kfree(command_line);
+}
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 02fa84493f23..5f3f2a00d75f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -206,6 +206,7 @@  struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 void unregister_sysctl_table(struct ctl_table_header * table);
 
 extern int sysctl_init(void);
+void do_sysctl_args(void);
 
 extern struct ctl_table sysctl_mount_point[];
 
@@ -236,6 +237,9 @@  static inline void setup_sysctl_set(struct ctl_table_set *p,
 {
 }
 
+void do_sysctl_args(void)
+{
+}
 #endif /* CONFIG_SYSCTL */
 
 int sysctl_max_threads(struct ctl_table *table, int write,
diff --git a/init/main.c b/init/main.c
index ee4947af823f..a91ea166a731 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1367,6 +1367,8 @@  static int __ref kernel_init(void *unused)
 
 	rcu_end_inkernel_boot();
 
+	do_sysctl_args();
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)