diff mbox series

[v3,3/5] fs: Enable to enforce noexec mounts or file exec through RESOLVE_MAYEXEC

Message ID 20200428175129.634352-4-mic@digikod.net (mailing list archive)
State New, archived
Headers show
Series Add support for RESOLVE_MAYEXEC | expand

Commit Message

Mickaël Salaün April 28, 2020, 5:51 p.m. UTC
Enable to either propagate the mount options from the underlying VFS
mount to prevent execution, or to propagate the file execute permission.
This may allow a script interpreter to check execution permissions
before reading commands from a file.

The main goal is to be able to protect the kernel by restricting
arbitrary syscalls that an attacker could perform with a crafted binary
or certain script languages.  It also improves multilevel isolation
by reducing the ability of an attacker to use side channels with
specific code.  These restrictions can natively be enforced for ELF
binaries (with the noexec mount option) but require this kernel
extension to properly handle scripts (e.g., Python, Perl).

Add a new sysctl fs.open_mayexec_enforce to control this behavior.
Indeed, because of compatibility with installed systems, only the system
administrator is able to check that this new enforcement is in line with
the system mount points and file permissions.  A following patch adds
documentation.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Philippe Trébuchet <philippe.trebuchet@ssi.gouv.fr>
Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
---

Changes since v2:
* Cosmetic changes.

Changes since v1:
* Move code from Yama to the FS subsystem (suggested by Kees Cook).
* Make omayexec_inode_permission() static (suggested by Jann Horn).
* Use mode 0600 for the sysctl.
* Only match regular files (not directories nor other types), which
  follows the same semantic as commit 73601ea5b7b1 ("fs/open.c: allow
  opening only regular files during execve()").
---
 fs/namei.c         | 72 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h |  3 ++
 kernel/sysctl.c    |  7 +++++
 3 files changed, 81 insertions(+), 1 deletion(-)

Comments

James Morris May 1, 2020, 4:22 a.m. UTC | #1
On Tue, 28 Apr 2020, Mickaël Salaün wrote:

> Enable to either propagate the mount options from the underlying VFS
> mount to prevent execution, or to propagate the file execute permission.
> This may allow a script interpreter to check execution permissions
> before reading commands from a file.

I'm finding the description of this patch difficult to understand.

In the first case, this seems to mean: if you open a file with 
RESOLVE_MAYEXEC from a noexec mount, then it will fail. Correct?

In the second case, do you mean a RESOLVE_MAYEXEC open will fail if the 
file does not have +x set for the user?


> The main goal is to be able to protect the kernel by restricting
> arbitrary syscalls that an attacker could perform with a crafted binary
> or certain script languages.

This sounds like the job of seccomp. Why is this part of MAYEXEC?

>  It also improves multilevel isolation
> by reducing the ability of an attacker to use side channels with
> specific code.  These restrictions can natively be enforced for ELF
> binaries (with the noexec mount option) but require this kernel
> extension to properly handle scripts (e.g., Python, Perl).

Again, not sure why you're talking about side channels and MAYEXEC and 
mount options. Are you more generally talking about being able to prevent 
execution of arbitrary script files included by an interpreter?

> Add a new sysctl fs.open_mayexec_enforce to control this behavior.
> Indeed, because of compatibility with installed systems, only the system
> administrator is able to check that this new enforcement is in line with
> the system mount points and file permissions.  A following patch adds
> documentation.

I don't like the idea of any of this feature set being configurable. 
RESOLVE_MAYEXEC as a new flag should have well-defined, stable semantics.
Mickaël Salaün May 1, 2020, 2:32 p.m. UTC | #2
On 01/05/2020 06:22, James Morris wrote:
> On Tue, 28 Apr 2020, Mickaël Salaün wrote:
> 
>> Enable to either propagate the mount options from the underlying VFS
>> mount to prevent execution, or to propagate the file execute permission.
>> This may allow a script interpreter to check execution permissions
>> before reading commands from a file.
> 
> I'm finding the description of this patch difficult to understand.
> 
> In the first case, this seems to mean: if you open a file with 
> RESOLVE_MAYEXEC from a noexec mount, then it will fail. Correct?

Yes.

> 
> In the second case, do you mean a RESOLVE_MAYEXEC open will fail if the 
> file does not have +x set for the user?

Yes, and this is still in the hands sysadmins.

As explain in the documentation patch, the sysctl takes a bitfield
consisting of "mount !noexec" or "file exec permission". It is not an
exclusive OR. The "file exec permission" could be seen as a more strict
restriction than only the mount one.

> 
> 
>> The main goal is to be able to protect the kernel by restricting
>> arbitrary syscalls that an attacker could perform with a crafted binary
>> or certain script languages.
> 
> This sounds like the job of seccomp. Why is this part of MAYEXEC?

The initial goal of O_MAYEXEC (in CLIP OS 4) is to prevent untrusted
code execution. Code execution leads to system calls, which could lead
to data access/modification, and kernel attacks. seccomp can't enable
execution of only some files, it only enables an on/off execution policy
(i.e. by filtering execve(2) ).

> 
>>  It also improves multilevel isolation
>> by reducing the ability of an attacker to use side channels with
>> specific code.  These restrictions can natively be enforced for ELF
>> binaries (with the noexec mount option) but require this kernel
>> extension to properly handle scripts (e.g., Python, Perl).
> 
> Again, not sure why you're talking about side channels and MAYEXEC and 
> mount options. Are you more generally talking about being able to prevent 
> execution of arbitrary script files included by an interpreter?

Yes, I explain the thread model, especially the risk of scripts.
Multilevel isolation can be bypassed simply by using CPU instructions
(e.g. side channel attacks), not even syscalls. The ability to execute
arbitrary code, including advanced scripting languages, can lead to this
kind of CPU attacks.

> 
>> Add a new sysctl fs.open_mayexec_enforce to control this behavior.
>> Indeed, because of compatibility with installed systems, only the system
>> administrator is able to check that this new enforcement is in line with
>> the system mount points and file permissions.  A following patch adds
>> documentation.
> 
> I don't like the idea of any of this feature set being configurable. 
> RESOLVE_MAYEXEC as a new flag should have well-defined, stable semantics.

Unfortunately, as discussed in the v2 email thread [1], if we impose
such restrictions with the O_MAYEXEC flag, then almost no userspace
application will use it because it could cause unexpected behavior.
Indeed, the application developers may not know the configuration of the
system their applications will be run on. Only sysadmins know the mount
points configuration (either they are have noexec or not), and if the
set of application installed on the system works well with the set of
files (e.g. according to their permissions). It is the same issue as
with a system-wide LSM policy configuration. Processes should be able to
tell their use of a file, but the sysadmin using common distro binaries
should be able to control the execution policy according to the maturity
of the current system installation.

However, for fully controlled distros such as CLIP OS, it make sense to
enforce such restrictions at kernel build time. I can add an alternative
kernel configuration to enforce a particular policy at boot and disable
this sysctl.

[1]
https://lore.kernel.org/lkml/1fbf54f6-7597-3633-a76c-11c4b2481add@ssi.gouv.fr/
James Morris May 1, 2020, 6:05 p.m. UTC | #3
On Fri, 1 May 2020, Mickaël Salaün wrote:

> 
> However, for fully controlled distros such as CLIP OS, it make sense to
> enforce such restrictions at kernel build time. I can add an alternative
> kernel configuration to enforce a particular policy at boot and disable
> this sysctl.

Sounds good.
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 33b6d372e74a..dbf56de1fbe8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -39,6 +39,7 @@ 
 #include <linux/bitops.h>
 #include <linux/init_task.h>
 #include <linux/uaccess.h>
+#include <linux/sysctl.h>
 
 #include "internal.h"
 #include "mount.h"
@@ -411,10 +412,40 @@  static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
 	return 0;
 }
 
+#define OMAYEXEC_ENFORCE_NONE	0
+#define OMAYEXEC_ENFORCE_MOUNT	(1 << 0)
+#define OMAYEXEC_ENFORCE_FILE	(1 << 1)
+#define _OMAYEXEC_LAST		OMAYEXEC_ENFORCE_FILE
+#define _OMAYEXEC_MASK		((_OMAYEXEC_LAST << 1) - 1)
+
+/**
+ * omayexec_inode_permission - Check RESOLVE_MAYEXEC before accessing an inode
+ *
+ * @inode: Inode to check permission on
+ * @mask: Right to check for (%MAY_OPENEXEC, %MAY_EXECMOUNT, %MAY_EXEC)
+ *
+ * Returns 0 if access is permitted, -EACCES otherwise.
+ */
+static inline int omayexec_inode_permission(struct inode *inode, int mask)
+{
+	if (!(mask & MAY_OPENEXEC))
+		return 0;
+
+	if ((sysctl_omayexec_enforce & OMAYEXEC_ENFORCE_MOUNT) &&
+			!(mask & MAY_EXECMOUNT))
+		return -EACCES;
+
+	if (sysctl_omayexec_enforce & OMAYEXEC_ENFORCE_FILE)
+		return generic_permission(inode, MAY_EXEC);
+
+	return 0;
+}
+
 /**
  * inode_permission - Check for access rights to a given inode
  * @inode: Inode to check permission on
- * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC, %MAY_OPENEXEC,
+ *        %MAY_EXECMOUNT)
  *
  * Check for read/write/execute permissions on an inode.  We use fs[ug]id for
  * this, letting us set arbitrary permissions for filesystem access without
@@ -454,10 +485,48 @@  int inode_permission(struct inode *inode, int mask)
 	if (retval)
 		return retval;
 
+	retval = omayexec_inode_permission(inode, mask);
+	if (retval)
+		return retval;
+
 	return security_inode_permission(inode, mask);
 }
 EXPORT_SYMBOL(inode_permission);
 
+/*
+ * Handle open_mayexec_enforce sysctl
+ */
+#ifdef CONFIG_SYSCTL
+int proc_omayexec(struct ctl_table *table, int write, void __user *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	int error;
+
+	if (write) {
+		struct ctl_table table_copy;
+		int tmp_mayexec_enforce;
+
+		if (!capable(CAP_MAC_ADMIN))
+			return -EPERM;
+		tmp_mayexec_enforce = *((int *)table->data);
+		table_copy = *table;
+		/* Do not erase sysctl_omayexec_enforce. */
+		table_copy.data = &tmp_mayexec_enforce;
+		error = proc_dointvec(&table_copy, write, buffer, lenp, ppos);
+		if (error)
+			return error;
+		if ((tmp_mayexec_enforce | _OMAYEXEC_MASK) != _OMAYEXEC_MASK)
+			return -EINVAL;
+		*((int *)table->data) = tmp_mayexec_enforce;
+	} else {
+		error = proc_dointvec(table, write, buffer, lenp, ppos);
+		if (error)
+			return error;
+	}
+	return 0;
+}
+#endif
+
 /**
  * path_get - get a reference to a path
  * @path: path to get the reference to
@@ -922,6 +991,7 @@  int sysctl_protected_symlinks __read_mostly = 0;
 int sysctl_protected_hardlinks __read_mostly = 0;
 int sysctl_protected_fifos __read_mostly;
 int sysctl_protected_regular __read_mostly;
+int sysctl_omayexec_enforce __read_mostly = OMAYEXEC_ENFORCE_NONE;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9213147d8636..850c98276b6e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,6 +83,7 @@  extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
 extern int sysctl_protected_fifos;
 extern int sysctl_protected_regular;
+extern int sysctl_omayexec_enforce;
 
 typedef __kernel_rwf_t rwf_t;
 
@@ -3545,6 +3546,8 @@  int proc_nr_dentry(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 int proc_nr_inodes(struct ctl_table *table, int write,
 		   void __user *buffer, size_t *lenp, loff_t *ppos);
+int proc_omayexec(struct ctl_table *table, int write, void __user *buffer,
+		size_t *lenp, loff_t *ppos);
 int __init get_filesystem_list(char *buf);
 
 #define __FMODE_EXEC		((__force int) FMODE_EXEC)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..911afa69f84c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1892,6 +1892,13 @@  static struct ctl_table fs_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname       = "open_mayexec_enforce",
+		.data           = &sysctl_omayexec_enforce,
+		.maxlen         = sizeof(int),
+		.mode           = 0600,
+		.proc_handler   = proc_omayexec,
+	},
 #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
 	{
 		.procname	= "binfmt_misc",