diff mbox series

[v1,1/1] fs: Allow no_new_privs tasks to call chroot(2)

Message ID 20210310161000.382796-2-mic@digikod.net (mailing list archive)
State New, archived
Headers show
Series Unprivileged chroot | expand

Commit Message

Mickaël Salaün March 10, 2021, 4:10 p.m. UTC
From: Mickaël Salaün <mic@linux.microsoft.com>

Being able to easily change root directories enable to ease some
development workflow and can be used as a tool to strengthen
unprivileged security sandboxes.  chroot(2) is not an access-control
mechanism per se, but it can be used to limit the absolute view of the
filesystem, and then limit ways to access data and kernel interfaces
(e.g. /proc, /sys, /dev, etc.).

Users may not wish to expose namespace complexity to potentially
malicious processes, or limit their use because of limited resources.
The chroot feature is much more simple (and limited) than the mount
namespace, but can still be useful.  As for containers, users of
chroot(2) should take care of file descriptors or data accessible by
other means (e.g. current working directory, leaked FDs, passed FDs,
devices, mount points, etc.).  There is a lot of literature that discuss
the limitations of chroot, and users of this feature should be aware of
the multiple ways to bypass it.  Using chroot(2) for security purposes
can make sense if it is combined with other features (e.g. dedicated
user, seccomp, LSM access-controls, etc.).

One could argue that chroot(2) is useless without a properly populated
root hierarchy (i.e. without /dev and /proc).  However, there are
multiple use cases that don't require the chrooting process to create
file hierarchies with special files nor mount points, e.g.:
* A process sandboxing itself, once all its libraries are loaded, may
  not need files other than regular files, or even no file at all.
* Some pre-populated root hierarchies could be used to chroot into,
  provided for instance by development environments or tailored
  distributions.
* Processes executed in a chroot may not require access to these special
  files (e.g. with minimal runtimes, or by emulating some special files
  with a LD_PRELOADed library or seccomp).

Allowing a task to change its own root directory is not a threat to the
system if we can prevent confused deputy attacks, which could be
performed through execution of SUID-like binaries.  This can be
prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
prctl(2).  To only affect this task, its filesystem information must not
be shared with other tasks, which can be achieved by not passing
CLONE_FS to clone(2).  A similar no_new_privs check is already used by
seccomp to avoid the same kind of security issues.  Furthermore, because
of its security use and to avoid giving a new way for attackers to get
out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
only allowed if the new root directory is the same or beneath the
current one.  This still allows a process to use a subset of its
legitimate filesystem to chroot into and then further reduce its view of
the filesystem.

This change may not impact systems relying on other permission models
than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
such systems may require to update their security policies.

Only the chroot system call is relaxed with this no_new_privs check; the
init_chroot() helper doesn't require such change.

Allowing unprivileged users to use chroot(2) is one of the initial
objectives of no_new_privs:
https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
This patch is a follow-up of a previous one sent by Andy Lutomirski, but
with less limitations:
https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210310161000.382796-2-mic@digikod.net
---
 fs/open.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 3 deletions(-)

Comments

Eric W. Biederman March 10, 2021, 4:56 p.m. UTC | #1
Mickaël Salaün <mic@digikod.net> writes:

> From: Mickaël Salaün <mic@linux.microsoft.com>
>
> Being able to easily change root directories enable to ease some
> development workflow and can be used as a tool to strengthen
> unprivileged security sandboxes.  chroot(2) is not an access-control
> mechanism per se, but it can be used to limit the absolute view of the
> filesystem, and then limit ways to access data and kernel interfaces
> (e.g. /proc, /sys, /dev, etc.).

Actually chroot does not so limit the view of things.  It only limits
the default view.

A process that is chrooted can always escape by something like
chroot("../../../../../../../../..").

So I don't see the point of allowing chroot once you are in your locked
down sandbox.

> Users may not wish to expose namespace complexity to potentially
> malicious processes, or limit their use because of limited resources.
> The chroot feature is much more simple (and limited) than the mount
> namespace, but can still be useful.  As for containers, users of
> chroot(2) should take care of file descriptors or data accessible by
> other means (e.g. current working directory, leaked FDs, passed FDs,
> devices, mount points, etc.).  There is a lot of literature that discuss
> the limitations of chroot, and users of this feature should be aware of
> the multiple ways to bypass it.  Using chroot(2) for security purposes
> can make sense if it is combined with other features (e.g. dedicated
> user, seccomp, LSM access-controls, etc.).
>
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
>
> Allowing a task to change its own root directory is not a threat to the
> system if we can prevent confused deputy attacks, which could be
> performed through execution of SUID-like binaries.  This can be
> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> prctl(2).  To only affect this task, its filesystem information must not
> be shared with other tasks, which can be achieved by not passing
> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
> seccomp to avoid the same kind of security issues.  Furthermore, because
> of its security use and to avoid giving a new way for attackers to get
> out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
> only allowed if the new root directory is the same or beneath the
> current one.  This still allows a process to use a subset of its
> legitimate filesystem to chroot into and then further reduce its view of
> the filesystem.
>
> This change may not impact systems relying on other permission models
> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
> such systems may require to update their security policies.
>
> Only the chroot system call is relaxed with this no_new_privs check; the
> init_chroot() helper doesn't require such change.
>
> Allowing unprivileged users to use chroot(2) is one of the initial
> objectives of no_new_privs:
> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> This patch is a follow-up of a previous one sent by Andy Lutomirski, but
> with less limitations:
> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/

Last time I remember talking architecture we agreed that user namespaces
would be used for enabling features and that no_new_privs would just be
used to lock-down userspace.  That way no_new_privs could be kept simple
and trivial to audit and understand.

You can build your sandbox and use chroot if you use a user namespace at
the start.  A mount namespace would also help lock things down.  Still
allowing chroot after the sanbox has been built, a seccomp filter has
been installed and no_new_privs has been enabled seems like it is asking
for trouble and may weaken existing sandboxes.

So I think we need a pretty compelling use case to consider allowing
chroot(2).  You haven't even mentioned what your usecase is at this
point so I don't know why we would tackle that complexity.

Eric
Mickaël Salaün March 10, 2021, 6:13 p.m. UTC | #2
On 10/03/2021 17:56, Eric W. Biederman wrote:
> Mickaël Salaün <mic@digikod.net> writes:
> 
>> From: Mickaël Salaün <mic@linux.microsoft.com>
>>
>> Being able to easily change root directories enable to ease some
>> development workflow and can be used as a tool to strengthen
>> unprivileged security sandboxes.  chroot(2) is not an access-control
>> mechanism per se, but it can be used to limit the absolute view of the
>> filesystem, and then limit ways to access data and kernel interfaces
>> (e.g. /proc, /sys, /dev, etc.).
> 
> Actually chroot does not so limit the view of things.  It only limits
> the default view.
> 
> A process that is chrooted can always escape by something like
> chroot("../../../../../../../../..").

Not with this patch.

> 
> So I don't see the point of allowing chroot once you are in your locked
> down sandbox.
> 
>> Users may not wish to expose namespace complexity to potentially
>> malicious processes, or limit their use because of limited resources.
>> The chroot feature is much more simple (and limited) than the mount
>> namespace, but can still be useful.  As for containers, users of
>> chroot(2) should take care of file descriptors or data accessible by
>> other means (e.g. current working directory, leaked FDs, passed FDs,
>> devices, mount points, etc.).  There is a lot of literature that discuss
>> the limitations of chroot, and users of this feature should be aware of
>> the multiple ways to bypass it.  Using chroot(2) for security purposes
>> can make sense if it is combined with other features (e.g. dedicated
>> user, seccomp, LSM access-controls, etc.).
>>
>> One could argue that chroot(2) is useless without a properly populated
>> root hierarchy (i.e. without /dev and /proc).  However, there are
>> multiple use cases that don't require the chrooting process to create
>> file hierarchies with special files nor mount points, e.g.:
>> * A process sandboxing itself, once all its libraries are loaded, may
>>   not need files other than regular files, or even no file at all.
>> * Some pre-populated root hierarchies could be used to chroot into,
>>   provided for instance by development environments or tailored
>>   distributions.
>> * Processes executed in a chroot may not require access to these special
>>   files (e.g. with minimal runtimes, or by emulating some special files
>>   with a LD_PRELOADed library or seccomp).
>>
>> Allowing a task to change its own root directory is not a threat to the
>> system if we can prevent confused deputy attacks, which could be
>> performed through execution of SUID-like binaries.  This can be
>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>> prctl(2).  To only affect this task, its filesystem information must not
>> be shared with other tasks, which can be achieved by not passing
>> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
>> seccomp to avoid the same kind of security issues.  Furthermore, because
>> of its security use and to avoid giving a new way for attackers to get
>> out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
>> only allowed if the new root directory is the same or beneath the
>> current one.  This still allows a process to use a subset of its
>> legitimate filesystem to chroot into and then further reduce its view of
>> the filesystem.
>>
>> This change may not impact systems relying on other permission models
>> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
>> such systems may require to update their security policies.
>>
>> Only the chroot system call is relaxed with this no_new_privs check; the
>> init_chroot() helper doesn't require such change.
>>
>> Allowing unprivileged users to use chroot(2) is one of the initial
>> objectives of no_new_privs:
>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>> This patch is a follow-up of a previous one sent by Andy Lutomirski, but
>> with less limitations:
>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
> 
> Last time I remember talking architecture we agreed that user namespaces
> would be used for enabling features and that no_new_privs would just be
> used to lock-down userspace.  That way no_new_privs could be kept simple
> and trivial to audit and understand.

chroot(2) is simple.

> 
> You can build your sandbox and use chroot if you use a user namespace at
> the start.  A mount namespace would also help lock things down.  Still
> allowing chroot after the sanbox has been built, a seccomp filter has
> been installed and no_new_privs has been enabled seems like it is asking
> for trouble and may weaken existing sandboxes.

Could you please provide a new attack scenario?

> 
> So I think we need a pretty compelling use case to consider allowing
> chroot(2).  You haven't even mentioned what your usecase is at this
> point so I don't know why we would tackle that complexity.

They are explained in this commit message.

> 
> Eric
>
diff mbox series

Patch

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..dd761e7b079c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -22,6 +22,7 @@ 
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/fs.h>
+#include <linux/path.h>
 #include <linux/personality.h>
 #include <linux/pagemap.h>
 #include <linux/syscalls.h>
@@ -532,6 +533,47 @@  SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 	return error;
 }
 
+/*
+ * Return true if @child is equal to @parent or beneath it, return false
+ * otherwise.
+ */
+static bool is_path_beneath(const struct path *const parent,
+		const struct path *const child)
+{
+	bool is_beneath = false;
+	struct path walker_path = *child;
+
+	path_get(&walker_path);
+	while (true) {
+		struct dentry *parent_dentry;
+
+		if (path_equal(parent, &walker_path)) {
+			is_beneath = true;
+			break;
+		}
+
+jump_up:
+		if (walker_path.dentry == walker_path.mnt->mnt_root) {
+			if (follow_up(&walker_path)) {
+				/* Ignores hidden mount points. */
+				goto jump_up;
+			} else {
+				/* Stops at the real root. */
+				break;
+			}
+		}
+		if (unlikely(IS_ROOT(walker_path.dentry))) {
+			/* Stops at disconnected root directories. */
+			break;
+		}
+		parent_dentry = dget_parent(walker_path.dentry);
+		dput(walker_path.dentry);
+		walker_path.dentry = parent_dentry;
+	}
+	path_put(&walker_path);
+	return is_beneath;
+}
+
 SYSCALL_DEFINE1(chroot, const char __user *, filename)
 {
 	struct path path;
@@ -546,15 +588,31 @@  SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	if (error)
 		goto dput_and_out;
 
+	/*
+	 * Changing the root directory for the calling task (and its future
+	 * children) requires that this task has CAP_SYS_CHROOT in its
+	 * namespace, or be running with no_new_privs and not sharing its
+	 * fs_struct and not escaping its current root directory.  As for
+	 * seccomp, checking no_new_privs avoids scenarios where unprivileged
+	 * tasks can affect the behavior of privileged children.  Lock the path
+	 * to protect against TOCTOU race between is_path_beneath() and
+	 * set_fs_root().  No need to lock the root because it is not possible
+	 * to rename it beneath itself.
+	 */
 	error = -EPERM;
-	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
-		goto dput_and_out;
+	inode_lock(d_inode(path.dentry));
+	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT) &&
+			!(task_no_new_privs(current) && current->fs->users == 1
+				&& is_path_beneath(&current->fs->root, &path)))
+		goto unlock_and_out;
 	error = security_path_chroot(&path);
 	if (error)
-		goto dput_and_out;
+		goto unlock_and_out;
 
 	set_fs_root(current->fs, &path);
 	error = 0;
+unlock_and_out:
+	inode_unlock(d_inode(path.dentry));
 dput_and_out:
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {