diff mbox series

fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT

Message ID 20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org (mailing list archive)
State New
Headers show
Series fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT | expand

Commit Message

Seth Forshee (DigitalOcean) July 24, 2024, 2:53 p.m. UTC
Christian noticed that it is possible for a privileged user to mount
most filesystems with a non-initial user namespace in sb->s_user_ns.
When fsopen() is called in a non-init namespace the caller's namespace
is recorded in fs_context->user_ns. If the returned file descriptor is
then passed to a process priviliged in init_user_ns, that process can
call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
with sb->s_user_ns set to the namespace of the process which called
fsopen().

This is problematic. We cannot assume that any filesystem which does not
set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
mind, increasing the risk for bugs and security issues.

Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
not set for the filesystem and a non-initial user namespace will be
used. sget() does not need to be updated as it always uses the user
namespace of the current context, or the initial user namespace if
SB_SUBMOUNT is set.

Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
Reported-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
---
 fs/super.c | 11 +++++++++++
 1 file changed, 11 insertions(+)


---
base-commit: 256abd8e550ce977b728be79a74e1729438b4948
change-id: 20240723-s_user_ns-fix-b00c31de1cb8

Best regards,

Comments

Alexander Mikhalitsyn July 24, 2024, 3:17 p.m. UTC | #1
Am Mi., 24. Juli 2024 um 16:54 Uhr schrieb Seth Forshee (DigitalOcean)
<sforshee@kernel.org>:
>
> Christian noticed that it is possible for a privileged user to mount
> most filesystems with a non-initial user namespace in sb->s_user_ns.
> When fsopen() is called in a non-init namespace the caller's namespace
> is recorded in fs_context->user_ns. If the returned file descriptor is
> then passed to a process priviliged in init_user_ns, that process can
> call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
> with sb->s_user_ns set to the namespace of the process which called
> fsopen().
>
> This is problematic. We cannot assume that any filesystem which does not
> set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
> mind, increasing the risk for bugs and security issues.
>
> Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
> not set for the filesystem and a non-initial user namespace will be
> used. sget() does not need to be updated as it always uses the user
> namespace of the current context, or the initial user namespace if
> SB_SUBMOUNT is set.
>
> Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
> Reported-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>

Hi Seth!

LGTM

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/super.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/fs/super.c b/fs/super.c
> index 095ba793e10c..d681fb7698d8 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -736,6 +736,17 @@ struct super_block *sget_fc(struct fs_context *fc,
>         struct user_namespace *user_ns = fc->global ? &init_user_ns : fc->user_ns;
>         int err;
>
> +       /*
> +        * Never allow s_user_ns != &init_user_ns when FS_USERNS_MOUNT is
> +        * not set, as the filesystem is likely unprepared to handle it.
> +        * This can happen when fsconfig() is called from init_user_ns with
> +        * an fs_fd opened in another user namespace.
> +        */
> +       if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) {
> +               errorfc(fc, "mounting from non-initial user namespace is not allowed");
> +               return ERR_PTR(-EPERM);
> +       }
> +
>  retry:
>         spin_lock(&sb_lock);
>         if (test) {
>
> ---
> base-commit: 256abd8e550ce977b728be79a74e1729438b4948
> change-id: 20240723-s_user_ns-fix-b00c31de1cb8
>
> Best regards,
> --
> Seth Forshee (DigitalOcean) <sforshee@kernel.org>
>
Christian Brauner July 25, 2024, 8:19 a.m. UTC | #2
On Wed, 24 Jul 2024 09:53:59 -0500, Seth Forshee (DigitalOcean) wrote:
> Christian noticed that it is possible for a privileged user to mount
> most filesystems with a non-initial user namespace in sb->s_user_ns.
> When fsopen() is called in a non-init namespace the caller's namespace
> is recorded in fs_context->user_ns. If the returned file descriptor is
> then passed to a process priviliged in init_user_ns, that process can
> call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
> with sb->s_user_ns set to the namespace of the process which called
> fsopen().
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT
      https://git.kernel.org/vfs/vfs/c/f7c589ccd630
diff mbox series

Patch

diff --git a/fs/super.c b/fs/super.c
index 095ba793e10c..d681fb7698d8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -736,6 +736,17 @@  struct super_block *sget_fc(struct fs_context *fc,
 	struct user_namespace *user_ns = fc->global ? &init_user_ns : fc->user_ns;
 	int err;
 
+	/*
+	 * Never allow s_user_ns != &init_user_ns when FS_USERNS_MOUNT is
+	 * not set, as the filesystem is likely unprepared to handle it.
+	 * This can happen when fsconfig() is called from init_user_ns with
+	 * an fs_fd opened in another user namespace.
+	 */
+	if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) {
+		errorfc(fc, "mounting from non-initial user namespace is not allowed");
+		return ERR_PTR(-EPERM);
+	}
+
 retry:
 	spin_lock(&sb_lock);
 	if (test) {