diff mbox series

[v3,07/38] mount: attach mappings to mounts

Message ID 20201128213527.2669807-8-christian.brauner@ubuntu.com (mailing list archive)
State New
Headers show
Series idmapped mounts | expand

Commit Message

Christian Brauner Nov. 28, 2020, 9:34 p.m. UTC
In order to support per-mount idmappings vfsmounts will be marked with user
namespaces. The idmapping associated with that user namespace will be used to
map the ids of vfs objects when they are accessed through that mount. By
default all vfsmounts are marked with the initial user namespace. The initial
user namespace is used to indicate that a mount is not idmapped. All operations
behave as before.

Based on prior discussions we want to attach the whole user namespace and not
just a dedicated idmapping struct. This allows us to reuse all the helpers that
already exist for dealing with idmappings instead of introducing a whole new
range of helpers. In addition, if we decide in the future that we are confident
enough to enable unprivileged users to setup idmapped mounts we can allow
already idmapped mounts to be marked with another user namespace. The permission
checking would then take into account whether the caller is privileged in the
user namespace the mount is currently marked with. For now, we will enforce in
later patches that once a mount has been idmapped it can't be remapped. This
keeps permission checking and life-cycle management simple, especially since
users can always create a new mount with a different idmapping anyway.

The idea to attach user namespaces to vfsmounts has been floated around in
various forms at Linux Plumbers in ~2018 with the original idea tracing back to
a discussion during a conference in St. Petersburg between Christoph, Tycho, and
myself.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
- Christoph Hellwig <hch@lst.de>:
  - Split internal implementation into separate patch and move syscall
    implementation later.

/* v3 */
- David Howells <dhowells@redhat.com>:
  - Remove MNT_IDMAPPED flag. We can simply check the pointer and use
    smp_load_acquire() in later patches.

- Tycho Andersen <tycho@tycho.pizza>:
  - Use READ_ONCE() in mnt_user_ns().
---
 fs/namespace.c        | 9 +++++++++
 include/linux/fs.h    | 1 +
 include/linux/mount.h | 6 ++++++
 3 files changed, 16 insertions(+)

Comments

Christoph Hellwig Dec. 1, 2020, 10:50 a.m. UTC | #1
The READ_ONCE still looks suspect as it generally needs to be paired
with a WRITE_ONCE.  The rest looks sane to me.
Tycho Andersen Dec. 1, 2020, 1:25 p.m. UTC | #2
On Tue, Dec 01, 2020 at 11:50:25AM +0100, Christoph Hellwig wrote:
> The READ_ONCE still looks suspect as it generally needs to be paired
> with a WRITE_ONCE.  The rest looks sane to me.

Yeah, the comment from the other location is,

/* Pairs with smp_load_acquire() in mnt_user_ns(). */

So I think it's just a typo that needs to be fixed.

Tycho
Christian Brauner Dec. 2, 2020, 9:24 a.m. UTC | #3
On Tue, Dec 01, 2020 at 11:50:25AM +0100, Christoph Hellwig wrote:
> The READ_ONCE still looks suspect as it generally needs to be paired
> with a WRITE_ONCE.  The rest looks sane to me.

I think the READ_ONCE() can be dropped from this patch. At this point in
the series we don't allowing changing the vfsmount's userns. The infra
to do that is only introduced as almost the last patch in the series and
there we immediately use smp_load_acquire() and smp_store_release().

Thanks!
Christian
diff mbox series

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index f9ea31b7eb7f..a1fda548c3af 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -220,6 +220,7 @@  static struct mount *alloc_vfsmnt(const char *name)
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
 		INIT_LIST_HEAD(&mnt->mnt_umounting);
 		INIT_HLIST_HEAD(&mnt->mnt_stuck_children);
+		mnt->mnt.mnt_user_ns = &init_user_ns;
 	}
 	return mnt;
 
@@ -559,6 +560,11 @@  int sb_prepare_remount_readonly(struct super_block *sb)
 
 static void free_vfsmnt(struct mount *mnt)
 {
+	struct user_namespace *ns;
+
+	ns = mnt_user_ns(&mnt->mnt);
+	if (ns != &init_user_ns)
+		put_user_ns(ns);
 	kfree_const(mnt->mnt_devname);
 #ifdef CONFIG_SMP
 	free_percpu(mnt->mnt_pcp);
@@ -1067,6 +1073,9 @@  static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
 
 	atomic_inc(&sb->s_active);
+	mnt->mnt.mnt_user_ns = mnt_user_ns(&old->mnt);
+	if (mnt->mnt.mnt_user_ns != &init_user_ns)
+		mnt->mnt.mnt_user_ns = get_user_ns(mnt->mnt.mnt_user_ns);
 	mnt->mnt.mnt_sb = sb;
 	mnt->mnt.mnt_root = dget(root);
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f59b7f16f216..004686f49e32 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2276,6 +2276,7 @@  struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
+#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index aaf343b38671..d0d4a270acea 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -72,8 +72,14 @@  struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	int mnt_flags;
+	struct user_namespace *mnt_user_ns;
 } __randomize_layout;
 
+static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
+{
+	return READ_ONCE(mnt->mnt_user_ns);
+}
+
 struct file; /* forward dec */
 struct path;