diff mbox series

[v10,22/27] securityfs: Extend securityfs with namespacing support

Message ID 20220201203735.164593-23-stefanb@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series ima: Namespace IMA with audit support in IMA-ns | expand

Commit Message

Stefan Berger Feb. 1, 2022, 8:37 p.m. UTC
Enable multiple instances of securityfs by keying each instance with a
pointer to the user namespace it belongs to.

Since we do not need the pinning of the filesystem for the virtualization
case, limit the usage of simple_pin_fs() and simpe_release_fs() to the
case when the init_user_ns is active. This simplifies the cleanup for the
virtualization case where usage of securityfs_remove() to free dentries
is therefore not needed anymore.

For the initial securityfs, i.e. the one mounted in the host userns mount,
nothing changes. The rules for securityfs_remove() are as before and it is
still paired with securityfs_create(). Specifically, a file created via
securityfs_create_dentry() in the initial securityfs mount still needs to
be removed by a call to securityfs_remove(). Creating a new dentry in the
initial securityfs mount still pins the filesystem like it always did.
Consequently, the initial securityfs mount is not destroyed on
umount/shutdown as long as at least one user of it still has dentries that
it hasn't removed with a call to securityfs_remove().

Prevent mounting of an instance of securityfs in another user namespace
than it belongs to. Also, prevent accesses to files and directories by
a user namespace that is neither the user namespace it belongs to
nor an ancestor of the user namespace that the instance of securityfs
belongs to. Do not prevent access if securityfs was bind-mounted and
therefore the init_user_ns is the owning user namespace.

Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 security/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 61 insertions(+), 11 deletions(-)

Comments

Mimi Zohar Feb. 23, 2022, 1:48 a.m. UTC | #1
On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
> Enable multiple instances of securityfs by keying each instance with a
> pointer to the user namespace it belongs to.
> 
> Since we do not need the pinning of the filesystem for the virtualization

^namespacing case

> case, limit the usage of simple_pin_fs() and simpe_release_fs() to the

^simple_release_fs

> case when the init_user_ns is active. This simplifies the cleanup for the
> virtualization case where usage of securityfs_remove() to free dentries

^namespacing 

> is therefore not needed anymore.
> 
> For the initial securityfs, i.e. the one mounted in the host userns mount,
> nothing changes. The rules for securityfs_remove() are as before and it is
> still paired with securityfs_create(). Specifically, a file created via
> securityfs_create_dentry() in the initial securityfs mount still needs to
> be removed by a call to securityfs_remove(). Creating a new dentry in the
> initial securityfs mount still pins the filesystem like it always did.
> Consequently, the initial securityfs mount is not destroyed on
> umount/shutdown as long as at least one user of it still has dentries that
> it hasn't removed with a call to securityfs_remove().
> 
> Prevent mounting of an instance of securityfs in another user namespace
> than it belongs to. Also, prevent accesses to files and directories by
> a user namespace that is neither the user namespace it belongs to
> nor an ancestor of the user namespace that the instance of securityfs
> belongs to. Do not prevent access if securityfs was bind-mounted and
> therefore the init_user_ns is the owning user namespace.
> 
> Suggested-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Christian, I understand that "[PATCH v10 23/27] ima: Setup securityfs
for IMA namespace" needs to be deferred, but is there a reason for
deferring  "[PATCH v10 22/27] securityfs: Extend securityfs with
namespacing support"?   As the securityfs patches are really
independent of IMA namespacing, I would have thought  "[PATCH v10
04/27] securityfs: rework dentry creation" and this patch should be co-
located at the beginning of the patch set.
Christian Brauner Feb. 23, 2022, 8:14 a.m. UTC | #2
On Tue, Feb 22, 2022 at 08:48:47PM -0500, Mimi Zohar wrote:
> On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
> > Enable multiple instances of securityfs by keying each instance with a
> > pointer to the user namespace it belongs to.
> > 
> > Since we do not need the pinning of the filesystem for the virtualization
> 
> ^namespacing case
> 
> > case, limit the usage of simple_pin_fs() and simpe_release_fs() to the
> 
> ^simple_release_fs
> 
> > case when the init_user_ns is active. This simplifies the cleanup for the
> > virtualization case where usage of securityfs_remove() to free dentries
> 
> ^namespacing 
> 
> > is therefore not needed anymore.
> > 
> > For the initial securityfs, i.e. the one mounted in the host userns mount,
> > nothing changes. The rules for securityfs_remove() are as before and it is
> > still paired with securityfs_create(). Specifically, a file created via
> > securityfs_create_dentry() in the initial securityfs mount still needs to
> > be removed by a call to securityfs_remove(). Creating a new dentry in the
> > initial securityfs mount still pins the filesystem like it always did.
> > Consequently, the initial securityfs mount is not destroyed on
> > umount/shutdown as long as at least one user of it still has dentries that
> > it hasn't removed with a call to securityfs_remove().
> > 
> > Prevent mounting of an instance of securityfs in another user namespace
> > than it belongs to. Also, prevent accesses to files and directories by
> > a user namespace that is neither the user namespace it belongs to
> > nor an ancestor of the user namespace that the instance of securityfs
> > belongs to. Do not prevent access if securityfs was bind-mounted and
> > therefore the init_user_ns is the owning user namespace.
> > 
> > Suggested-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> 
> Christian, I understand that "[PATCH v10 23/27] ima: Setup securityfs
> for IMA namespace" needs to be deferred, but is there a reason for
> deferring  "[PATCH v10 22/27] securityfs: Extend securityfs with
> namespacing support"?   As the securityfs patches are really
> independent of IMA namespacing, I would have thought  "[PATCH v10
> 04/27] securityfs: rework dentry creation" and this patch should be co-
> located at the beginning of the patch set.

It felt more natural to me to defer it until the end but I have no
strong thoughts on this as of right now. Since Stefan has mentioned
moving this earlier already himself and you seem to agree as well, feel
free to do so.
Stefan Berger Feb. 23, 2022, 3:44 p.m. UTC | #3
On 2/23/22 03:14, Christian Brauner wrote:
> On Tue, Feb 22, 2022 at 08:48:47PM -0500, Mimi Zohar wrote:
>> On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
>>> Enable multiple instances of securityfs by keying each instance with a
>>> pointer to the user namespace it belongs to.
>>>
>>> Since we do not need the pinning of the filesystem for the virtualization
>> ^namespacing case
>>
>>> case, limit the usage of simple_pin_fs() and simpe_release_fs() to the
>> ^simple_release_fs
>>
>>> case when the init_user_ns is active. This simplifies the cleanup for the
>>> virtualization case where usage of securityfs_remove() to free dentries
>> ^namespacing
>>
>>> is therefore not needed anymore.
>>>
>>> For the initial securityfs, i.e. the one mounted in the host userns mount,
>>> nothing changes. The rules for securityfs_remove() are as before and it is
>>> still paired with securityfs_create(). Specifically, a file created via
>>> securityfs_create_dentry() in the initial securityfs mount still needs to
>>> be removed by a call to securityfs_remove(). Creating a new dentry in the
>>> initial securityfs mount still pins the filesystem like it always did.
>>> Consequently, the initial securityfs mount is not destroyed on
>>> umount/shutdown as long as at least one user of it still has dentries that
>>> it hasn't removed with a call to securityfs_remove().
>>>
>>> Prevent mounting of an instance of securityfs in another user namespace
>>> than it belongs to. Also, prevent accesses to files and directories by
>>> a user namespace that is neither the user namespace it belongs to
>>> nor an ancestor of the user namespace that the instance of securityfs
>>> belongs to. Do not prevent access if securityfs was bind-mounted and
>>> therefore the init_user_ns is the owning user namespace.
>>>
>>> Suggested-by: Christian Brauner <brauner@kernel.org>
>>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>>> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>> Christian, I understand that "[PATCH v10 23/27] ima: Setup securityfs
>> for IMA namespace" needs to be deferred, but is there a reason for
>> deferring  "[PATCH v10 22/27] securityfs: Extend securityfs with
>> namespacing support"?   As the securityfs patches are really
>> independent of IMA namespacing, I would have thought  "[PATCH v10
>> 04/27] securityfs: rework dentry creation" and this patch should be co-
>> located at the beginning of the patch set.
> It felt more natural to me to defer it until the end but I have no
> strong thoughts on this as of right now. Since Stefan has mentioned
> moving this earlier already himself and you seem to agree as well, feel
> free to do so.

I'll move it after 'securityfs: rework dentry creation' if that's ok.

    Stefan
diff mbox series

Patch

diff --git a/security/inode.c b/security/inode.c
index 13e6780c4444..e525ba960063 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -21,9 +21,37 @@ 
 #include <linux/security.h>
 #include <linux/lsm_hooks.h>
 #include <linux/magic.h>
+#include <linux/user_namespace.h>
 
-static struct vfsmount *mount;
-static int mount_count;
+static struct vfsmount *init_securityfs_mount;
+static int init_securityfs_mount_count;
+
+static int securityfs_permission(struct user_namespace *mnt_userns,
+				 struct inode *inode, int mask)
+{
+	int err;
+
+	err = generic_permission(&init_user_ns, inode, mask);
+	if (!err) {
+		/* Unless bind-mounted, deny access if current_user_ns() is not
+		 * ancestor.
+		 */
+		if (inode->i_sb->s_user_ns != &init_user_ns &&
+		    !in_userns(current_user_ns(), inode->i_sb->s_user_ns))
+			err = -EACCES;
+	}
+
+	return err;
+}
+
+static const struct inode_operations securityfs_dir_inode_operations = {
+	.permission	= securityfs_permission,
+	.lookup		= simple_lookup,
+};
+
+static const struct inode_operations securityfs_file_inode_operations = {
+	.permission	= securityfs_permission,
+};
 
 static void securityfs_free_inode(struct inode *inode)
 {
@@ -40,20 +68,25 @@  static const struct super_operations securityfs_super_operations = {
 static int securityfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr files[] = {{""}};
+	struct user_namespace *ns = fc->user_ns;
 	int error;
 
+	if (WARN_ON(ns != current_user_ns()))
+		return -EINVAL;
+
 	error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
 	if (error)
 		return error;
 
 	sb->s_op = &securityfs_super_operations;
+	sb->s_root->d_inode->i_op = &securityfs_dir_inode_operations;
 
 	return 0;
 }
 
 static int securityfs_get_tree(struct fs_context *fc)
 {
-	return get_tree_single(fc, securityfs_fill_super);
+	return get_tree_keyed(fc, securityfs_fill_super, fc->user_ns);
 }
 
 static const struct fs_context_operations securityfs_context_ops = {
@@ -71,6 +104,7 @@  static struct file_system_type fs_type = {
 	.name =		"securityfs",
 	.init_fs_context = securityfs_init_fs_context,
 	.kill_sb =	kill_litter_super,
+	.fs_flags =	FS_USERNS_MOUNT,
 };
 
 /**
@@ -109,6 +143,7 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 					const struct file_operations *fops,
 					const struct inode_operations *iops)
 {
+	struct user_namespace *ns = current_user_ns();
 	struct dentry *dentry;
 	struct inode *dir, *inode;
 	int error;
@@ -118,12 +153,19 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 
 	pr_debug("securityfs: creating file '%s'\n",name);
 
-	error = simple_pin_fs(&fs_type, &mount, &mount_count);
-	if (error)
-		return ERR_PTR(error);
+	if (ns == &init_user_ns) {
+		error = simple_pin_fs(&fs_type, &init_securityfs_mount,
+				      &init_securityfs_mount_count);
+		if (error)
+			return ERR_PTR(error);
+	}
 
-	if (!parent)
-		parent = mount->mnt_root;
+	if (!parent) {
+		if (ns == &init_user_ns)
+			parent = init_securityfs_mount->mnt_root;
+		else
+			return ERR_PTR(-EINVAL);
+	}
 
 	dir = d_inode(parent);
 
@@ -148,7 +190,7 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 	inode->i_private = data;
 	if (S_ISDIR(mode)) {
-		inode->i_op = &simple_dir_inode_operations;
+		inode->i_op = &securityfs_dir_inode_operations;
 		inode->i_fop = &simple_dir_operations;
 		inc_nlink(inode);
 		inc_nlink(dir);
@@ -156,6 +198,7 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 		inode->i_op = iops ? iops : &simple_symlink_inode_operations;
 		inode->i_link = data;
 	} else {
+		inode->i_op = &securityfs_file_inode_operations;
 		inode->i_fop = fops;
 	}
 	d_instantiate(dentry, inode);
@@ -167,7 +210,9 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 	dentry = ERR_PTR(error);
 out:
 	inode_unlock(dir);
-	simple_release_fs(&mount, &mount_count);
+	if (ns == &init_user_ns)
+		simple_release_fs(&init_securityfs_mount,
+				  &init_securityfs_mount_count);
 	return dentry;
 }
 
@@ -293,11 +338,14 @@  EXPORT_SYMBOL_GPL(securityfs_create_symlink);
  */
 void securityfs_remove(struct dentry *dentry)
 {
+	struct user_namespace *ns;
 	struct inode *dir;
 
 	if (!dentry || IS_ERR(dentry))
 		return;
 
+	ns = dentry->d_sb->s_user_ns;
+
 	dir = d_inode(dentry->d_parent);
 	inode_lock(dir);
 	if (simple_positive(dentry)) {
@@ -310,7 +358,9 @@  void securityfs_remove(struct dentry *dentry)
 		dput(dentry);
 	}
 	inode_unlock(dir);
-	simple_release_fs(&mount, &mount_count);
+	if (ns == &init_user_ns)
+		simple_release_fs(&init_securityfs_mount,
+				  &init_securityfs_mount_count);
 }
 EXPORT_SYMBOL_GPL(securityfs_remove);