diff mbox series

[v6,23/40] exec: handle idmapped mounts

Message ID 20210121131959.646623-24-christian.brauner@ubuntu.com (mailing list archive)
State New, archived
Headers show
Series idmapped mounts | expand

Commit Message

Christian Brauner Jan. 21, 2021, 1:19 p.m. UTC
When executing a setuid binary the kernel will verify in bprm_fill_uid()
that the inode has a mapping in the caller's user namespace before
setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
mounts. If the inode is accessed through an idmapped mount it is mapped
according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Link: https://lore.kernel.org/r/20210112220124.837960-32-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged

/* v3 */
unchanged

/* v4 */
- Serge Hallyn <serge@hallyn.com>:
  - Use "mnt_userns" to refer to a vfsmount's userns everywhere to make
    terminology consistent.

/* v5 */
unchanged
base-commit: 7c53f6b671f4aba70ff15e1b05148b10d58c2837

/* v6 */
base-commit: 19c329f6808995b142b3966301f217c831e7cf31

- Christoph Hellwig <hch@lst.de>:
  - Use new file_mnt_user_ns() helper.
---
 fs/exec.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

James Morris Jan. 22, 2021, 4:35 a.m. UTC | #1
On Thu, 21 Jan 2021, Christian Brauner wrote:

> When executing a setuid binary the kernel will verify in bprm_fill_uid()
> that the inode has a mapping in the caller's user namespace before
> setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> mounts. If the inode is accessed through an idmapped mount it is mapped
> according to the mount's user namespace. Afterwards the checks are
> identical to non-idmapped mounts. If the initial user namespace is
> passed nothing changes so non-idmapped mounts will see identical
> behavior as before.
> 
> Link: https://lore.kernel.org/r/20210112220124.837960-32-christian.brauner@ubuntu.com
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Eric W. Biederman Jan. 25, 2021, 4:39 p.m. UTC | #2
Christian Brauner <christian.brauner@ubuntu.com> writes:

> When executing a setuid binary the kernel will verify in bprm_fill_uid()
> that the inode has a mapping in the caller's user namespace before
> setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> mounts. If the inode is accessed through an idmapped mount it is mapped
> according to the mount's user namespace. Afterwards the checks are
> identical to non-idmapped mounts. If the initial user namespace is
> passed nothing changes so non-idmapped mounts will see identical
> behavior as before.

This does not handle the v3 capabilites xattr with embeds a uid.
So at least at that level you are missing some critical conversions.

Eric

> Link: https://lore.kernel.org/r/20210112220124.837960-32-christian.brauner@ubuntu.com
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
> /* v2 */
> unchanged
>
> /* v3 */
> unchanged
>
> /* v4 */
> - Serge Hallyn <serge@hallyn.com>:
>   - Use "mnt_userns" to refer to a vfsmount's userns everywhere to make
>     terminology consistent.
>
> /* v5 */
> unchanged
> base-commit: 7c53f6b671f4aba70ff15e1b05148b10d58c2837
>
> /* v6 */
> base-commit: 19c329f6808995b142b3966301f217c831e7cf31
>
> - Christoph Hellwig <hch@lst.de>:
>   - Use new file_mnt_user_ns() helper.
> ---
>  fs/exec.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index d803227805f6..48d1e8b1610b 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1580,6 +1580,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>  static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
>  {
>  	/* Handle suid and sgid on files */
> +	struct user_namespace *mnt_userns;
>  	struct inode *inode;
>  	unsigned int mode;
>  	kuid_t uid;
> @@ -1596,13 +1597,15 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
>  	if (!(mode & (S_ISUID|S_ISGID)))
>  		return;
>  
> +	mnt_userns = file_mnt_user_ns(file);
> +
>  	/* Be careful if suid/sgid is set */
>  	inode_lock(inode);
>  
>  	/* reload atomically mode/uid/gid now that lock held */
>  	mode = inode->i_mode;
> -	uid = inode->i_uid;
> -	gid = inode->i_gid;
> +	uid = i_uid_into_mnt(mnt_userns, inode);
> +	gid = i_gid_into_mnt(mnt_userns, inode);
>  	inode_unlock(inode);
>  
>  	/* We ignore suid/sgid if there are no mappings for them in the ns */
Christian Brauner Jan. 25, 2021, 4:44 p.m. UTC | #3
On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > When executing a setuid binary the kernel will verify in bprm_fill_uid()
> > that the inode has a mapping in the caller's user namespace before
> > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> > mounts. If the inode is accessed through an idmapped mount it is mapped
> > according to the mount's user namespace. Afterwards the checks are
> > identical to non-idmapped mounts. If the initial user namespace is
> > passed nothing changes so non-idmapped mounts will see identical
> > behavior as before.
> 
> This does not handle the v3 capabilites xattr with embeds a uid.
> So at least at that level you are missing some critical conversions.

Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm
not sure what you're referring to here. There are tests in xfstests that
verify vfs3 capability behavior.

Christian
Serge Hallyn Jan. 25, 2021, 5:03 p.m. UTC | #4
On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote:
> On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote:
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > 
> > > When executing a setuid binary the kernel will verify in bprm_fill_uid()
> > > that the inode has a mapping in the caller's user namespace before
> > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> > > mounts. If the inode is accessed through an idmapped mount it is mapped
> > > according to the mount's user namespace. Afterwards the checks are
> > > identical to non-idmapped mounts. If the initial user namespace is
> > > passed nothing changes so non-idmapped mounts will see identical
> > > behavior as before.
> > 
> > This does not handle the v3 capabilites xattr with embeds a uid.
> > So at least at that level you are missing some critical conversions.
> 
> Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm
> not sure what you're referring to here. There are tests in xfstests that
> verify vfs3 capability behavior.

*just* to make sure i'm not misunderstanding - s/vfs3/v3/ right?
Christian Brauner Jan. 25, 2021, 5:06 p.m. UTC | #5
On Mon, Jan 25, 2021 at 11:03:16AM -0600, Serge Hallyn wrote:
> On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote:
> > On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote:
> > > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > > 
> > > > When executing a setuid binary the kernel will verify in bprm_fill_uid()
> > > > that the inode has a mapping in the caller's user namespace before
> > > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> > > > mounts. If the inode is accessed through an idmapped mount it is mapped
> > > > according to the mount's user namespace. Afterwards the checks are
> > > > identical to non-idmapped mounts. If the initial user namespace is
> > > > passed nothing changes so non-idmapped mounts will see identical
> > > > behavior as before.
> > > 
> > > This does not handle the v3 capabilites xattr with embeds a uid.
> > > So at least at that level you are missing some critical conversions.
> > 
> > Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm
> > not sure what you're referring to here. There are tests in xfstests that
> > verify vfs3 capability behavior.
> 
> *just* to make sure i'm not misunderstanding - s/vfs3/v3/ right?

Yes, in my mind it's always as "vfs v3 caps -> vfs3 caps". Sorry for the
confusion.
Serge Hallyn Jan. 27, 2021, 5:50 a.m. UTC | #6
On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote:
> On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote:
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > 
> > > When executing a setuid binary the kernel will verify in bprm_fill_uid()
> > > that the inode has a mapping in the caller's user namespace before
> > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped
> > > mounts. If the inode is accessed through an idmapped mount it is mapped
> > > according to the mount's user namespace. Afterwards the checks are
> > > identical to non-idmapped mounts. If the initial user namespace is
> > > passed nothing changes so non-idmapped mounts will see identical
> > > behavior as before.
> > 
> > This does not handle the v3 capabilites xattr with embeds a uid.
> > So at least at that level you are missing some critical conversions.
> 
> Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm
> not sure what you're referring to here. There are tests in xfstests that
> verify vfs3 capability behavior.
> 
> Christian

So fwiw I just tested it manually as well.  Scenario:

uid 1000 user ubuntu
uid 1001 user u2
sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
su - u2
  cp /usr/bin/id /mnt/
  unshare -Urm
    setcap cap_setuid=pe /mnt/id
At this point, from init_user_ns,
ubuntu@fscaps:~/mount-idmapped$ /home/u2/rcap /mnt/id
v3, rootid is 1001
ubuntu@fscaps:~/mount-idmapped$ /home/u2/rcap //home/ubuntu/id
v3, rootid is 1000

-serge
diff mbox series

Patch

diff --git a/fs/exec.c b/fs/exec.c
index d803227805f6..48d1e8b1610b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1580,6 +1580,7 @@  static void check_unsafe_exec(struct linux_binprm *bprm)
 static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 {
 	/* Handle suid and sgid on files */
+	struct user_namespace *mnt_userns;
 	struct inode *inode;
 	unsigned int mode;
 	kuid_t uid;
@@ -1596,13 +1597,15 @@  static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 	if (!(mode & (S_ISUID|S_ISGID)))
 		return;
 
+	mnt_userns = file_mnt_user_ns(file);
+
 	/* Be careful if suid/sgid is set */
 	inode_lock(inode);
 
 	/* reload atomically mode/uid/gid now that lock held */
 	mode = inode->i_mode;
-	uid = inode->i_uid;
-	gid = inode->i_gid;
+	uid = i_uid_into_mnt(mnt_userns, inode);
+	gid = i_gid_into_mnt(mnt_userns, inode);
 	inode_unlock(inode);
 
 	/* We ignore suid/sgid if there are no mappings for them in the ns */