Message ID | 20210121131959.646623-24-christian.brauner@ubuntu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | idmapped mounts | expand |
On Thu, 21 Jan 2021, Christian Brauner wrote: > When executing a setuid binary the kernel will verify in bprm_fill_uid() > that the inode has a mapping in the caller's user namespace before > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > mounts. If the inode is accessed through an idmapped mount it is mapped > according to the mount's user namespace. Afterwards the checks are > identical to non-idmapped mounts. If the initial user namespace is > passed nothing changes so non-idmapped mounts will see identical > behavior as before. > > Link: https://lore.kernel.org/r/20210112220124.837960-32-christian.brauner@ubuntu.com > Cc: Christoph Hellwig <hch@lst.de> > Cc: David Howells <dhowells@redhat.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: linux-fsdevel@vger.kernel.org > Reviewed-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Christian Brauner <christian.brauner@ubuntu.com> writes: > When executing a setuid binary the kernel will verify in bprm_fill_uid() > that the inode has a mapping in the caller's user namespace before > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > mounts. If the inode is accessed through an idmapped mount it is mapped > according to the mount's user namespace. Afterwards the checks are > identical to non-idmapped mounts. If the initial user namespace is > passed nothing changes so non-idmapped mounts will see identical > behavior as before. This does not handle the v3 capabilites xattr with embeds a uid. So at least at that level you are missing some critical conversions. Eric > Link: https://lore.kernel.org/r/20210112220124.837960-32-christian.brauner@ubuntu.com > Cc: Christoph Hellwig <hch@lst.de> > Cc: David Howells <dhowells@redhat.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: linux-fsdevel@vger.kernel.org > Reviewed-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> > --- > /* v2 */ > unchanged > > /* v3 */ > unchanged > > /* v4 */ > - Serge Hallyn <serge@hallyn.com>: > - Use "mnt_userns" to refer to a vfsmount's userns everywhere to make > terminology consistent. > > /* v5 */ > unchanged > base-commit: 7c53f6b671f4aba70ff15e1b05148b10d58c2837 > > /* v6 */ > base-commit: 19c329f6808995b142b3966301f217c831e7cf31 > > - Christoph Hellwig <hch@lst.de>: > - Use new file_mnt_user_ns() helper. > --- > fs/exec.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index d803227805f6..48d1e8b1610b 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1580,6 +1580,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm) > static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) > { > /* Handle suid and sgid on files */ > + struct user_namespace *mnt_userns; > struct inode *inode; > unsigned int mode; > kuid_t uid; > @@ -1596,13 +1597,15 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) > if (!(mode & (S_ISUID|S_ISGID))) > return; > > + mnt_userns = file_mnt_user_ns(file); > + > /* Be careful if suid/sgid is set */ > inode_lock(inode); > > /* reload atomically mode/uid/gid now that lock held */ > mode = inode->i_mode; > - uid = inode->i_uid; > - gid = inode->i_gid; > + uid = i_uid_into_mnt(mnt_userns, inode); > + gid = i_gid_into_mnt(mnt_userns, inode); > inode_unlock(inode); > > /* We ignore suid/sgid if there are no mappings for them in the ns */
On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote: > Christian Brauner <christian.brauner@ubuntu.com> writes: > > > When executing a setuid binary the kernel will verify in bprm_fill_uid() > > that the inode has a mapping in the caller's user namespace before > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > > mounts. If the inode is accessed through an idmapped mount it is mapped > > according to the mount's user namespace. Afterwards the checks are > > identical to non-idmapped mounts. If the initial user namespace is > > passed nothing changes so non-idmapped mounts will see identical > > behavior as before. > > This does not handle the v3 capabilites xattr with embeds a uid. > So at least at that level you are missing some critical conversions. Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm not sure what you're referring to here. There are tests in xfstests that verify vfs3 capability behavior. Christian
On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote: > On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote: > > Christian Brauner <christian.brauner@ubuntu.com> writes: > > > > > When executing a setuid binary the kernel will verify in bprm_fill_uid() > > > that the inode has a mapping in the caller's user namespace before > > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > > > mounts. If the inode is accessed through an idmapped mount it is mapped > > > according to the mount's user namespace. Afterwards the checks are > > > identical to non-idmapped mounts. If the initial user namespace is > > > passed nothing changes so non-idmapped mounts will see identical > > > behavior as before. > > > > This does not handle the v3 capabilites xattr with embeds a uid. > > So at least at that level you are missing some critical conversions. > > Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm > not sure what you're referring to here. There are tests in xfstests that > verify vfs3 capability behavior. *just* to make sure i'm not misunderstanding - s/vfs3/v3/ right?
On Mon, Jan 25, 2021 at 11:03:16AM -0600, Serge Hallyn wrote: > On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote: > > On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote: > > > Christian Brauner <christian.brauner@ubuntu.com> writes: > > > > > > > When executing a setuid binary the kernel will verify in bprm_fill_uid() > > > > that the inode has a mapping in the caller's user namespace before > > > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > > > > mounts. If the inode is accessed through an idmapped mount it is mapped > > > > according to the mount's user namespace. Afterwards the checks are > > > > identical to non-idmapped mounts. If the initial user namespace is > > > > passed nothing changes so non-idmapped mounts will see identical > > > > behavior as before. > > > > > > This does not handle the v3 capabilites xattr with embeds a uid. > > > So at least at that level you are missing some critical conversions. > > > > Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm > > not sure what you're referring to here. There are tests in xfstests that > > verify vfs3 capability behavior. > > *just* to make sure i'm not misunderstanding - s/vfs3/v3/ right? Yes, in my mind it's always as "vfs v3 caps -> vfs3 caps". Sorry for the confusion.
On Mon, Jan 25, 2021 at 05:44:04PM +0100, Christian Brauner wrote: > On Mon, Jan 25, 2021 at 10:39:01AM -0600, Eric W. Biederman wrote: > > Christian Brauner <christian.brauner@ubuntu.com> writes: > > > > > When executing a setuid binary the kernel will verify in bprm_fill_uid() > > > that the inode has a mapping in the caller's user namespace before > > > setting the callers uid and gid. Let bprm_fill_uid() handle idmapped > > > mounts. If the inode is accessed through an idmapped mount it is mapped > > > according to the mount's user namespace. Afterwards the checks are > > > identical to non-idmapped mounts. If the initial user namespace is > > > passed nothing changes so non-idmapped mounts will see identical > > > behavior as before. > > > > This does not handle the v3 capabilites xattr with embeds a uid. > > So at least at that level you are missing some critical conversions. > > Thanks for looking. Vfs v3 caps are handled earlier in the series. I'm > not sure what you're referring to here. There are tests in xfstests that > verify vfs3 capability behavior. > > Christian So fwiw I just tested it manually as well. Scenario: uid 1000 user ubuntu uid 1001 user u2 sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt su - u2 cp /usr/bin/id /mnt/ unshare -Urm setcap cap_setuid=pe /mnt/id At this point, from init_user_ns, ubuntu@fscaps:~/mount-idmapped$ /home/u2/rcap /mnt/id v3, rootid is 1001 ubuntu@fscaps:~/mount-idmapped$ /home/u2/rcap //home/ubuntu/id v3, rootid is 1000 -serge
diff --git a/fs/exec.c b/fs/exec.c index d803227805f6..48d1e8b1610b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1580,6 +1580,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm) static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) { /* Handle suid and sgid on files */ + struct user_namespace *mnt_userns; struct inode *inode; unsigned int mode; kuid_t uid; @@ -1596,13 +1597,15 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) if (!(mode & (S_ISUID|S_ISGID))) return; + mnt_userns = file_mnt_user_ns(file); + /* Be careful if suid/sgid is set */ inode_lock(inode); /* reload atomically mode/uid/gid now that lock held */ mode = inode->i_mode; - uid = inode->i_uid; - gid = inode->i_gid; + uid = i_uid_into_mnt(mnt_userns, inode); + gid = i_gid_into_mnt(mnt_userns, inode); inode_unlock(inode); /* We ignore suid/sgid if there are no mappings for them in the ns */