Message ID | 20170213161834.GA2065@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Feb 13, 2017 at 6:18 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file). > This in turn returns inode of lower filesystem (in a stacked filesystem > setup). > > I was playing with modified patches of shiftfs posted by james bottomley > and realized that through shiftfs setuid bit does not take effect. And > reason being that we fetch uid/gid from inode of lower fs (and not from > shiftfs inode). And that results in following checks failing. > > /* We ignore suid/sgid if there are no mappings for them in the ns */ > if (!kuid_has_mapping(bprm->cred->user_ns, uid) || > !kgid_has_mapping(bprm->cred->user_ns, gid)) > return; > > uid/gid fetched from lower fs inode might not be mapped inside the user > namespace of container. So we need to look at uid/gid fetched from > upper filesystem (shiftfs in this particular case) and these should be > mapped and setuid bit can take affect. This change is consistent with may_linkat() -> safe_hardlink_source() and with all instances of notify_change() in fs/open.c, but is not consistent with file_remove_privs() -> ... should_remove_suid()/notify_change() which also uses file_inode() So perhaps remove of suid/sgid on truncate/write is also broken? Do you know if we already have xfstests that cover these cases? > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > --- > fs/exec.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: rhvgoyal-linux/fs/exec.c > =================================================================== > --- rhvgoyal-linux.orig/fs/exec.c 2017-02-10 09:20:31.058642923 -0500 > +++ rhvgoyal-linux/fs/exec.c 2017-02-13 10:51:52.155751128 -0500 > @@ -1479,7 +1479,7 @@ static void bprm_fill_uid(struct linux_b > if (task_no_new_privs(current)) > return; > > - inode = file_inode(bprm->file); > + inode = d_backing_inode(bprm->file->f_path.dentry); I would much rather if we could use a macro for this pattern like locks_inode(), maybe privs_inode()? You could use the same macro if appropriate also in file_remove_privs(). and IMO the use of d_backing_inode() here doesn't make senses. I know it does nothing, but to my understanding its intention is quite the opposite of the change you are introducing (it was designed to give the underlying inode, not the union inode). Cheers, Amir.
On Mon, Feb 13, 2017 at 08:07:16PM +0200, Amir Goldstein wrote: > On Mon, Feb 13, 2017 at 6:18 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > > Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file). > > This in turn returns inode of lower filesystem (in a stacked filesystem > > setup). > > > > I was playing with modified patches of shiftfs posted by james bottomley > > and realized that through shiftfs setuid bit does not take effect. And > > reason being that we fetch uid/gid from inode of lower fs (and not from > > shiftfs inode). And that results in following checks failing. > > > > /* We ignore suid/sgid if there are no mappings for them in the ns */ > > if (!kuid_has_mapping(bprm->cred->user_ns, uid) || > > !kgid_has_mapping(bprm->cred->user_ns, gid)) > > return; > > > > uid/gid fetched from lower fs inode might not be mapped inside the user > > namespace of container. So we need to look at uid/gid fetched from > > upper filesystem (shiftfs in this particular case) and these should be > > mapped and setuid bit can take affect. > > This change is consistent with may_linkat() -> safe_hardlink_source() > and with all instances of notify_change() in fs/open.c, but is not > consistent with file_remove_privs() -> ... > should_remove_suid()/notify_change() > which also uses file_inode() I did a quick grep to figure out who is calling file_remove_privs(). Looks like it is being called by underlying filesystem (ext4, xfs, ntfs, fuse). If that's the case, we probably want to use the dentry and inode of underlying filesystem and not recurse back to top level filesystem? > > So perhaps remove of suid/sgid on truncate/write is also broken? vfs_truncate() seems to be using path->dentry so should be fine for the case of truncate. do_truncate() -->dentry_need_remove_privs() I have taken v2 patches of shiftfs and modifed them to be read-only and testing that. So not doing any write operations right now. > > Do you know if we already have xfstests that cover these cases? No, I have not looked there yet. > > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > --- > > fs/exec.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Index: rhvgoyal-linux/fs/exec.c > > =================================================================== > > --- rhvgoyal-linux.orig/fs/exec.c 2017-02-10 09:20:31.058642923 -0500 > > +++ rhvgoyal-linux/fs/exec.c 2017-02-13 10:51:52.155751128 -0500 > > @@ -1479,7 +1479,7 @@ static void bprm_fill_uid(struct linux_b > > if (task_no_new_privs(current)) > > return; > > > > - inode = file_inode(bprm->file); > > + inode = d_backing_inode(bprm->file->f_path.dentry); > > I would much rather if we could use a macro for this pattern > like locks_inode(), maybe privs_inode()? > You could use the same macro if appropriate also in > file_remove_privs(). > > and IMO the use of d_backing_inode() here doesn't make > senses. I know it does nothing, but to my understanding its intention > is quite the opposite of the change you are introducing > (it was designed to give the underlying inode, not the union inode). Agreed that I also get confused by d_backing_inode(). So file can give information about two inodes. One stored in file->f_inode which is accessed by file_inode() helper function. Can we create another helper function to access that second inode. Say file_path_inode() and that will return file->f_path.dentry->d_inode? Or I could keep it simple for now and drop d_backing_inode() and access inode directly. inode = file->f_path.dentry->d_inode Vivek
On Mon, Feb 13, 2017 at 9:06 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Mon, Feb 13, 2017 at 08:07:16PM +0200, Amir Goldstein wrote: >> On Mon, Feb 13, 2017 at 6:18 PM, Vivek Goyal <vgoyal@redhat.com> wrote: >> > Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file). >> > This in turn returns inode of lower filesystem (in a stacked filesystem >> > setup). >> > >> > I was playing with modified patches of shiftfs posted by james bottomley >> > and realized that through shiftfs setuid bit does not take effect. And >> > reason being that we fetch uid/gid from inode of lower fs (and not from >> > shiftfs inode). And that results in following checks failing. >> > >> > /* We ignore suid/sgid if there are no mappings for them in the ns */ >> > if (!kuid_has_mapping(bprm->cred->user_ns, uid) || >> > !kgid_has_mapping(bprm->cred->user_ns, gid)) >> > return; >> > >> > uid/gid fetched from lower fs inode might not be mapped inside the user >> > namespace of container. So we need to look at uid/gid fetched from >> > upper filesystem (shiftfs in this particular case) and these should be >> > mapped and setuid bit can take affect. >> >> This change is consistent with may_linkat() -> safe_hardlink_source() >> and with all instances of notify_change() in fs/open.c, but is not >> consistent with file_remove_privs() -> ... >> should_remove_suid()/notify_change() >> which also uses file_inode() > > I did a quick grep to figure out who is calling file_remove_privs(). Looks > like it is being called by underlying filesystem (ext4, xfs, ntfs, fuse). > If that's the case, we probably want to use the dentry and inode of > underlying filesystem and not recurse back to top level filesystem? > No I think you are right and there is no problem with file_remove_privs(). Not sure about do_coredump(), though, which uses file_inode(cprm.file). Other places in vfs that check inode->i_uid are even harder to follow. >> >> So perhaps remove of suid/sgid on truncate/write is also broken? > > vfs_truncate() seems to be using path->dentry so should be fine for the case > of truncate. > > do_truncate() -->dentry_need_remove_privs() > > I have taken v2 patches of shiftfs and modifed them to be read-only and > testing that. So not doing any write operations right now. > >> >> Do you know if we already have xfstests that cover these cases? > > No, I have not looked there yet. > >> >> > >> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> >> > --- >> > fs/exec.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > Index: rhvgoyal-linux/fs/exec.c >> > =================================================================== >> > --- rhvgoyal-linux.orig/fs/exec.c 2017-02-10 09:20:31.058642923 -0500 >> > +++ rhvgoyal-linux/fs/exec.c 2017-02-13 10:51:52.155751128 -0500 >> > @@ -1479,7 +1479,7 @@ static void bprm_fill_uid(struct linux_b >> > if (task_no_new_privs(current)) >> > return; >> > >> > - inode = file_inode(bprm->file); >> > + inode = d_backing_inode(bprm->file->f_path.dentry); >> >> I would much rather if we could use a macro for this pattern >> like locks_inode(), maybe privs_inode()? >> You could use the same macro if appropriate also in >> file_remove_privs(). >> >> and IMO the use of d_backing_inode() here doesn't make >> senses. I know it does nothing, but to my understanding its intention >> is quite the opposite of the change you are introducing >> (it was designed to give the underlying inode, not the union inode). > > Agreed that I also get confused by d_backing_inode(). > > So file can give information about two inodes. One stored in file->f_inode > which is accessed by file_inode() helper function. Can we create another > helper function to access that second inode. Say file_path_inode() and that > will return file->f_path.dentry->d_inode? > > Or I could keep it simple for now and drop d_backing_inode() and > access inode directly. > > inode = file->f_path.dentry->d_inode > Yes, that would be better. Using a macro file_path_inode(), IMO, does not add any value. Using a meaningful macro name (like locks_inode()), which is relevant to the context, can help understand where this macro should be used (i.e. instead of file_inode()). Defining those consistent semantics - I think that's the hard part.
On Mon, Feb 13, 2017 at 10:01:09PM +0200, Amir Goldstein wrote: > On Mon, Feb 13, 2017 at 9:06 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, Feb 13, 2017 at 08:07:16PM +0200, Amir Goldstein wrote: > >> On Mon, Feb 13, 2017 at 6:18 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > >> > Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file). > >> > This in turn returns inode of lower filesystem (in a stacked filesystem > >> > setup). > >> > > >> > I was playing with modified patches of shiftfs posted by james bottomley > >> > and realized that through shiftfs setuid bit does not take effect. And > >> > reason being that we fetch uid/gid from inode of lower fs (and not from > >> > shiftfs inode). And that results in following checks failing. > >> > > >> > /* We ignore suid/sgid if there are no mappings for them in the ns */ > >> > if (!kuid_has_mapping(bprm->cred->user_ns, uid) || > >> > !kgid_has_mapping(bprm->cred->user_ns, gid)) > >> > return; > >> > > >> > uid/gid fetched from lower fs inode might not be mapped inside the user > >> > namespace of container. So we need to look at uid/gid fetched from > >> > upper filesystem (shiftfs in this particular case) and these should be > >> > mapped and setuid bit can take affect. > >> > >> This change is consistent with may_linkat() -> safe_hardlink_source() > >> and with all instances of notify_change() in fs/open.c, but is not > >> consistent with file_remove_privs() -> ... > >> should_remove_suid()/notify_change() > >> which also uses file_inode() > > > > I did a quick grep to figure out who is calling file_remove_privs(). Looks > > like it is being called by underlying filesystem (ext4, xfs, ntfs, fuse). > > If that's the case, we probably want to use the dentry and inode of > > underlying filesystem and not recurse back to top level filesystem? > > > > No I think you are right and there is no problem with file_remove_privs(). > > Not sure about do_coredump(), though, which uses file_inode(cprm.file). > Other places in vfs that check inode->i_uid are even harder to follow. Hmm.., file_inode(cprm.file) might be a problem too with shiftfs. For now I will post V2 of this patch and follow this core dump issue separately. Vivek
Index: rhvgoyal-linux/fs/exec.c =================================================================== --- rhvgoyal-linux.orig/fs/exec.c 2017-02-10 09:20:31.058642923 -0500 +++ rhvgoyal-linux/fs/exec.c 2017-02-13 10:51:52.155751128 -0500 @@ -1479,7 +1479,7 @@ static void bprm_fill_uid(struct linux_b if (task_no_new_privs(current)) return; - inode = file_inode(bprm->file); + inode = d_backing_inode(bprm->file->f_path.dentry); mode = READ_ONCE(inode->i_mode); if (!(mode & (S_ISUID|S_ISGID))) return;
Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file). This in turn returns inode of lower filesystem (in a stacked filesystem setup). I was playing with modified patches of shiftfs posted by james bottomley and realized that through shiftfs setuid bit does not take effect. And reason being that we fetch uid/gid from inode of lower fs (and not from shiftfs inode). And that results in following checks failing. /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!kuid_has_mapping(bprm->cred->user_ns, uid) || !kgid_has_mapping(bprm->cred->user_ns, gid)) return; uid/gid fetched from lower fs inode might not be mapped inside the user namespace of container. So we need to look at uid/gid fetched from upper filesystem (shiftfs in this particular case) and these should be mapped and setuid bit can take affect. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- fs/exec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)