Message ID | 1487053720.3125.73.camel@HansenPartnership.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> Now that we have two different views of filesystem ids (the filesystem
> view and the kernel view), we have a problem in that
> current_fsuid/fsgid() return the kernel view but are sometimes used in
> filesystem code where the filesystem view should be used. This patch
> introduces helpers to produce the filesystem view of current fsuid and
> fsgid.

If I am reading this right what we are seeing is that xfs explicitly opted out of type safety with predictable results. Accidentally confusing kuids and uids, which is potentially a security issue.

All of that said where are you getting sb->s_user_ns != &init_user_ns for an xfs filesystem? There are quite a few xfs interfaces that are not ready for that. xfs has a very wide userspace interface of ioctls that all needs to be looked at and addressed carefully if there is anything like this going on.

I think we really need to ask if we should use kuids and kgids for the xfs internal quota code. At the end of the day that is going to be a whole lot less error prone.

> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index f0e70a1..18e9c41 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -399,4 +399,9 @@ do { \
>  	*(_fsgid) = __cred->fsgid; \
>  } while(0)
>
> +/* return the current id in the filesystem view */
> +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
> +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())

Could we please place these helpers in fs.h? That should allow them to become inline functions and live with the existing filesystem helpers in there.

My gut says the names disk_fsuid(i) and disk_fsgid(i) would be clearer.

Of course all of this has the challenge of error handling in the case when current_fsuid or current_fsgid do not map into the current filesystem.
Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
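The two points raised above (inline functions instead of macros, and error handling for ids that do not map) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the types and the names `user_ns_model`, `model_from_kuid()` and `model_fs_view_uid()` are hypothetical stand-ins invented for this example, not kernel code. The one behaviour borrowed from the kernel is that an id with no mapping comes back as (uid_t)-1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: these types imitate the kernel's, they are not the kernel's. */
typedef struct { uint32_t val; } kuid_t;      /* id in the kernel view */
#define INVALID_UID ((uint32_t)-1)            /* "no mapping" marker */

/* One mapping extent: ids [first, first+count) inside the namespace
 * correspond to kernel ids [lower_first, lower_first+count). */
struct uid_extent { uint32_t first, lower_first, count; };

/* Hypothetical stand-in for the namespace a superblock's s_user_ns points at. */
struct user_ns_model { const struct uid_extent *extents; size_t nr; };

/* Kernel view -> namespace view; INVALID_UID when the id is unmapped. */
static inline uint32_t model_from_kuid(const struct user_ns_model *ns, kuid_t k)
{
	for (size_t i = 0; i < ns->nr; i++) {
		const struct uid_extent *e = &ns->extents[i];

		if (k.val >= e->lower_first && k.val - e->lower_first < e->count)
			return e->first + (k.val - e->lower_first);
	}
	return INVALID_UID;
}

/* Hypothetical checked helper: reports an unmapped id to the caller
 * instead of silently handing back -1 as the macro form would. */
static inline bool model_fs_view_uid(const struct user_ns_model *sb_ns,
				     kuid_t fsuid, uint32_t *out)
{
	uint32_t id = model_from_kuid(sb_ns, fsuid);

	if (id == INVALID_UID)
		return false;
	*out = id;
	return true;
}
```

An inline function gives the helper a typed signature and a natural place to report the unmapped case, which the macro form cannot do without relying on the caller to test for -1.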
On Tue, Feb 14, 2017 at 08:46:32PM +1300, Eric W. Biederman wrote:
> All of that said where are you getting sb->s_user_ns != &init_user_ns
> for an xfs filesystem? There are quite a few xfs interfaces that are
> not ready for that. xfs has a very wide userspace interface of ioctls
> that all needs to be looked at and addressed carefully if there is
> anything like this going on.

The only thing exposing uids/gid is the bulkstat code, and that's easy to cover.

> > +/* return the current id in the filesystem view */
> > +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
> > +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())
>
> Could we please place these helpers in fs.h?
> That should allow them to become inline functions and live with the
> existing filesystem helpers in there.

And give them better names, i_* is rather cryptic.
On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>
> > Now that we have two different views of filesystem ids (the
> > filesystem view and the kernel view), we have a problem in that
> > current_fsuid/fsgid() return the kernel view but are sometimes used
> > in filesystem code where the filesystem view should be used. This
> > patch introduces helpers to produce the filesystem view of current
> > fsuid and fsgid.
>
> If I am reading this right what we are seeing is that xfs explicitly
> opted out of type safety with predictable results. Accidentally
> confusing kuids and uids, which is potentially a security issue.
>
> All of that said where are you getting sb->s_user_ns != &init_user_ns
> for an xfs filesystem? There are quite a few xfs interfaces that are
> not ready for that. xfs has a very wide userspace interface of
> ioctls that all needs to be looked at and addressed carefully if
> there is anything like this going on.
>
> I think we really need to ask if we should use kuids and kgids for
> the xfs internal quota code.

That question devolves to who administers quota operations in containers. The answer is usually that apparent root in the container needs to be able to administer quotas as though they were real root outside, so transforming the user quota calculations is correct to first order. To second order we need a way of controlling the container's quota, which is why we've had a flurry of two-level quota patches over the years. We've finally settled on group or project quotas and, if you look at xfs, you'll see the project quota will work even in the face of uid shifts in the user quota, so I think it's all working.

> At the end of the day that is going to be a whole lot less error
> prone.

It would make the job of the filesystem writer harder: a lot of quota code is very close to the disk, so they'd need a whole lot of transforms to kernel view.
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> >
> > diff --git a/include/linux/cred.h b/include/linux/cred.h
> > index f0e70a1..18e9c41 100644
> > --- a/include/linux/cred.h
> > +++ b/include/linux/cred.h
> > @@ -399,4 +399,9 @@ do { \
> >  	*(_fsgid) = __cred->fsgid; \
> >  } while(0)
> >
> > +/* return the current id in the filesystem view */
> > +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
> > +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())
>
> Could we please place these helpers in fs.h?

We could ... the current_ helpers are in cred.h, which is why I put the new ones there, but I've no strong feelings either way.

> That should allow them to become inline functions and live with the
> existing filesystem helpers in there.

I don't believe they did. There's code in most filesystems (usually in quota) where they need to perform calculations with the current user id. The problem is that with s_user_ns, they can't use current_fsuid() because it's the kernel view and the places where the filesystem is using it are often in the filesystem view.

> My gut says the names disk_fsuid(i) and disk_fsgid(i) would be
> clearer.

I chose i_fsuid/fsgid for two reasons:

 1. because it takes an inode as an argument.
 2. to be consistent with i_uid_read/write(), which are the other namespace shifting primitives for filesystems.

I think 2. is quite compelling, so if you want a different name for this, we should rename i_uid/gid_read/write() as well.

> Of course all of this has the challenge of error handling in the case
> when current_fsuid or current_fsgid do not map into the current
> filesystem.

Yes, I think it actually fails in the quota case because unmapped usually gives uid/gid -1, which has no quota set, so you can bust out of your quota with the right s_user_ns. On the other hand, if you can set up s_user_ns then you should be admin for that quota and it's caveat emptor.
James
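The quota escape described above (an unmapped id becomes -1, and -1 has no quota record) can be made concrete with a toy lookup. This is a minimal sketch, assuming a flat table of per-id block limits; `quota_entry`, `quota_limit_for()`, `naive_may_allocate()` and `strict_may_allocate()` are hypothetical names for this illustration and do not correspond to any filesystem's quota code. The strict variant shows one possible policy; it is not something proposed in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)   /* what an unmapped id turns into */

/* Toy quota record: a filesystem-view id and its block limit. */
struct quota_entry { uint32_t id; uint64_t block_limit; };

/* Returns the limit for id, or 0 meaning "no quota configured". */
static uint64_t quota_limit_for(const struct quota_entry *tbl, size_t n,
				uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return tbl[i].block_limit;
	return 0;
}

/* Naive check: an unmapped caller (id == INVALID_ID) finds no record,
 * so it is treated as unlimited. */
static bool naive_may_allocate(const struct quota_entry *tbl, size_t n,
			       uint32_t id, uint64_t used, uint64_t want)
{
	uint64_t limit = quota_limit_for(tbl, n, id);

	return limit == 0 || used + want <= limit;
}

/* Stricter variant (an assumption of this sketch): refuse unmapped ids
 * outright before consulting the table. */
static bool strict_may_allocate(const struct quota_entry *tbl, size_t n,
				uint32_t id, uint64_t used, uint64_t want)
{
	if (id == INVALID_ID)
		return false;
	return naive_may_allocate(tbl, n, id, used, want);
}
```

With a table that limits id 1000 to 100 blocks, the naive check rejects a request that would take id 1000 over its limit but accepts the same request from an unmapped caller, which is the escape being described.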
James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>>
>> > Now that we have two different views of filesystem ids (the
>> > filesystem view and the kernel view), we have a problem in that
>> > current_fsuid/fsgid() return the kernel view but are sometimes used
>> > in filesystem code where the filesystem view should be used. This
>> > patch introduces helpers to produce the filesystem view of current
>> > fsuid and fsgid.
>>
>> If I am reading this right what we are seeing is that xfs explicitly
>> opted out of type safety with predictable results. Accidentally
>> confusing kuids and uids, which is potentially a security issue.
>>
>> All of that said where are you getting sb->s_user_ns != &init_user_ns
>> for an xfs filesystem?

James please answer this question:

Where are you getting sb->s_user_ns != &init_user_ns for an xfs filesystem?

None of this matters if sb->s_user_ns == &init_user_ns.

This is significant because only xfs keeps any in-core data structure in its on-disk encoding. So this problem is xfs specific. So understanding how you are getting xfs to have sb->s_user_ns != &init_user_ns is important for discussing which direction we go with helper functions here.

xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such no fixes are needed.

Eric
On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>
> > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > >
> > > > Now that we have two different views of filesystem ids (the
> > > > filesystem view and the kernel view), we have a problem in that
> > > > current_fsuid/fsgid() return the kernel view but are sometimes
> > > > used in filesystem code where the filesystem view should be used.
> > > > This patch introduces helpers to produce the filesystem view of
> > > > current fsuid and fsgid.
> > >
> > > If I am reading this right what we are seeing is that xfs
> > > explicitly opted out of type safety with predictable results.
> > > Accidentally confusing kuids and uids, which is potentially a
> > > security issue.
> > >
> > > All of that said where are you getting sb->s_user_ns !=
> > > &init_user_ns for an xfs filesystem?
>
> James please answer this question:
>
> Where are you getting sb->s_user_ns != &init_user_ns for an xfs
> filesystem?

I'm playing with a patch that allows host admin to set up an unprivileged container for a guest to use. One of the extensions is to allow anything possessing capability(CAP_SYS_ADMIN) to make s_user_ns follow mnt_ns->user_ns for new mounts (as an option). The idea was to see if root could set up an id shifted container with just the current s_user_ns infrastructure.

> None of this matters if sb->s_user_ns == &init_user_ns.
>
> This is significant because only xfs keeps any in-core data
> structure in its on-disk encoding. So this problem is xfs specific.
> So understanding how you are getting xfs to have sb->s_user_ns !=
> &init_user_ns is important for discussing which direction we go with
> helper functions here.
>
> xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such
> no fixes are needed.
So what you're saying is that unless the unprivileged container could mount the filesystem itself (i.e. only those possessing the FS_USERNS_MOUNT flag) the filesystems are going to be full of problems like this. I suppose whether it's worthwhile trying to fix them all depends on whether the ability of the administrator to set up an id shifted container is useful or not.

James
James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>>
>> > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> > >
>> > > > Now that we have two different views of filesystem ids (the
>> > > > filesystem view and the kernel view), we have a problem in that
>> > > > current_fsuid/fsgid() return the kernel view but are sometimes
>> > > > used in filesystem code where the filesystem view should be used.
>> > > > This patch introduces helpers to produce the filesystem view of
>> > > > current fsuid and fsgid.
>> > >
>> > > If I am reading this right what we are seeing is that xfs
>> > > explicitly opted out of type safety with predictable results.
>> > > Accidentally confusing kuids and uids, which is potentially a
>> > > security issue.
>> > >
>> > > All of that said where are you getting sb->s_user_ns !=
>> > > &init_user_ns for an xfs filesystem?
>>
>> James please answer this question:
>>
>> Where are you getting sb->s_user_ns != &init_user_ns for an xfs
>> filesystem?
>
> I'm playing with a patch that allows host admin to set up an
> unprivileged container for a guest to use. One of the extensions is to
> allow anything possessing capability(CAP_SYS_ADMIN) to make s_user_ns
> follow mnt_ns->user_ns for new mounts (as an option). The idea was to
> see if root could set up an id shifted container with just the current
> s_user_ns infrastructure.
>
>> None of this matters if sb->s_user_ns == &init_user_ns.
>>
>> This is significant because only xfs keeps any in-core data
>> structure in its on-disk encoding. So this problem is xfs specific.
>> So understanding how you are getting xfs to have sb->s_user_ns !=
>> &init_user_ns is important for discussing which direction we go with
>> helper functions here.
>>
>> xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such
>> no fixes are needed.
>
> So what you're saying is that unless the unprivileged container could
> mount the filesystem itself (i.e. only those possessing the
> FS_USERNS_MOUNT flag) the filesystems are going to be full of problems
> like this. I suppose whether it's worthwhile trying to fix them all
> depends on whether the ability of the administrator to set up an id
> shifted container is useful or not.

Yes. Setting s_user_ns and expecting everything to work with a review/test cycle of the filesystem to shake out any rough edges is likely to be problematic. For historical reasons I actually expect xfs is especially bad in this regard. So in practice I would definitely start a feature like that with another filesystem.

I would be happy to have a FS_S_USER_NS flag to say all that is well, and the filesystem supports s_user_ns != &init_user_ns. The bar is much lower if a trusted user with CAP_SYS_ADMIN is mounting the filesystem than if an unprivileged user is mounting the filesystem. As we don't have to worry about specially crafted malicious filesystem images.

In practice I think I would have passed in the user namespace via a file descriptor to mount rather than inheriting it from the mount namespace (more flexibility for roughly the same amount of code).

Eric
On Fri, 2017-02-17 at 14:15 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>
> > On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > >
> > > > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> > > > > James Bottomley <James.Bottomley@HansenPartnership.com>
> > > > > writes:
> > > > >
> > > > > > Now that we have two different views of filesystem ids (the
> > > > > > filesystem view and the kernel view), we have a problem in
> > > > > > that current_fsuid/fsgid() return the kernel view but are
> > > > > > sometimes used in filesystem code where the filesystem view
> > > > > > should be used. This patch introduces helpers to produce
> > > > > > the filesystem view of current fsuid and fsgid.
> > > > >
> > > > > If I am reading this right what we are seeing is that xfs
> > > > > explicitly opted out of type safety with predictable results.
> > > > > Accidentally confusing kuids and uids, which is potentially a
> > > > > security issue.
> > > > >
> > > > > All of that said where are you getting sb->s_user_ns !=
> > > > > &init_user_ns for an xfs filesystem?
> > >
> > > James please answer this question:
> > >
> > > Where are you getting sb->s_user_ns != &init_user_ns for an xfs
> > > filesystem?
> >
> > I'm playing with a patch that allows host admin to set up an
> > unprivileged container for a guest to use. One of the extensions
> > is to allow anything possessing capability(CAP_SYS_ADMIN) to make
> > s_user_ns follow mnt_ns->user_ns for new mounts (as an option).
> > The idea was to see if root could set up an id shifted container
> > with just the current s_user_ns infrastructure.
> >
> > > None of this matters if sb->s_user_ns == &init_user_ns.
> > >
> > > This is significant because only xfs keeps any in-core data
> > > structure in its on-disk encoding. So this problem is xfs
> > > specific.
> > > So understanding how you are getting xfs to have sb->s_user_ns
> > > != &init_user_ns is important for discussing which direction we
> > > go with helper functions here.
> > >
> > > xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as
> > > such no fixes are needed.
> >
> > So what you're saying is that unless the unprivileged container
> > could mount the filesystem itself (i.e. only those possessing the
> > FS_USERNS_MOUNT flag) the filesystems are going to be full of
> > problems like this. I suppose whether it's worthwhile trying to
> > fix them all depends on whether the ability of the administrator to
> > set up an id shifted container is useful or not.
>
> Yes. Setting s_user_ns and expecting everything to work with a
> review/test cycle of the filesystem to shake out any rough edges is
> likely to be problematic. For historical reasons I actually expect
> xfs is especially bad in this regard. So in practice I would
> definitely start a feature like that with another filesystem.

It's a pragmatic choice: xfs is the filesystem on my current laptop. I know xfs was once very problematic for the user namespace, but having looked through the code several times, the namespace shifts are now nicely abstracted and easy to identify, so I don't anticipate any extra difficulty today.

> I would be happy to have a FS_S_USER_NS flag to say all that is well,
> and the filesystem supports s_user_ns != &init_user_ns. The bar is
> much lower if a trusted user with CAP_SYS_ADMIN is mounting the
> filesystem than if an unprivileged user is mounting the filesystem.
> As we don't have to worry about specially crafted malicious
> filesystem images.
>
> In practice I think I would have passed in the user namespace via a
> file descriptor to mount rather than inheriting it from the mount
> namespace (more flexibility for roughly the same amount of code).
I agree on this, but let's leave the implementation details on the side for a while and examine the "should we do this?" question.

I can see two reasons why we might need to have this functionality:

 1. Orchestration system use case: the orchestration system wants to build an unprivileged container root from an image file or overlay (I think this covers docker).
 2. USB (or other) device insertion redirected to container. In this case, we'd like the mount on insertion to follow the container user_ns.

The reason I could see not bothering with this is that it doesn't fix the shift on a subtree issue and fixing that gives a system which can also be used to solve both cases above.

James
James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2017-02-17 at 14:15 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>>
>> > On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
>> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> > >
>> > > > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> > > > > James Bottomley <James.Bottomley@HansenPartnership.com>
>> > > > > writes:
>> > > > >
>> > > > > > Now that we have two different views of filesystem ids (the
>> > > > > > filesystem view and the kernel view), we have a problem in
>> > > > > > that current_fsuid/fsgid() return the kernel view but are
>> > > > > > sometimes used in filesystem code where the filesystem view
>> > > > > > should be used. This patch introduces helpers to produce
>> > > > > > the filesystem view of current fsuid and fsgid.
>> > > > >
>> > > > > If I am reading this right what we are seeing is that xfs
>> > > > > explicitly opted out of type safety with predictable results.
>> > > > > Accidentally confusing kuids and uids, which is potentially a
>> > > > > security issue.
>> > > > >
>> > > > > All of that said where are you getting sb->s_user_ns !=
>> > > > > &init_user_ns for an xfs filesystem?
>> > >
>> > > James please answer this question:
>> > >
>> > > Where are you getting sb->s_user_ns != &init_user_ns for an xfs
>> > > filesystem?
>> >
>> > I'm playing with a patch that allows host admin to set up an
>> > unprivileged container for a guest to use. One of the extensions
>> > is to allow anything possessing capability(CAP_SYS_ADMIN) to make
>> > s_user_ns follow mnt_ns->user_ns for new mounts (as an option).
>> > The idea was to see if root could set up an id shifted container
>> > with just the current s_user_ns infrastructure.
>> >
>> > > None of this matters if sb->s_user_ns == &init_user_ns.
>> > >
>> > > This is significant because only xfs keeps any in-core data
>> > > structure in its on-disk encoding. So this problem is xfs
>> > > specific.
>> > > So understanding how you are getting xfs to have sb->s_user_ns
>> > > != &init_user_ns is important for discussing which direction we
>> > > go with helper functions here.
>> > >
>> > > xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as
>> > > such no fixes are needed.
>> >
>> > So what you're saying is that unless the unprivileged container
>> > could mount the filesystem itself (i.e. only those possessing the
>> > FS_USERNS_MOUNT flag) the filesystems are going to be full of
>> > problems like this. I suppose whether it's worthwhile trying to
>> > fix them all depends on whether the ability of the administrator to
>> > set up an id shifted container is useful or not.
>>
>> Yes. Setting s_user_ns and expecting everything to work with a
>> review/test cycle of the filesystem to shake out any rough edges is
>> likely to be problematic. For historical reasons I actually expect
>> xfs is especially bad in this regard. So in practice I would
>> definitely start a feature like that with another filesystem.
>
> It's a pragmatic choice: xfs is the filesystem on my current laptop. I
> know xfs was once very problematic for the user namespace, but having
> looked through the code several times, the namespace shifts are now
> nicely abstracted and easy to identify, so I don't anticipate any extra
> difficulty today.

I think you have already encountered the extra difficulty. For xfs a couple of little things need to be fixed. I expect most filesystems will pretty much work out of the box.

>> I would be happy to have a FS_S_USER_NS flag to say all that is well,
>> and the filesystem supports s_user_ns != &init_user_ns. The bar is
>> much lower if a trusted user with CAP_SYS_ADMIN is mounting the
>> filesystem than if an unprivileged user is mounting the filesystem.
>> As we don't have to worry about specially crafted malicious
>> filesystem images.
>>
>> In practice I think I would have passed in the user namespace via a
>> file descriptor to mount rather than inheriting it from the mount
>> namespace (more flexibility for roughly the same amount of code).
>
> I agree on this, but let's leave the implementation details on the side
> for a while and examine the "should we do this?" question.
>
> I can see two reasons why we might need to have this functionality
>
> 1. Orchestration system use case: the orchestration system wants to
>    build an unprivileged container root from an image file or overlay
>    (I think this covers docker).
> 2. USB (or other) device insertion redirected to container. In this
>    case, we'd like the mount on insertion to follow the container
>    user_ns.

I think those are valid.

The Docker/runc cases that I am familiar with really want the sharing of base images between containers. To share the base image between containers requires having a different mapping per container to separate them. The savings on disk space and vfs cache sharing are important for them.

I am torn on the fact that this sneaks up on the issue of what happens when someone injects a malicious disk image into this process. If we have a full solution to handling malicious disk images we can just set FS_USERNS_MOUNT. All of these use cases look like cases where it would be very easy for the mounter of the filesystem to skip ensuring they trust the path that generated the filesystem. On the other hand that is nothing new.

> The reason I could see not bothering with this is that it doesn't fix
> the shift on a subtree issue and fixing that gives a system which can
> also be used to solve both cases above.

The only reasons I have been not bothering with this are:

- Different mappings into different containers.
- Its closeness to S_USER_NS.
- A focus on getting fuse and the generic vfs bits covered and merged.
But at this point I think a generic vfs option that would set s_user_ns and work on filesystems that opt in would be perfectly reasonable. Especially since (a) we want to be able to display which user namespace s_user_ns is in, and a generic mount option seems like a way to sneak it into existing proc files, and (b) we want the file descriptor parsing code for shiftfs.

So it seems like we might as well implement the functionality as a generic mount option and let the filesystems opt in with FS_USERNS_MOUNT, or with FS_S_USER_NS if the filesystem is not up to a full unprivileged mount.

Eric
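The opt-in rule discussed in this message can be sketched as a small predicate. This is a hedged illustration only: FS_S_USER_NS is the flag proposed in the thread, not an existing kernel flag, the bit values below are placeholders for this model rather than the kernel's, and `may_set_s_user_ns()` is a hypothetical name.

```c
#include <stdbool.h>

/* Placeholder bit values for this model; not the kernel's definitions. */
#define MODEL_FS_USERNS_MOUNT 0x1u /* filesystem is safe for unprivileged mounts */
#define MODEL_FS_S_USER_NS    0x2u /* proposed: tolerates s_user_ns != &init_user_ns */

/*
 * Decide whether a mount may ask for a non-initial s_user_ns.
 *  - A filesystem that is safe for unprivileged mounts qualifies always.
 *  - A filesystem that only opts in via the proposed flag qualifies when a
 *    trusted mounter (CAP_SYS_ADMIN) vouches for the image.
 *  - A filesystem that has not opted in never qualifies.
 */
static bool may_set_s_user_ns(unsigned int fs_flags, bool mounter_is_privileged)
{
	if (fs_flags & MODEL_FS_USERNS_MOUNT)
		return true;
	if (fs_flags & MODEL_FS_S_USER_NS)
		return mounter_is_privileged;
	return false;
}
```

The split mirrors the lower bar described earlier in the thread: a privileged mounter does not need the filesystem to withstand maliciously crafted images, only to handle id shifting correctly.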
diff --git a/include/linux/cred.h b/include/linux/cred.h
index f0e70a1..18e9c41 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -399,4 +399,9 @@ do { \
 	*(_fsgid) = __cred->fsgid; \
 } while(0)
 
+/* return the current id in the filesystem view */
+#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
+#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())
+
+
 #endif /* _LINUX_CRED_H */
Now that we have two different views of filesystem ids (the filesystem view and the kernel view), we have a problem in that current_fsuid/fsgid() return the kernel view but are sometimes used in filesystem code where the filesystem view should be used. This patch introduces helpers to produce the filesystem view of current fsuid and fsgid.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>