[1/2] fs: add inode helpers for fsuid and fsgid

Message ID	1487053720.3125.73.camel@HansenPartnership.com (mailing list archive)
State	Not Applicable
Headers
Return-Path: <linux-xfs-owner@kernel.org>
Message-ID: <1487053720.3125.73.camel@HansenPartnership.com>
Subject: [PATCH 1/2] fs: add inode helpers for fsuid and fsgid
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Christoph Hellwig <hch@infradead.org>, Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, "Eric W. Biederman" <ebiederm@xmission.com>, Seth Forshee <seth.forshee@canonical.com>
Date: Mon, 13 Feb 2017 22:28:40 -0800
In-Reply-To: <1487053651.3125.72.camel@HansenPartnership.com>
References: <1487008001.3125.41.camel@HansenPartnership.com> <20170213194337.GA9852@infradead.org> <20170213213416.GA15349@dastard> <20170214060809.GA21114@infradead.org> <1487053651.3125.72.camel@HansenPartnership.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-xfs-owner@vger.kernel.org
Precedence: bulk

James Bottomley Feb. 14, 2017, 6:28 a.m. UTC

Now that we have two different views of filesystem ids (the filesystem
view and the kernel view), we have a problem in that
current_fsuid/fsgid() return the kernel view but are sometimes used in
filesystem code where the filesystem view should be used.  This patch
introduces helpers to produce the filesystem view of current fsuid and
fsgid.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric W. Biederman Feb. 14, 2017, 7:46 a.m. UTC | #1

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> Now that we have two different views of filesystem ids (the filesystem
> view and the kernel view), we have a problem in that
> current_fsuid/fsgid() return the kernel view but are sometimes used in
> filesystem code where the filesystem view shoud be used.  This patch
> introduces helpers to produce the filesystem view of current fsuid and
> fsgid.

If I am reading this right, what we are seeing is that xfs explicitly
opted out of type safety, with predictable results: accidentally
confusing kuids and uids, which is potentially a security issue.

All of that said where are you getting sb->s_user_ns != &init_user_ns
for an xfs filesystem?  There are quite a few xfs interfaces that are
not ready for that.  xfs has a very wide userspace interface of ioctls
that all need to be looked at and addressed carefully if there is
anything like this going on.

I think we really need to ask if we should use kuids and kgids for the
xfs internal quota code.  At the end of the day that is going to be
a whole lot less error prone.

> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index f0e70a1..18e9c41 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -399,4 +399,9 @@ do {						\
>  	*(_fsgid) = __cred->fsgid;		\
>  } while(0)
>  
> +/* return the current id in the filesystem view */
> +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
> +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())

Could we please place these helpers in fs.h?
That should allow them to become inline functions and live with the
existing filesystem helpers in there.

My gut says the names disk_fsuid(i) and disk_fsgid(i) would be clearer.

Of course all of this has the challenge of error handling in the case
when current_fsuid or current_fsgid do not map into the current
filesystem.
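
A minimal userspace sketch of what such an inline helper, and its error
case, could look like.  Everything with a _model suffix is a toy stand-in
invented for illustration (the struct layouts, the single mapping extent,
and the helper name are assumptions, not the kernel's definitions).  The
one behaviour taken from the kernel is that from_kuid() yields (uid_t)-1
when the id has no mapping in the target namespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for kernel types; layouts are assumptions for illustration. */
typedef uint32_t uid_model;                    /* filesystem-view (on-disk) id */
typedef struct { uint32_t val; } kuid_model;   /* kernel-view id, kept type-distinct */

/* One mapping extent: ns ids [first, first+count) <-> kernel ids [lower_first, ...) */
struct user_ns_model {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct super_block_model { const struct user_ns_model *s_user_ns; };
struct inode_model { const struct super_block_model *i_sb; };

#define INVALID_ID ((uid_model)-1)

/* Model of from_kuid(): map a kernel id into ns, or INVALID_ID if unmapped. */
static inline uid_model from_kuid_model(const struct user_ns_model *ns,
					kuid_model kuid)
{
	assert(ns != NULL);
	if (kuid.val >= ns->lower_first &&
	    kuid.val - ns->lower_first < ns->count)
		return ns->first + (kuid.val - ns->lower_first);
	return INVALID_ID;
}

/*
 * Hypothetical inline helper.  The caller's fsuid is passed in explicitly
 * because this model has no notion of "current".
 */
static inline uid_model disk_fsuid_model(const struct inode_model *inode,
					 kuid_model fsuid)
{
	return from_kuid_model(inode->i_sb->s_user_ns, fsuid);
}
```

With a namespace mapping filesystem ids 0..65535 onto kernel ids starting
at 100000, kernel id 101000 comes back as 1000, while kernel id 1000 has
no mapping and comes back as INVALID_ID, which the caller has to handle.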

Eric

Christoph Hellwig Feb. 14, 2017, 8 a.m. UTC | #2

On Tue, Feb 14, 2017 at 08:46:32PM +1300, Eric W. Biederman wrote:
> All of that said where are you getting sb->s_user_ns != &init_user_ns
> for an xfs filesystem?  There are quite a few xfs interfaces that are
> not ready for that.   xfs has a very wide userspace interface of ioctls
> that all needs to be looked at and addressed carefully if there is
> anything like this going on.

The only thing exposing uids/gids is the bulkstat code, and that's
easy to cover.

> > +/* return the current id in the filesystem view */
> > +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns, current_fsuid())
> > +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns, current_fsgid())
> 
> Could we please place these helpers in fs.h?
> That should allow them to become inline functions and live with the
> existing filesystem helpers in there.

And give them better names, i_* is rather cryptic.

James Bottomley Feb. 14, 2017, 4:09 p.m. UTC | #3

On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > Now that we have two different views of filesystem ids (the 
> > filesystem view and the kernel view), we have a problem in that
> > current_fsuid/fsgid() return the kernel view but are sometimes used 
> > in filesystem code where the filesystem view shoud be used.  This
> > patch introduces helpers to produce the filesystem view of current 
> > fsuid and fsgid.
> 
> If I am reading this right what we are seeing is that xfs explicitly
> opted out of type safety with predictable results.   Accidentally
> confusing kuids and uids, which is potentially security issue.
> 
> All of that said where are you getting sb->s_user_ns != &init_user_ns
> for an xfs filesystem?  There are quite a few xfs interfaces that are
> not ready for that.   xfs has a very wide userspace interface of 
> ioctls that all needs to be looked at and addressed carefully if 
> there is anything like this going on.
> 
> I think we really need to ask if we should use kuids and kgids for 
> the xfs internal quota code.

That question devolves to who administers quota operations in
containers.   The answer is usually that apparent root in the container
needs to be able to administer quotas as though they were real root
outside, so transforming the user quota calculations is correct to
first order.

To second order we need a way of controlling the container's quota,
which is why we've had a flurry of two level quota patches over the
years.  We've finally settled on group or project quotas and, if you
look at xfs, you'll see the project quota will work even in the face of
uid shifts in the user quota, so I think it's all working.

>   At the end of the day that is going to be a whole lot less error
> prone.

It would make the job of the filesystem writer harder: a lot of quota
code is very close to the disk, so they'd need a whole lot of
transforms to kernel view.

> > Signed-off-by: James Bottomley <
> > James.Bottomley@HansenPartnership.com>
> > 
> > diff --git a/include/linux/cred.h b/include/linux/cred.h
> > index f0e70a1..18e9c41 100644
> > --- a/include/linux/cred.h
> > +++ b/include/linux/cred.h
> > @@ -399,4 +399,9 @@ do {						
> > \
> >  	*(_fsgid) = __cred->fsgid;		\
> >  } while(0)
> >  
> > +/* return the current id in the filesystem view */
> > +#define i_fsuid(i) from_kuid((i)->i_sb->s_user_ns,
> > current_fsuid())
> > +#define i_fsgid(i) from_kgid((i)->i_sb->s_user_ns,
> > current_fsgid())
> 
> Could we please place these helpers in fs.h?

We could ... the current_ helpers are in cred.h, which is why I put the
new ones there, but I've no strong feelings either way.

> That should allow them to become inline functions and live with the
> existing filesystem helpers in there.

I don't believe they did.  There's code in most filesystems (usually in
quota) where they need to perform calculations with the current user
id.  The problem is that with s_user_ns, they can't use current_fsuid()
because it's the kernel view and the places where the filesystem is
using it are often in the filesystem view.

> My gut says the names disk_fsuid(i) and disk_fsgid(i) would be
> clearer.

I chose i_fsuid/fsgid for two reasons:

   1. because it takes an inode as an argument.
   2. to be consistent with i_uid_read/write() which are the other
      namespace shifting primitives for filesystems.

I think 2. is quite compelling, so if you want a different name for
this, we should rename i_uid/gid_read/write() as well.

> Of course all of this has the challenge of error handling in the case
> when current_fsuid or current_fsgid do not map into the current
> filesystem.

Yes, I think it actually fails in the quota case because unmapped
usually gives uid/gid -1 which has no quota set, so you can bust out of
your quota with the right s_user_ns.  On the other hand if you can set
up s_user_ns then you should be admin for that quota and it's caveat
emptor.
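
The failure mode can be sketched with a toy quota table keyed by the
filesystem-view id.  This is an illustrative model only: the table
layout and function names are made up and do not correspond to any
filesystem's quota code.  The permissive check treats "no entry" as "no
limit", so an unmapped id (-1) is never limited.  The strict variant is
one possible way to close that hole, shown as an assumption rather than
something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPED_ID ((uint32_t)-1)   /* what an unmapped id translates to */

/* Hypothetical quota record keyed by filesystem-view id. */
struct quota_entry {
	uint32_t id;
	uint64_t limit_blocks;
	uint64_t used_blocks;
};

/* Linear lookup; returns NULL when the id has no quota record. */
static inline const struct quota_entry *
quota_lookup(const struct quota_entry *tbl, size_t n, uint32_t id)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/* Permissive check: an id without a record is not limited at all. */
static inline bool alloc_allowed_permissive(const struct quota_entry *tbl,
					    size_t n, uint32_t id,
					    uint64_t request)
{
	const struct quota_entry *q = quota_lookup(tbl, n, id);

	if (!q)
		return true;   /* this is where id -1 escapes the quota */
	return q->used_blocks + request <= q->limit_blocks;
}

/* Strict check (assumption): refuse unmapped ids before any lookup. */
static inline bool alloc_allowed_strict(const struct quota_entry *tbl,
					size_t n, uint32_t id,
					uint64_t request)
{
	if (id == UNMAPPED_ID)
		return false;
	return alloc_allowed_permissive(tbl, n, id, request);
}
```

In this model a user with a record is held to its limit, while the same
request made under an unmapped id passes the permissive check regardless
of size.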

James


Eric W. Biederman Feb. 15, 2017, 2:29 a.m. UTC | #4

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> 
>> > Now that we have two different views of filesystem ids (the 
>> > filesystem view and the kernel view), we have a problem in that
>> > current_fsuid/fsgid() return the kernel view but are sometimes used 
>> > in filesystem code where the filesystem view shoud be used.  This
>> > patch introduces helpers to produce the filesystem view of current 
>> > fsuid and fsgid.
>> 
>> If I am reading this right what we are seeing is that xfs explicitly
>> opted out of type safety with predictable results.   Accidentally
>> confusing kuids and uids, which is potentially security issue.
>> 
>> All of that said where are you getting sb->s_user_ns != &init_user_ns
>> for an xfs filesystem?

James please answer this question:

 Where are you getting sb->s_user_ns != &init_user_ns for an xfs filesystem?

None of this matters if sb->s_user_ns == &init_user_ns.

This is significant because only xfs keeps any in-core data structure
in its on-disk encoding.  So this problem is xfs specific.   So
understanding how you are getting xfs to have sb->s_user_ns !=
&init_user_ns is important for discussing which direction we go with
helper functions here.

xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such no
fixes are needed.

Eric

James Bottomley Feb. 16, 2017, 3:43 p.m. UTC | #5

On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > 
> > > > Now that we have two different views of filesystem ids (the 
> > > > filesystem view and the kernel view), we have a problem in that
> > > > current_fsuid/fsgid() return the kernel view but are sometimes
> > > > used 
> > > > in filesystem code where the filesystem view shoud be used. 
> > > >  This
> > > > patch introduces helpers to produce the filesystem view of
> > > > current 
> > > > fsuid and fsgid.
> > > 
> > > If I am reading this right what we are seeing is that xfs
> > > explicitly
> > > opted out of type safety with predictable results.   Accidentally
> > > confusing kuids and uids, which is potentially security issue.
> > > 
> > > All of that said where are you getting sb->s_user_ns !=
> > > &init_user_ns
> > > for an xfs filesystem?
> 
> James please answer this question:
> 
>  Where are you getting sb->s_user_ns != &init_user_ns for an xfs
> filesystem?

I'm playing with a patch that allows host admin to set up an
unprivileged container for a guest to use.  One of the extensions is to
allow anything possessing capability(CAP_SYS_ADMIN) to make s_user_ns
follow mnt_ns->user_ns for new mounts (as an option).  The idea was to
see if root could set up an id shifted container with just the current
s_user_ns infrastructure.

> None of this matters if sb->s_user_ns == &init_user_ns.
> 
> This is signification because only xfs keeps any in-core data 
> structure in it's on-disk encoding.  So this problem is xfs specific.
>    So understanding how you are getting xfs to have sb->s_user_ns !=
> &init_user_ns is important for discussing which direction we go with
> helper functions here.
> 
> xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such 
> no fixes are needed.

So what you're saying is that unless the unprivileged container could
mount the filesystem itself (i.e. only those possessing the
FS_USERNS_MOUNT flag) the filesystems are going to be full of problems
like this.  I suppose whether it's worthwhile trying to fix them all
depends on whether the ability of the administrator to set up an id
shifted container is useful or not.

James


Eric W. Biederman Feb. 17, 2017, 1:15 a.m. UTC | #6

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> 
>> > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> > > 
>> > > > Now that we have two different views of filesystem ids (the 
>> > > > filesystem view and the kernel view), we have a problem in that
>> > > > current_fsuid/fsgid() return the kernel view but are sometimes
>> > > > used 
>> > > > in filesystem code where the filesystem view shoud be used. 
>> > > >  This
>> > > > patch introduces helpers to produce the filesystem view of
>> > > > current 
>> > > > fsuid and fsgid.
>> > > 
>> > > If I am reading this right what we are seeing is that xfs
>> > > explicitly
>> > > opted out of type safety with predictable results.   Accidentally
>> > > confusing kuids and uids, which is potentially security issue.
>> > > 
>> > > All of that said where are you getting sb->s_user_ns !=
>> > > &init_user_ns
>> > > for an xfs filesystem?
>> 
>> James please answer this question:
>> 
>>  Where are you getting sb->s_user_ns != &init_user_ns for an xfs
>> filesystem?
>
> I'm playing with a patch that allows host admin to set up an
> unprivileged container for a guest to use.  One of the extensions is to
> allow anything possessing capability(CAP_SYS_ADMIN) to make s_user_ns
> follow mnt_ns->user_ns for new mounts (as an option).  The idea was to
> see if root could set up an id shifted container with just the current
> s_user_ns infrastructure.
>
>> None of this matters if sb->s_user_ns == &init_user_ns.
>> 
>> This is signification because only xfs keeps any in-core data 
>> structure in it's on-disk encoding.  So this problem is xfs specific.
>>    So understanding how you are getting xfs to have sb->s_user_ns !=
>> &init_user_ns is important for discussing which direction we go with
>> helper functions here.
>> 
>> xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as such 
>> no fixes are needed.
>
> So what you're saying is that unless the unprivileged container could
> mount the filesystem itself (i.e. only those possessing the
> FS_USERNS_MOUNT flag) the filesystems are going to be full of problems
> like this.  I suppose whether it's worthwhile trying to fix them all
> depends on whether the ability of the administrator to set up an id
> shifted container is useful or not.

Yes.  Setting s_user_ns and expecting everything to work without a
review/test cycle of the filesystem to shake out any rough edges is
likely to be problematic.  For historical reasons I actually expect xfs
is especially bad in this regard.  So in practice I would definitely
start a feature like that with another filesystem.

I would be happy to have a FS_S_USER_NS flag to say all that is well,
and the filesystem supports s_user_ns != &init_user_ns.  The bar is much
lower if a trusted user with CAP_SYS_ADMIN is mounting the filesystem
than if an unprivileged user is mounting the filesystem, as we don't
have to worry about specially crafted malicious filesystem images.

In practice I think I would have passed in the user namespace via a file
descriptor to mount rather than inheriting it from the mount namespace
(more flexibility for roughly the same amount of code).
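
A rough model of the opt-in rule described above, as a userspace sketch.
FS_S_USER_NS is only a proposal at this point, so the flag, its bit
value, and the function below are all hypothetical; FS_USERNS_MOUNT is an
existing fs_flags bit but the value used here is a placeholder.
Capability checks in the mounter's own namespace are outside this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for the model, not the kernel's. */
#define FS_USERNS_MOUNT_MODEL 0x1u  /* safe for mounts by userns root */
#define FS_S_USER_NS_MODEL    0x2u  /* proposed: copes with s_user_ns != &init_user_ns */

struct fs_type_model {
	const char *name;
	unsigned int fs_flags;
};

/*
 * Decide whether a new superblock may get the requested s_user_ns.
 * target_is_init_ns: the requested namespace is the initial one.
 * trusted_admin:     the mounter holds CAP_SYS_ADMIN in the initial
 *                    namespace, so crafted images are not a concern.
 */
static inline bool may_set_s_user_ns(const struct fs_type_model *type,
				     bool target_is_init_ns,
				     bool trusted_admin)
{
	assert(type != NULL);

	if (target_is_init_ns)
		return true;    /* nothing changes for the filesystem */
	if ((type->fs_flags & FS_USERNS_MOUNT_MODEL) != 0)
		return true;    /* already held to the higher bar */
	/* Lower bar: filesystem opted in and the mounter is trusted. */
	return trusted_admin && (type->fs_flags & FS_S_USER_NS_MODEL) != 0;
}
```

A filesystem with neither flag is refused a non-initial s_user_ns even
for a trusted admin, which is the behaviour that would keep an unreviewed
filesystem out of this path.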

Eric


James Bottomley Feb. 17, 2017, 5:12 p.m. UTC | #7

On Fri, 2017-02-17 at 14:15 +1300, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > 
> > > > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
> > > > > James Bottomley <James.Bottomley@HansenPartnership.com>
> > > > > writes:
> > > > > 
> > > > > > Now that we have two different views of filesystem ids (the
> > > > > > filesystem view and the kernel view), we have a problem in 
> > > > > > that current_fsuid/fsgid() return the kernel view but are
> > > > > > sometimes used in filesystem code where the filesystem view 
> > > > > > shoud be used.  This patch introduces helpers to produce 
> > > > > > the filesystem view of current fsuid and fsgid.
> > > > > 
> > > > > If I am reading this right what we are seeing is that xfs
> > > > > explicitly opted out of type safety with predictable results.
> > > > >  Accidentally confusing kuids and uids, which is potentially 
> > > > > security issue.
> > > > > 
> > > > > All of that said where are you getting sb->s_user_ns !=
> > > > > &init_user_ns for an xfs filesystem?
> > > 
> > > James please answer this question:
> > > 
> > >  Where are you getting sb->s_user_ns != &init_user_ns for an xfs
> > > filesystem?
> > 
> > I'm playing with a patch that allows host admin to set up an
> > unprivileged container for a guest to use.  One of the extensions 
> > is to allow anything possessing capability(CAP_SYS_ADMIN) to make
> > s_user_ns follow mnt_ns->user_ns for new mounts (as an option). 
> >  The idea was to see if root could set up an id shifted container 
> > with just the current s_user_ns infrastructure.
> > 
> > > None of this matters if sb->s_user_ns == &init_user_ns.
> > > 
> > > This is signification because only xfs keeps any in-core data 
> > > structure in it's on-disk encoding.  So this problem is xfs
> > > specific.
> > >    So understanding how you are getting xfs to have sb->s_user_ns 
> > > != &init_user_ns is important for discussing which direction we 
> > > go with helper functions here.
> > > 
> > > xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as 
> > > such no fixes are needed.
> > 
> > So what you're saying is that unless the unprivileged container 
> > could mount the filesystem itself (i.e. only those possessing the
> > FS_USERNS_MOUNT flag) the filesystems are going to be full of 
> > problems like this.  I suppose whether it's worthwhile trying to 
> > fix them all depends on whether the ability of the administrator to 
> > set up an id shifted container is useful or not.
> 
> Yes.  Setting s_user_ns and expecting everything to work with a
> review/test cycle of the filesystem to shake out any rough edges is
> likely to be problematic.  For historical reasons I actually expect 
> xfs is especially bad in this regard.  So in practice I would 
> definitely start a feature like that with another filesystem.

It's a pragmatic choice: xfs is the filesystem on my current laptop.  I
know xfs was once very problematic for the user namespace, but having
looked through the code several times, the namespace shifts are now
nicely abstracted and easy to identify, so I don't anticipate any extra
difficulty today.

> I would be happy to have a FS_S_USER_NS flag to say all that is well,
> and the filesystem supports s_user_ns != &init_user_ns.  The bar is 
> much lower if a trusted user with CAP_SYS_ADMIN is mounting the 
> filesystem than if an unprivileged user is mounting the filesystem. 
>  As we don't have to worry about specially crafted malicious
> filesystem images.
> 
> In practice I think I would have passed in the user namespace via a 
> file descriptor to mount rather than inheriting it from the mount
> namespace (more flexibility for roughly the same amount of code).

I agree on this, but let's leave the implementation details on the side
for a while and examine the "should we do this?" question.

I can see two reasons why we might need to have this functionality:

   1. Orchestration system use case: the orchestration system wants to
      build an unprivileged container root from an image file or overlay
      (I think this covers docker).
   2. USB (or other) device insertion redirected to container.  In this
      case, we'd like the mount on insertion to follow the container
      user_ns.

The reason I could see not bothering with this is that it doesn't fix
the shift on a subtree issue and fixing that gives a system which can
also be used to solve both cases above.

James


Eric W. Biederman Feb. 20, 2017, 4:56 a.m. UTC | #8

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2017-02-17 at 14:15 +1300, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> 
>> > On Wed, 2017-02-15 at 15:29 +1300, Eric W. Biederman wrote:
>> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> > > 
>> > > > On Tue, 2017-02-14 at 20:46 +1300, Eric W. Biederman wrote:
>> > > > > James Bottomley <James.Bottomley@HansenPartnership.com>
>> > > > > writes:
>> > > > > 
>> > > > > > Now that we have two different views of filesystem ids (the
>> > > > > > filesystem view and the kernel view), we have a problem in 
>> > > > > > that current_fsuid/fsgid() return the kernel view but are
>> > > > > > sometimes used in filesystem code where the filesystem view 
>> > > > > > shoud be used.  This patch introduces helpers to produce 
>> > > > > > the filesystem view of current fsuid and fsgid.
>> > > > > 
>> > > > > If I am reading this right what we are seeing is that xfs
>> > > > > explicitly opted out of type safety with predictable results.
>> > > > >  Accidentally confusing kuids and uids, which is potentially 
>> > > > > security issue.
>> > > > > 
>> > > > > All of that said where are you getting sb->s_user_ns !=
>> > > > > &init_user_ns for an xfs filesystem?
>> > > 
>> > > James please answer this question:
>> > > 
>> > >  Where are you getting sb->s_user_ns != &init_user_ns for an xfs
>> > > filesystem?
>> > 
>> > I'm playing with a patch that allows host admin to set up an
>> > unprivileged container for a guest to use.  One of the extensions 
>> > is to allow anything possessing capability(CAP_SYS_ADMIN) to make
>> > s_user_ns follow mnt_ns->user_ns for new mounts (as an option). 
>> >  The idea was to see if root could set up an id shifted container 
>> > with just the current s_user_ns infrastructure.
>> > 
>> > > None of this matters if sb->s_user_ns == &init_user_ns.
>> > > 
>> > > This is signification because only xfs keeps any in-core data 
>> > > structure in it's on-disk encoding.  So this problem is xfs
>> > > specific.
>> > >    So understanding how you are getting xfs to have sb->s_user_ns 
>> > > != &init_user_ns is important for discussing which direction we 
>> > > go with helper functions here.
>> > > 
>> > > xfs with sb->s_user_ns == &init_user_ns is perfectly fine and as 
>> > > such no fixes are needed.
>> > 
>> > So what you're saying is that unless the unprivileged container 
>> > could mount the filesystem itself (i.e. only those possessing the
>> > FS_USERNS_MOUNT flag) the filesystems are going to be full of 
>> > problems like this.  I suppose whether it's worthwhile trying to 
>> > fix them all depends on whether the ability of the administrator to 
>> > set up an id shifted container is useful or not.
>> 
>> Yes.  Setting s_user_ns and expecting everything to work with a
>> review/test cycle of the filesystem to shake out any rough edges is
>> likely to be problematic.  For historical reasons I actually expect 
>> xfs is especially bad in this regard.  So in practice I would 
>> definitely start a feature like that with another filesystem.
>
> It's a pragmatic choice: xfs is the filesystem on my current laptop.  I
> know xfs was once very problematic for the user namespace, but having
> looked through the code several times, the namespace shifts are now
> nicely abstracted and easy to identify, so I don't anticipate any extra
> difficulty today.

I think you have already encountered the extra difficulty.  For xfs a
couple of little things need to be fixed.  I expect most filesystems
will pretty much work out of the box.

>> I would be happy to have a FS_S_USER_NS flag to say all that is well,
>> and the filesystem supports s_user_ns != &init_user_ns.  The bar is 
>> much lower if a trusted user with CAP_SYS_ADMIN is mounting the 
>> filesystem than if an unprivileged user is mounting the filesystem. 
>>  As we don't have to worry about specially crafted malicious
>> filesystem images.
>> 
>> In practice I think I would have passed in the user namespace via a 
>> file descriptor to mount rather than inheriting it from the mount
>> namespace (more flexibility for roughly the same amount of code).
>
> I agree on this, but lets leave the implementation details on the side
> for a while and examine the "should we do this?" question.
>
> I can see two reasons why we might need to have this functionality
>
>    1. Orchestration system use case: the orchestration system wants to
>       build an unprivileged container root from an image file or overlay
>       (I think this covers docker).
>    2. USB (or other) device insertion redirected to container.  In this
>       case, we'd like the mount on insertion to follow the container
>       user_ns.

I think those are valid.

The Docker/runc cases that I am familiar with really want the sharing of
base images between containers.  To share the base image between
containers requires having a different mapping per container to separate
them.  The savings on disk space and vfs cache sharing is important for
them.

I am torn on the fact that this sneaks up on the issue of what happens
when someone injects a malicious disk image into this process.  If we
have a full solution to handling malicious disk images we can just set
FS_USERNS_MOUNT.  All of these use cases look like cases where
it would be very easy for the mounter of the filesystem to skip
ensuring they trust the path that generated the filesystem.  On the
other hand that is nothing new.

> The reason I could see not bothering with this is that it doesn't fix
> the shift on a subtree issue and fixing that gives a system which can
> also be used to solve both cases above.

The only reasons I have not been bothering with this are:
- Different mappings into different containers.
- Its closeness to S_USER_NS.
- A focus on getting fuse and the generic vfs bits covered and merged.

But at this point I think a generic vfs option that would set s_user_ns
and work on filesystems that opt in would be perfectly reasonable.
Especially since (a) we want to be able to display which user namespace
s_user_ns is in, and a generic mount option seems like a way to sneak it
into existing proc files, and (b) we want the file descriptor parsing code
for shiftfs.

So it seems like we might as well implement the functionality as a
generic mount option and let the filesystems opt in with FS_USERNS_MOUNT,
or with FS_S_USER_NS if the filesystem is not up to a full unprivileged
mount.

Eric
