[0/7] Initial support for user namespace owned mounts
diff mbox

Message ID 20150728204009.GF83521@ubuntu-hedt
State New
Headers show

Commit Message

Seth Forshee July 28, 2015, 8:40 p.m. UTC
On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> > This is what I currently think you want for user ns mounts:
> >
> >  1. smk_root and smk_default are assigned the label of the backing
> >     device.
> >  2. s_root is assigned the transmute property.
> >  3. For existing files:
> >     a. Files with the same label as the backing device are accessible.
> >     b. Files with any other label are not accessible.
> 
> That's right. Accept correct data, reject anything that's not right.
> 
> > If this is right, there are a couple lingering questions in my mind.
> >
> > First, what happens with files created in directories with the same
> > label as the backing device but without the transmute property set? The
> > inode for the new file will initially be labeled with smk_of_current(),
> > but then during d_instantiate it will get smk_default and thus end up
> > with the label we want. So that seems okay.
> 
> Yes.
> 
> > The second is whether files with the SMACK64EXEC attribute is still a
> > problem. It seems it is, for files with the same label as the backing
> > store at least. I think we can simply skip the code that reads out this
> > xattr and sets smk_task for user ns mounts, or else skip assigning the
> > label to the new task in bprm_set_creds. The latter seems more
> > consistent with the approach you've suggested for dealing with labels
> > from disk.
> 
> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
> smack_d_instantiate for unprivileged mounts would do the trick.
> 
> > So I guess all of that seems okay, though perhaps a bit restrictive
> > given that the user who mounted the filesystem already has full access
> > to the backing store.
> 
> In truth, there is no reason to expect that the "user" who did the
> mount will ever have a Smack label that differs from the label of
> the backing store. If what we've got here seems restrictive, it's
> because you've got access from someone other than the "user".
> 
> > Please let me know whether or not this matches up with what you are
> > thinking, then I can procede with the implementation.
> 
> My current mindset is that, if you're going to allow unprivileged
> mounts of user defined backing stores, this is as safe as we can
> make it.

All right, I've got a patch which I think does this, and I've managed to
do some testing to confirm that it behaves like I expect. How does this
look?

What's missing is getting the label from the block device inode; as
Stephen discovered the inode that I thought we could get the label from
turned out to be the wrong one. Afaict we would need a new hook in order
to do that, so for now I'm using the label of the proccess calling
mount.

---

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Casey Schaufler July 30, 2015, 4:18 p.m. UTC | #1
On 7/28/2015 1:40 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>> This is what I currently think you want for user ns mounts:
>>>
>>>  1. smk_root and smk_default are assigned the label of the backing
>>>     device.
>>>  2. s_root is assigned the transmute property.
>>>  3. For existing files:
>>>     a. Files with the same label as the backing device are accessible.
>>>     b. Files with any other label are not accessible.
>> That's right. Accept correct data, reject anything that's not right.
>>
>>> If this is right, there are a couple lingering questions in my mind.
>>>
>>> First, what happens with files created in directories with the same
>>> label as the backing device but without the transmute property set? The
>>> inode for the new file will initially be labeled with smk_of_current(),
>>> but then during d_instantiate it will get smk_default and thus end up
>>> with the label we want. So that seems okay.
>> Yes.
>>
>>> The second is whether files with the SMACK64EXEC attribute is still a
>>> problem. It seems it is, for files with the same label as the backing
>>> store at least. I think we can simply skip the code that reads out this
>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>> label to the new task in bprm_set_creds. The latter seems more
>>> consistent with the approach you've suggested for dealing with labels
>>> from disk.
>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> smack_d_instantiate for unprivileged mounts would do the trick.
>>
>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>> given that the user who mounted the filesystem already has full access
>>> to the backing store.
>> In truth, there is no reason to expect that the "user" who did the
>> mount will ever have a Smack label that differs from the label of
>> the backing store. If what we've got here seems restrictive, it's
>> because you've got access from someone other than the "user".
>>
>>> Please let me know whether or not this matches up with what you are
>>> thinking, then I can procede with the implementation.
>> My current mindset is that, if you're going to allow unprivileged
>> mounts of user defined backing stores, this is as safe as we can
>> make it.
> All right, I've got a patch which I think does this, and I've managed to
> do some testing to confirm that it behaves like I expect. How does this
> look?
>
> What's missing is getting the label from the block device inode; as
> Stephen discovered the inode that I thought we could get the label from
> turned out to be the wrong one. Afaict we would need a new hook in order
> to do that, so for now I'm using the label of the proccess calling
> mount.

That will be OK if the mount processing checks for write access to
the backing store. I haven't looked to see if it does. If it doesn't
the problems should be pretty obvious.

>
> ---
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328f75eb..8e631a66b03c 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		if (sb_in_userns(sb))
> +			transmute = 1;
>  	}
>  	/*
>  	 * Initialize the root inode.
> @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>  
> +	if (sb_in_userns(inode->i_sb)) {
> +		struct superblock_smack *sbsp = inode->i_sb->s_security;
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!sb_in_userns(inode->i_sb)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>  
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||
>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman July 30, 2015, 5:05 p.m. UTC | #2
Casey Schaufler <casey@schaufler-ca.com> writes:

> On 7/28/2015 1:40 PM, Seth Forshee wrote:
>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>> This is what I currently think you want for user ns mounts:
>>>>
>>>>  1. smk_root and smk_default are assigned the label of the backing
>>>>     device.
>>>>  2. s_root is assigned the transmute property.
>>>>  3. For existing files:
>>>>     a. Files with the same label as the backing device are accessible.
>>>>     b. Files with any other label are not accessible.
>>> That's right. Accept correct data, reject anything that's not right.
>>>
>>>> If this is right, there are a couple lingering questions in my mind.
>>>>
>>>> First, what happens with files created in directories with the same
>>>> label as the backing device but without the transmute property set? The
>>>> inode for the new file will initially be labeled with smk_of_current(),
>>>> but then during d_instantiate it will get smk_default and thus end up
>>>> with the label we want. So that seems okay.
>>> Yes.
>>>
>>>> The second is whether files with the SMACK64EXEC attribute is still a
>>>> problem. It seems it is, for files with the same label as the backing
>>>> store at least. I think we can simply skip the code that reads out this
>>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>>> label to the new task in bprm_set_creds. The latter seems more
>>>> consistent with the approach you've suggested for dealing with labels
>>>> from disk.
>>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>>> smack_d_instantiate for unprivileged mounts would do the trick.
>>>
>>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>>> given that the user who mounted the filesystem already has full access
>>>> to the backing store.
>>> In truth, there is no reason to expect that the "user" who did the
>>> mount will ever have a Smack label that differs from the label of
>>> the backing store. If what we've got here seems restrictive, it's
>>> because you've got access from someone other than the "user".
>>>
>>>> Please let me know whether or not this matches up with what you are
>>>> thinking, then I can procede with the implementation.
>>> My current mindset is that, if you're going to allow unprivileged
>>> mounts of user defined backing stores, this is as safe as we can
>>> make it.
>> All right, I've got a patch which I think does this, and I've managed to
>> do some testing to confirm that it behaves like I expect. How does this
>> look?
>>
>> What's missing is getting the label from the block device inode; as
>> Stephen discovered the inode that I thought we could get the label from
>> turned out to be the wrong one. Afaict we would need a new hook in order
>> to do that, so for now I'm using the label of the proccess calling
>> mount.
>
> That will be OK if the mount processing checks for write access to
> the backing store. I haven't looked to see if it does. If it doesn't
> the problems should be pretty obvious.


do_new_mount
  vfs_kern_mount
    mount_fs
      ...
        mount_bdev
          blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...)
            lookup_bdev
              kern_path
                filename_lookup
                  path_lookupat
                    lookup_last
                      walk_component
            blkdev_get(...,mode,...)
              __blkdev_get(...,mode,...)
                devcgroup_inode_permission(bdev->bd_inode, perm)

*scratches my head*

It looks like we don't actually check the permissions on the block
device.  Tomoyo has a hack for it.  nfsd does something.  There is
devcgroup silliness.

But overall it looks like we depend on capable(CAP_SYS_ADMIN).

Seth I do believe we have found another area of the vfs we will need to
short up before allowing unprivileged mounts of block device based
filesystems.

It looks like there are enough hacks someone with a clue coming through
and making the code make more sense seems like a good idea anyway.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seth Forshee July 30, 2015, 5:25 p.m. UTC | #3
On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote:
> Casey Schaufler <casey@schaufler-ca.com> writes:
> 
> > On 7/28/2015 1:40 PM, Seth Forshee wrote:
> >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> >>>> This is what I currently think you want for user ns mounts:
> >>>>
> >>>>  1. smk_root and smk_default are assigned the label of the backing
> >>>>     device.
> >>>>  2. s_root is assigned the transmute property.
> >>>>  3. For existing files:
> >>>>     a. Files with the same label as the backing device are accessible.
> >>>>     b. Files with any other label are not accessible.
> >>> That's right. Accept correct data, reject anything that's not right.
> >>>
> >>>> If this is right, there are a couple lingering questions in my mind.
> >>>>
> >>>> First, what happens with files created in directories with the same
> >>>> label as the backing device but without the transmute property set? The
> >>>> inode for the new file will initially be labeled with smk_of_current(),
> >>>> but then during d_instantiate it will get smk_default and thus end up
> >>>> with the label we want. So that seems okay.
> >>> Yes.
> >>>
> >>>> The second is whether files with the SMACK64EXEC attribute is still a
> >>>> problem. It seems it is, for files with the same label as the backing
> >>>> store at least. I think we can simply skip the code that reads out this
> >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
> >>>> label to the new task in bprm_set_creds. The latter seems more
> >>>> consistent with the approach you've suggested for dealing with labels
> >>>> from disk.
> >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
> >>> smack_d_instantiate for unprivileged mounts would do the trick.
> >>>
> >>>> So I guess all of that seems okay, though perhaps a bit restrictive
> >>>> given that the user who mounted the filesystem already has full access
> >>>> to the backing store.
> >>> In truth, there is no reason to expect that the "user" who did the
> >>> mount will ever have a Smack label that differs from the label of
> >>> the backing store. If what we've got here seems restrictive, it's
> >>> because you've got access from someone other than the "user".
> >>>
> >>>> Please let me know whether or not this matches up with what you are
> >>>> thinking, then I can procede with the implementation.
> >>> My current mindset is that, if you're going to allow unprivileged
> >>> mounts of user defined backing stores, this is as safe as we can
> >>> make it.
> >> All right, I've got a patch which I think does this, and I've managed to
> >> do some testing to confirm that it behaves like I expect. How does this
> >> look?
> >>
> >> What's missing is getting the label from the block device inode; as
> >> Stephen discovered the inode that I thought we could get the label from
> >> turned out to be the wrong one. Afaict we would need a new hook in order
> >> to do that, so for now I'm using the label of the proccess calling
> >> mount.
> >
> > That will be OK if the mount processing checks for write access to
> > the backing store. I haven't looked to see if it does. If it doesn't
> > the problems should be pretty obvious.
> 
> 
> do_new_mount
>   vfs_kern_mount
>     mount_fs
>       ...
>         mount_bdev
>           blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...)
>             lookup_bdev
>               kern_path
>                 filename_lookup
>                   path_lookupat
>                     lookup_last
>                       walk_component
>             blkdev_get(...,mode,...)
>               __blkdev_get(...,mode,...)
>                 devcgroup_inode_permission(bdev->bd_inode, perm)
> 
> *scratches my head*
> 
> It looks like we don't actually check the permissions on the block
> device.  Tomoyo has a hack for it.  nfsd does something.  There is
> devcgroup silliness.
> 
> But overall it looks like we depend on capable(CAP_SYS_ADMIN).
> 
> Seth I do believe we have found another area of the vfs we will need to
> short up before allowing unprivileged mounts of block device based
> filesystems.
> 
> It looks like there are enough hacks someone with a clue coming through
> and making the code make more sense seems like a good idea anyway.

Yep, I just came to the same conclusion myself, and I also verified the
behavior emperically. That's definitely a problem. I'll get to work on
fixing that.

Seth
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman July 30, 2015, 5:33 p.m. UTC | #4
Seth Forshee <seth.forshee@canonical.com> writes:

> On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote:
>> Casey Schaufler <casey@schaufler-ca.com> writes:
>> 
>> > On 7/28/2015 1:40 PM, Seth Forshee wrote:
>> >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>> >>>> This is what I currently think you want for user ns mounts:
>> >>>>
>> >>>>  1. smk_root and smk_default are assigned the label of the backing
>> >>>>     device.
>> >>>>  2. s_root is assigned the transmute property.
>> >>>>  3. For existing files:
>> >>>>     a. Files with the same label as the backing device are accessible.
>> >>>>     b. Files with any other label are not accessible.
>> >>> That's right. Accept correct data, reject anything that's not right.
>> >>>
>> >>>> If this is right, there are a couple lingering questions in my mind.
>> >>>>
>> >>>> First, what happens with files created in directories with the same
>> >>>> label as the backing device but without the transmute property set? The
>> >>>> inode for the new file will initially be labeled with smk_of_current(),
>> >>>> but then during d_instantiate it will get smk_default and thus end up
>> >>>> with the label we want. So that seems okay.
>> >>> Yes.
>> >>>
>> >>>> The second is whether files with the SMACK64EXEC attribute is still a
>> >>>> problem. It seems it is, for files with the same label as the backing
>> >>>> store at least. I think we can simply skip the code that reads out this
>> >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>> >>>> label to the new task in bprm_set_creds. The latter seems more
>> >>>> consistent with the approach you've suggested for dealing with labels
>> >>>> from disk.
>> >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> >>> smack_d_instantiate for unprivileged mounts would do the trick.
>> >>>
>> >>>> So I guess all of that seems okay, though perhaps a bit restrictive
>> >>>> given that the user who mounted the filesystem already has full access
>> >>>> to the backing store.
>> >>> In truth, there is no reason to expect that the "user" who did the
>> >>> mount will ever have a Smack label that differs from the label of
>> >>> the backing store. If what we've got here seems restrictive, it's
>> >>> because you've got access from someone other than the "user".
>> >>>
>> >>>> Please let me know whether or not this matches up with what you are
>> >>>> thinking, then I can procede with the implementation.
>> >>> My current mindset is that, if you're going to allow unprivileged
>> >>> mounts of user defined backing stores, this is as safe as we can
>> >>> make it.
>> >> All right, I've got a patch which I think does this, and I've managed to
>> >> do some testing to confirm that it behaves like I expect. How does this
>> >> look?
>> >>
>> >> What's missing is getting the label from the block device inode; as
>> >> Stephen discovered the inode that I thought we could get the label from
>> >> turned out to be the wrong one. Afaict we would need a new hook in order
>> >> to do that, so for now I'm using the label of the proccess calling
>> >> mount.
>> >
>> > That will be OK if the mount processing checks for write access to
>> > the backing store. I haven't looked to see if it does. If it doesn't
>> > the problems should be pretty obvious.
>> 
>> 
>> do_new_mount
>>   vfs_kern_mount
>>     mount_fs
>>       ...
>>         mount_bdev
>>           blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...)
>>             lookup_bdev
>>               kern_path
>>                 filename_lookup
>>                   path_lookupat
>>                     lookup_last
>>                       walk_component
>>             blkdev_get(...,mode,...)
>>               __blkdev_get(...,mode,...)
>>                 devcgroup_inode_permission(bdev->bd_inode, perm)
>> 
>> *scratches my head*
>> 
>> It looks like we don't actually check the permissions on the block
>> device.  Tomoyo has a hack for it.  nfsd does something.  There is
>> devcgroup silliness.
>> 
>> But overall it looks like we depend on capable(CAP_SYS_ADMIN).
>> 
>> Seth I do believe we have found another area of the vfs we will need to
>> short up before allowing unprivileged mounts of block device based
>> filesystems.
>> 
>> It looks like there are enough hacks someone with a clue coming through
>> and making the code make more sense seems like a good idea anyway.
>
> Yep, I just came to the same conclusion myself, and I also verified the
> behavior emperically. That's definitely a problem. I'll get to work on
> fixing that.

At a quick glance it looks like lookup_bdev, and most of it's callers
need to be modified to do potentially do the additional permission
checking.

I expect we could move the devcgroup checks into whatever new checks we
wind up adding.

Fun, fun fun.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index a143328f75eb..8e631a66b03c 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -662,6 +662,8 @@  static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
 		skp = smk_of_current();
 		sp->smk_root = skp;
 		sp->smk_default = skp;
+		if (sb_in_userns(sb))
+			transmute = 1;
 	}
 	/*
 	 * Initialize the root inode.
@@ -1023,6 +1025,12 @@  static int smack_inode_permission(struct inode *inode, int mask)
 	if (mask == 0)
 		return 0;
 
+	if (sb_in_userns(inode->i_sb)) {
+		struct superblock_smack *sbsp = inode->i_sb->s_security;
+		if (smk_of_inode(inode) != sbsp->smk_root)
+			return -EACCES;
+	}
+
 	/* May be droppable after audit */
 	if (no_block)
 		return -ECHILD;
@@ -3220,14 +3228,16 @@  static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		/*
-		 * Don't let the exec or mmap label be "*" or "@".
-		 */
-		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-		if (IS_ERR(skp) || skp == &smack_known_star ||
-		    skp == &smack_known_web)
-			skp = NULL;
-		isp->smk_task = skp;
+		if (!sb_in_userns(inode->i_sb)) {
+			/*
+			 * Don't let the exec or mmap label be "*" or "@".
+			 */
+			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+			if (IS_ERR(skp) || skp == &smack_known_star ||
+			    skp == &smack_known_web)
+				skp = NULL;
+			isp->smk_task = skp;
+		}
 
 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||