diff mbox series

[v8,1/4] fs: add mode_strip_sgid() helper

Message ID 1650971490-4532-1-git-send-email-xuyang2018.jy@fujitsu.com (mailing list archive)
State New, archived
Headers show
Series [v8,1/4] fs: add mode_strip_sgid() helper | expand

Commit Message

Yang Xu (Fujitsu) April 26, 2022, 11:11 a.m. UTC
Add a dedicated helper to handle the setgid bit when creating a new file
in a setgid directory. This is a preparatory patch for moving setgid
stripping into the vfs. The patch contains no functional changes.

Currently the setgid stripping logic is open-coded directly in
inode_init_owner() and the individual filesystems are responsible for
handling setgid inheritance. Since this has proven to be brittle as
evidenced by old issues we uncovered over the last months (see [1] to
[3] below) we will try to move this logic into the vfs.

Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
---
 fs/inode.c         | 37 +++++++++++++++++++++++++++++++++----
 include/linux/fs.h |  2 ++
 2 files changed, 35 insertions(+), 4 deletions(-)

Comments

Jeff Layton April 26, 2022, 2:52 p.m. UTC | #1
On Tue, 2022-04-26 at 19:11 +0800, Yang Xu wrote:
> Add a dedicated helper to handle the setgid bit when creating a new file
> in a setgid directory. This is a preparatory patch for moving setgid
> stripping into the vfs. The patch contains no functional changes.
> 
> Currently the setgid stripping logic is open-coded directly in
> inode_init_owner() and the individual filesystems are responsible for
> handling setgid inheritance. Since this has proven to be brittle as
> evidenced by old issues we uncovered over the last months (see [1] to
> [3] below) we will try to move this logic into the vfs.
> 
> Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
> Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
> Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
> ---
>  fs/inode.c         | 37 +++++++++++++++++++++++++++++++++----
>  include/linux/fs.h |  2 ++
>  2 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 9d9b422504d1..e9a5f2ec2f89 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2246,10 +2246,8 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		/* Directories are special, and always inherit S_ISGID */
>  		if (S_ISDIR(mode))
>  			mode |= S_ISGID;
> -		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
> -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
> -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> -			mode &= ~S_ISGID;
> +		else
> +			mode = mode_strip_sgid(mnt_userns, dir, mode);
>  	} else
>  		inode_fsgid_set(inode, mnt_userns);
>  	inode->i_mode = mode;
> @@ -2405,3 +2403,34 @@ struct timespec64 current_time(struct inode *inode)
>  	return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> +
> +/**
> + * mode_strip_sgid - handle the sgid bit for non-directories
> + * @mnt_userns: User namespace of the mount the inode was created from
> + * @dir: parent directory inode
> + * @mode: mode of the file to be created in @dir
> + *
> + * If the @mode of the new file has both the S_ISGID and S_IXGRP bit
> + * raised and @dir has the S_ISGID bit raised ensure that the caller is
> + * either in the group of the parent directory or they have CAP_FSETID
> + * in their user namespace and are privileged over the parent directory.
> + * In all other cases, strip the S_ISGID bit from @mode.
> + *
> + * Return: the new mode to use for the file
> + */
> +umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
> +			 const struct inode *dir, umode_t mode)
> +{
> +	if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
> +		return mode;
> +	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return mode;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return mode;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return mode;
> +
> +	mode &= ~S_ISGID;
> +	return mode;
> +}
> +EXPORT_SYMBOL(mode_strip_sgid);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bbde95387a23..98b44a2732f5 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1897,6 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>  void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>  		      const struct inode *dir, umode_t mode);
>  extern bool may_open_dev(const struct path *path);
> +umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
> +			 const struct inode *dir, umode_t mode);
>  
>  /*
>   * This is the "filldir" function type, used by readdir() to let

This series looks like a nice cleanup. I went ahead and added this pile
to another kernel I was testing with xfstests and it seemed to do fine.

You can add this (or some variant of it) to all 4 patches.

Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Yang Xu (Fujitsu) April 27, 2022, 1:34 a.m. UTC | #2
于 2022/4/26 22:52, Jeff Layton 写道:
> On Tue, 2022-04-26 at 19:11 +0800, Yang Xu wrote:
>> Add a dedicated helper to handle the setgid bit when creating a new file
>> in a setgid directory. This is a preparatory patch for moving setgid
>> stripping into the vfs. The patch contains no functional changes.
>>
>> Currently the setgid stripping logic is open-coded directly in
>> inode_init_owner() and the individual filesystems are responsible for
>> handling setgid inheritance. Since this has proven to be brittle as
>> evidenced by old issues we uncovered over the last months (see [1] to
>> [3] below) we will try to move this logic into the vfs.
>>
>> Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
>> Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
>> Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
>> Reviewed-by: Darrick J. Wong<djwong@kernel.org>
>> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>>   fs/inode.c         | 37 +++++++++++++++++++++++++++++++++----
>>   include/linux/fs.h |  2 ++
>>   2 files changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 9d9b422504d1..e9a5f2ec2f89 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2246,10 +2246,8 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		/* Directories are special, and always inherit S_ISGID */
>>   		if (S_ISDIR(mode))
>>   			mode |= S_ISGID;
>> -		else if ((mode&  (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)&&
>> -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir))&&
>> -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>> -			mode&= ~S_ISGID;
>> +		else
>> +			mode = mode_strip_sgid(mnt_userns, dir, mode);
>>   	} else
>>   		inode_fsgid_set(inode, mnt_userns);
>>   	inode->i_mode = mode;
>> @@ -2405,3 +2403,34 @@ struct timespec64 current_time(struct inode *inode)
>>   	return timestamp_truncate(now, inode);
>>   }
>>   EXPORT_SYMBOL(current_time);
>> +
>> +/**
>> + * mode_strip_sgid - handle the sgid bit for non-directories
>> + * @mnt_userns: User namespace of the mount the inode was created from
>> + * @dir: parent directory inode
>> + * @mode: mode of the file to be created in @dir
>> + *
>> + * If the @mode of the new file has both the S_ISGID and S_IXGRP bit
>> + * raised and @dir has the S_ISGID bit raised ensure that the caller is
>> + * either in the group of the parent directory or they have CAP_FSETID
>> + * in their user namespace and are privileged over the parent directory.
>> + * In all other cases, strip the S_ISGID bit from @mode.
>> + *
>> + * Return: the new mode to use for the file
>> + */
>> +umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
>> +			 const struct inode *dir, umode_t mode)
>> +{
>> +	if (S_ISDIR(mode) || !dir || !(dir->i_mode&  S_ISGID))
>> +		return mode;
>> +	if ((mode&  (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
>> +		return mode;
>> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
>> +		return mode;
>> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
>> +		return mode;
>> +
>> +	mode&= ~S_ISGID;
>> +	return mode;
>> +}
>> +EXPORT_SYMBOL(mode_strip_sgid);
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index bbde95387a23..98b44a2732f5 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1897,6 +1897,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>>   void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		      const struct inode *dir, umode_t mode);
>>   extern bool may_open_dev(const struct path *path);
>> +umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
>> +			 const struct inode *dir, umode_t mode);
>>
>>   /*
>>    * This is the "filldir" function type, used by readdir() to let
>
> This series looks like a nice cleanup. I went ahead and added this pile
> to another kernel I was testing with xfstests and it seemed to do fine.
>
> You can add this (or some variant of it) to all 4 patches.
>
> Reviewed-and-Tested-by: Jeff Layton<jlayton@kernel.org>

Thanks, I also have a xfstests patch set[1] for testing this, but now it 
is not a final version.

[1]https://patchwork.kernel.org/project/fstests/list/?series=631461
Al Viro April 28, 2022, 1:59 a.m. UTC | #3
On Tue, Apr 26, 2022 at 07:11:27PM +0800, Yang Xu wrote:
> Add a dedicated helper to handle the setgid bit when creating a new file
> in a setgid directory. This is a preparatory patch for moving setgid
> stripping into the vfs. The patch contains no functional changes.
> 
> Currently the setgid stripping logic is open-coded directly in
> inode_init_owner() and the individual filesystems are responsible for
> handling setgid inheritance. Since this has proven to be brittle as
> evidenced by old issues we uncovered over the last months (see [1] to
> [3] below) we will try to move this logic into the vfs.

First of all, inode_init_owner() is (and always had been) an optional helper.
Filesystems are *NOT* required to call it, so putting any common functionality
in there had always been a mistake.

That goes for inode_fsuid_set() and inode_fsgid_set() calls as well.
Consider e.g. this:
struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
                             const struct qstr *qstr)
{
	...
        if (test_opt(sb, GRPID)) {
		inode->i_mode = mode;
		inode->i_uid = current_fsuid();
		inode->i_gid = dir->i_gid;
	} else
		inode_init_owner(&init_user_ns, inode, dir, mode);

Here we have an explicit mount option, selecting the way S_ISGID on directories
is handled.  Mount ext2 with -o grpid and see for yourself - no inode_init_owner()
calls there.

The same goes for ext4 - that code is copied there unchanged.

What's more, I'm not sure that Jann's fix made any sense in the first place.
After all, the file being created here is empty; exec on it won't give you
anything - it'll simply fail.  And modifying that file ought to strip SGID,
or we have much more interesting problems.

What am I missing here?
Al Viro April 28, 2022, 2:15 a.m. UTC | #4
On Thu, Apr 28, 2022 at 01:59:01AM +0000, Al Viro wrote:
> On Tue, Apr 26, 2022 at 07:11:27PM +0800, Yang Xu wrote:
> > Add a dedicated helper to handle the setgid bit when creating a new file
> > in a setgid directory. This is a preparatory patch for moving setgid
> > stripping into the vfs. The patch contains no functional changes.
> > 
> > Currently the setgid stripping logic is open-coded directly in
> > inode_init_owner() and the individual filesystems are responsible for
> > handling setgid inheritance. Since this has proven to be brittle as
> > evidenced by old issues we uncovered over the last months (see [1] to
> > [3] below) we will try to move this logic into the vfs.
> 
> First of all, inode_init_owner() is (and always had been) an optional helper.
> Filesystems are *NOT* required to call it, so putting any common functionality
> in there had always been a mistake.
> 
> That goes for inode_fsuid_set() and inode_fsgid_set() calls as well.
> Consider e.g. this:
> struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
>                              const struct qstr *qstr)
> {
> 	...
>         if (test_opt(sb, GRPID)) {
> 		inode->i_mode = mode;
> 		inode->i_uid = current_fsuid();
> 		inode->i_gid = dir->i_gid;
> 	} else
> 		inode_init_owner(&init_user_ns, inode, dir, mode);
> 
> Here we have an explicit mount option, selecting the way S_ISGID on directories
> is handled.  Mount ext2 with -o grpid and see for yourself - no inode_init_owner()
> calls there.
> 
> The same goes for ext4 - that code is copied there unchanged.
> 
> What's more, I'm not sure that Jann's fix made any sense in the first place.
> After all, the file being created here is empty; exec on it won't give you
> anything - it'll simply fail.  And modifying that file ought to strip SGID,
> or we have much more interesting problems.
> 
> What am I missing here?

BTW, xfs has grpid option as well:
	if (dir && !(dir->i_mode & S_ISGID) && xfs_has_grpid(mp)) {
		inode_fsuid_set(inode, mnt_userns);
		inode->i_gid = dir->i_gid;
		inode->i_mode = mode;
	} else {
		inode_init_owner(mnt_userns, inode, dir, mode);
	}

We could lift that stuff into VFS, but it would require lifting that flag
(BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
and ignores SGID on directories) into generic superblock.  Otherwise we'd
be breaking existing behaviour for ext* and xfs...
Yang Xu (Fujitsu) April 28, 2022, 2:23 a.m. UTC | #5
on 2022/4/28 10:15, Al Viro wrote:
> On Thu, Apr 28, 2022 at 01:59:01AM +0000, Al Viro wrote:
>> On Tue, Apr 26, 2022 at 07:11:27PM +0800, Yang Xu wrote:
>>> Add a dedicated helper to handle the setgid bit when creating a new file
>>> in a setgid directory. This is a preparatory patch for moving setgid
>>> stripping into the vfs. The patch contains no functional changes.
>>>
>>> Currently the setgid stripping logic is open-coded directly in
>>> inode_init_owner() and the individual filesystems are responsible for
>>> handling setgid inheritance. Since this has proven to be brittle as
>>> evidenced by old issues we uncovered over the last months (see [1] to
>>> [3] below) we will try to move this logic into the vfs.
>>
>> First of all, inode_init_owner() is (and always had been) an optional helper.
>> Filesystems are *NOT* required to call it, so putting any common functionality
>> in there had always been a mistake.
>>
>> That goes for inode_fsuid_set() and inode_fsgid_set() calls as well.
>> Consider e.g. this:
>> struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
>>                               const struct qstr *qstr)
>> {
>> 	...
>>          if (test_opt(sb, GRPID)) {
>> 		inode->i_mode = mode;
>> 		inode->i_uid = current_fsuid();
>> 		inode->i_gid = dir->i_gid;
>> 	} else
>> 		inode_init_owner(&init_user_ns, inode, dir, mode);
>>
>> Here we have an explicit mount option, selecting the way S_ISGID on directories
>> is handled.  Mount ext2 with -o grpid and see for yourself - no inode_init_owner()
>> calls there.
>>
>> The same goes for ext4 - that code is copied there unchanged.
>>
>> What's more, I'm not sure that Jann's fix made any sense in the first place.
>> After all, the file being created here is empty; exec on it won't give you
>> anything - it'll simply fail.  And modifying that file ought to strip SGID,
>> or we have much more interesting problems.
>>
>> What am I missing here?
>
> BTW, xfs has grpid option as well:
> 	if (dir&&  !(dir->i_mode&  S_ISGID)&&  xfs_has_grpid(mp)) {
> 		inode_fsuid_set(inode, mnt_userns);
> 		inode->i_gid = dir->i_gid;
> 		inode->i_mode = mode;
> 	} else {
> 		inode_init_owner(mnt_userns, inode, dir, mode);
> 	}
>
> We could lift that stuff into VFS, but it would require lifting that flag
> (BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
> and ignores SGID on directories) into generic superblock.  Otherwise we'd
> be breaking existing behaviour for ext* and xfs...

I also mentioned it in my previous version(in the 3/4 patch)
"This patch also changed grpid behaviour for ext4/xfs because the mode 
passed to them may been changed by vfs_prepare_mode.
"

I guess we can add a  grpid option check in vfs_prepare_mode or in 
mode_strip_sgid, then it should not break the existing behaviour for 
ext* and xfs.
Al Viro April 28, 2022, 2:49 a.m. UTC | #6
On Thu, Apr 28, 2022 at 02:23:25AM +0000, xuyang2018.jy@fujitsu.com wrote:

> > BTW, xfs has grpid option as well:
> > 	if (dir&&  !(dir->i_mode&  S_ISGID)&&  xfs_has_grpid(mp)) {
> > 		inode_fsuid_set(inode, mnt_userns);
> > 		inode->i_gid = dir->i_gid;
> > 		inode->i_mode = mode;
> > 	} else {
> > 		inode_init_owner(mnt_userns, inode, dir, mode);
> > 	}
> >
> > We could lift that stuff into VFS, but it would require lifting that flag
> > (BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
> > and ignores SGID on directories) into generic superblock.  Otherwise we'd
> > be breaking existing behaviour for ext* and xfs...
> 
> I also mentioned it in my previous version(in the 3/4 patch)
> "This patch also changed grpid behaviour for ext4/xfs because the mode 
> passed to them may been changed by vfs_prepare_mode.
> "
> 
> I guess we can add a  grpid option check in vfs_prepare_mode or in 
> mode_strip_sgid, then it should not break the existing behaviour for 
> ext* and xfs.

I don't like it, TBH.  That way we have
	1) caller mangles mode (after having looked at grpid, etc.)
	2) filesystem checks grpid and either calls inode_init_owner(),
which might (or might not) modify the gid to be used or skips the
call and assigns gid directly.

It's asking for trouble.  We have two places where the predicate is
checked; the first mangles mode (and I'm still not convinced that we
need to bother with that at all), the second affects gid (and for
mkdir in sgid directory on non-grpid ones inherits sgid).

That kind of structure is asking for trouble.  *IF* we make inode_init_owner()
aware of grpid (and make that predicate usable from generic helper), we might
as well just make inode_init_owner() mandatory (for the first time ever) and
get rid of grpid checks in filesystems themselves.  But then there's no point
doing it in method callers.

Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
ext2 doesn't bother with either in such case...

Let's try to separate the issues here.  Jann, could you explain what makes
empty sgid files dangerous?
Al Viro April 28, 2022, 3:12 a.m. UTC | #7
On Thu, Apr 28, 2022 at 02:49:11AM +0000, Al Viro wrote:
> On Thu, Apr 28, 2022 at 02:23:25AM +0000, xuyang2018.jy@fujitsu.com wrote:
> 
> > > BTW, xfs has grpid option as well:
> > > 	if (dir&&  !(dir->i_mode&  S_ISGID)&&  xfs_has_grpid(mp)) {
> > > 		inode_fsuid_set(inode, mnt_userns);
> > > 		inode->i_gid = dir->i_gid;
> > > 		inode->i_mode = mode;
> > > 	} else {
> > > 		inode_init_owner(mnt_userns, inode, dir, mode);
> > > 	}
> > >
> > > We could lift that stuff into VFS, but it would require lifting that flag
> > > (BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
> > > and ignores SGID on directories) into generic superblock.  Otherwise we'd
> > > be breaking existing behaviour for ext* and xfs...
> > 
> > I also mentioned it in my previous version(in the 3/4 patch)
> > "This patch also changed grpid behaviour for ext4/xfs because the mode 
> > passed to them may been changed by vfs_prepare_mode.
> > "
> > 
> > I guess we can add a  grpid option check in vfs_prepare_mode or in 
> > mode_strip_sgid, then it should not break the existing behaviour for 
> > ext* and xfs.
> 
> I don't like it, TBH.  That way we have
> 	1) caller mangles mode (after having looked at grpid, etc.)
> 	2) filesystem checks grpid and either calls inode_init_owner(),
> which might (or might not) modify the gid to be used or skips the
> call and assigns gid directly.
> 
> It's asking for trouble.  We have two places where the predicate is
> checked; the first mangles mode (and I'm still not convinced that we
> need to bother with that at all), the second affects gid (and for
> mkdir in sgid directory on non-grpid ones inherits sgid).
> 
> That kind of structure is asking for trouble.  *IF* we make inode_init_owner()
> aware of grpid (and make that predicate usable from generic helper), we might
> as well just make inode_init_owner() mandatory (for the first time ever) and
> get rid of grpid checks in filesystems themselves.  But then there's no point
> doing it in method callers.
> 
> Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> ext2 doesn't bother with either in such case...
> 
> Let's try to separate the issues here.  Jann, could you explain what makes
> empty sgid files dangerous?

Found the original thread in old mailbox, and the method of avoiding the
SGID removal on modification is usable.  Which answers the question above...
Al Viro April 28, 2022, 3:46 a.m. UTC | #8
On Thu, Apr 28, 2022 at 03:12:30AM +0000, Al Viro wrote:

> > Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> > path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> > ext2 doesn't bother with either in such case...
> > 
> > Let's try to separate the issues here.  Jann, could you explain what makes
> > empty sgid files dangerous?
> 
> Found the original thread in old mailbox, and the method of avoiding the
> SGID removal on modification is usable.  Which answers the question above...

OK, what do we want for grpid mounts?  Aside of "don't forget inode_fsuid_set(),
please", that is.  We don't want inode_fsgid_set() there (whatever went for
the parent directory should be the right value for the child).  Same "strip
SGID from non-directory child, unless we are in the resulting group"?
Al Viro April 28, 2022, 4:40 a.m. UTC | #9
On Tue, Apr 26, 2022 at 07:11:27PM +0800, Yang Xu wrote:


> +umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
> +			 const struct inode *dir, umode_t mode)
> +{
> +	if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
> +		return mode;
> +	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
> +		return mode;
> +	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
> +		return mode;
> +	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> +		return mode;
> +
> +	mode &= ~S_ISGID;
> +	return mode;
> +}

	I'm fairly sure that absolute majority of calls will not
have S_ISGID passed in mode.  So I'd prefer to reorder the first
two ifs.
Jann Horn April 28, 2022, 8:06 a.m. UTC | #10
On Thu, Apr 28, 2022 at 5:12 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Apr 28, 2022 at 02:49:11AM +0000, Al Viro wrote:
> > Let's try to separate the issues here.  Jann, could you explain what makes
> > empty sgid files dangerous?
>
> Found the original thread in old mailbox, and the method of avoiding the
> SGID removal on modification is usable.  Which answers the question above...

As context for everyone on the thread who isn't on security@, you can see a
public copy of the bug report here:
https://bugs.chromium.org/p/project-zero/issues/detail?id=1611
and also here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779923

And the kernel patch in question is this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7
Christian Brauner April 28, 2022, 8:25 a.m. UTC | #11
On Thu, Apr 28, 2022 at 01:59:01AM +0000, Al Viro wrote:
> On Tue, Apr 26, 2022 at 07:11:27PM +0800, Yang Xu wrote:
> > Add a dedicated helper to handle the setgid bit when creating a new file
> > in a setgid directory. This is a preparatory patch for moving setgid
> > stripping into the vfs. The patch contains no functional changes.
> > 
> > Currently the setgid stripping logic is open-coded directly in
> > inode_init_owner() and the individual filesystems are responsible for
> > handling setgid inheritance. Since this has proven to be brittle as
> > evidenced by old issues we uncovered over the last months (see [1] to
> > [3] below) we will try to move this logic into the vfs.
> 
> First of all, inode_init_owner() is (and always had been) an optional helper.

The whole patch series was triggered because ever since I added setgid
inheritance tests (see [1]) as part of the idmapped mounts test suite
into xfstests we found 3 setgid inheritance bugs (The bugs are linked in
the commit messages.).
The bugs showed up whenever a filesystem didn't call inode_init_owner()
or had custom code in place that deviated from expectations.

That's what triggered this whole patch series. Yang took it on and seems
here to see it through.

I should point out that it was rather unclear what expectations are btw
because of the ordering dependency between umask and POSIX ACLs and
setgid stripping. I've describe this at length in the commit message I
gave Yang.

It took a lot of digging and over the course of me reviewing this patch
series more and more corner-cases pop up that we haven't handled.

> Filesystems are *NOT* required to call it, so putting any common functionality
> in there had always been a mistake.

See above. I pointed this out in earlier version.
I very much agree which is why we should move it into the vfs proper if
we can with reasonably minimal regression risk.

[1]: https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c#n7812
Christian Brauner April 28, 2022, 8:44 a.m. UTC | #12
On Thu, Apr 28, 2022 at 02:49:11AM +0000, Al Viro wrote:
> On Thu, Apr 28, 2022 at 02:23:25AM +0000, xuyang2018.jy@fujitsu.com wrote:
> 
> > > BTW, xfs has grpid option as well:
> > > 	if (dir&&  !(dir->i_mode&  S_ISGID)&&  xfs_has_grpid(mp)) {
> > > 		inode_fsuid_set(inode, mnt_userns);
> > > 		inode->i_gid = dir->i_gid;
> > > 		inode->i_mode = mode;
> > > 	} else {
> > > 		inode_init_owner(mnt_userns, inode, dir, mode);
> > > 	}
> > >
> > > We could lift that stuff into VFS, but it would require lifting that flag
> > > (BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
> > > and ignores SGID on directories) into generic superblock.  Otherwise we'd
> > > be breaking existing behaviour for ext* and xfs...
> > 
> > I also mentioned it in my previous version(in the 3/4 patch)
> > "This patch also changed grpid behaviour for ext4/xfs because the mode 
> > passed to them may been changed by vfs_prepare_mode.
> > "
> > 
> > I guess we can add a  grpid option check in vfs_prepare_mode or in 
> > mode_strip_sgid, then it should not break the existing behaviour for 
> > ext* and xfs.
> 
> I don't like it, TBH.  That way we have
> 	1) caller mangles mode (after having looked at grpid, etc.)
> 	2) filesystem checks grpid and either calls inode_init_owner(),
> which might (or might not) modify the gid to be used or skips the
> call and assigns gid directly.
> 
> It's asking for trouble.  We have two places where the predicate is
> checked; the first mangles mode (and I'm still not convinced that we
> need to bother with that at all), the second affects gid (and for
> mkdir in sgid directory on non-grpid ones inherits sgid).
> 
> That kind of structure is asking for trouble.  *IF* we make inode_init_owner()
> aware of grpid (and make that predicate usable from generic helper), we might
> as well just make inode_init_owner() mandatory (for the first time ever) and
> get rid of grpid checks in filesystems themselves.  But then there's no point
> doing it in method callers.

This has ordering issues. In the vfs the umask is stripped and then we
call into the filesystem and inode_init_owner() is called. So if
S_IXGRP is removed by the umask then we inherit the S_ISGID bit.

But if the filesystem uses POSIX ACLs then the umask isn't stripped in
the vfs and instead may be done (I say "may" because it depends on
whether or not applicable POSIX ACLs are found) in posix_acl_create().

But in order to call posix_acl_create() the filesystem will have often
ran through inode_init_owner() first. For example, ext4 does
inode_init_owner() and afterwards calls ext4_init_acl() which in turn
ends up calling posix_acl_create() which _may_ strip the umask.

Iow, you end up with two possible setgid removal paths:
* strip setgid first, apply umask (no POSIX ACLs supported)
* apply umask, strip setgid (POSIX ACLs supported)
with possibly different results.

Mandating inode_init_owner() being used doesn't solve that and I think
it's still brittle overall.

If we can hoist all of this into vfs_*() before we call into the
filesystem we're better off in the long run. It's worth the imho
negible regression risk. 

> 
> Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> ext2 doesn't bother with either in such case...

Using inode_fs*id_set() is only relevant when the inode is initialized
based on the caller's fs*id. If you request to inherit the parent
directories gid then the caller's gid doesn't matter.

ext2 doesn't need to care at all about this because it doesn't raise
FS_ALLOW_IDMAP.

> 
> Let's try to separate the issues here.  Jann, could you explain what makes
> empty sgid files dangerous?

I see that's answered in a later mail.
Christian Brauner April 28, 2022, 9:34 a.m. UTC | #13
On Thu, Apr 28, 2022 at 03:46:59AM +0000, Al Viro wrote:
> On Thu, Apr 28, 2022 at 03:12:30AM +0000, Al Viro wrote:
> 
> > > Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> > > path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> > > ext2 doesn't bother with either in such case...
> > > 
> > > Let's try to separate the issues here.  Jann, could you explain what makes
> > > empty sgid files dangerous?
> > 
> > Found the original thread in old mailbox, and the method of avoiding the
> > SGID removal on modification is usable.  Which answers the question above...
> 
> OK, what do we want for grpid mounts?  Aside of "don't forget inode_fsuid_set(),
> please", that is.  We don't want inode_fsgid_set() there (whatever went for
> the parent directory should be the right value for the child).  Same "strip

Exactly. You sounded puzzled why we don't call that in an earlier mail.

> SGID from non-directory child, unless we are in the resulting group"?

Honestly, I think we should try and see if we can't use the same setgid
inheritance enforcement of the new mode_strip_sgid() helper for the
grpid mount option as well. Iow, just don't give the grpid mount option
a separate setgid treatment and try it with the current approach.

It'll allow us to move things into vfs proper which I think is a robust
solution with clear semantics. It also gives us a uniform ordering wrt
to umask stripping and POSIX ACLs.

Yes, as we've pointed out in the thread this carries a non-zero
regression risk. But so does the whole patch series. But this might end
up being a big win security wise and makes maintenance way easier going
forward.

The current setgid situation is thoroughly messy though and we keep
plugging holes. Even writing tests for the current situation is an
almost herculean task let alone reviewing it.
Christian Brauner April 28, 2022, 11:55 a.m. UTC | #14
On Thu, Apr 28, 2022 at 10:45:01AM +0200, Christian Brauner wrote:
> On Thu, Apr 28, 2022 at 02:49:11AM +0000, Al Viro wrote:
> > On Thu, Apr 28, 2022 at 02:23:25AM +0000, xuyang2018.jy@fujitsu.com wrote:
> > 
> > > > BTW, xfs has grpid option as well:
> > > > 	if (dir&&  !(dir->i_mode&  S_ISGID)&&  xfs_has_grpid(mp)) {
> > > > 		inode_fsuid_set(inode, mnt_userns);
> > > > 		inode->i_gid = dir->i_gid;
> > > > 		inode->i_mode = mode;
> > > > 	} else {
> > > > 		inode_init_owner(mnt_userns, inode, dir, mode);
> > > > 	}
> > > >
> > > > We could lift that stuff into VFS, but it would require lifting that flag
> > > > (BSD vs. SysV behaviour wrt GID - BSD *always* inherits GID from parent
> > > > and ignores SGID on directories) into generic superblock.  Otherwise we'd
> > > > be breaking existing behaviour for ext* and xfs...
> > > 
> > > I also mentioned it in my previous version(in the 3/4 patch)
> > > "This patch also changed grpid behaviour for ext4/xfs because the mode 
> > > passed to them may been changed by vfs_prepare_mode.
> > > "
> > > 
> > > I guess we can add a  grpid option check in vfs_prepare_mode or in 
> > > mode_strip_sgid, then it should not break the existing behaviour for 
> > > ext* and xfs.
> > 
> > I don't like it, TBH.  That way we have
> > 	1) caller mangles mode (after having looked at grpid, etc.)
> > 	2) filesystem checks grpid and either calls inode_init_owner(),
> > which might (or might not) modify the gid to be used or skips the
> > call and assigns gid directly.
> > 
> > It's asking for trouble.  We have two places where the predicate is
> > checked; the first mangles mode (and I'm still not convinced that we
> > need to bother with that at all), the second affects gid (and for
> > mkdir in sgid directory on non-grpid ones inherits sgid).
> > 
> > That kind of structure is asking for trouble.  *IF* we make inode_init_owner()
> > aware of grpid (and make that predicate usable from generic helper), we might
> > as well just make inode_init_owner() mandatory (for the first time ever) and
> > get rid of grpid checks in filesystems themselves.  But then there's no point
> > doing it in method callers.
> 
> This has ordering issues. In the vfs the umask is stripped and then we
> call into the filesystem and inode_init_owner() is called. So if
> S_IXGRP is removed by the umask then we inherit the S_ISGID bit.
> 
> But if the filesystem uses POSIX ACLs then the umask isn't stripped in
> the vfs and instead may be done (I say "may" because it depends on
> whether or not applicable POSIX ACLs are found) in posix_acl_create().
> 
> But in order to call posix_acl_create() the filesystem will have often
> ran through inode_init_owner() first. For example, ext4 does
> inode_init_owner() and afterwards calls ext4_init_acl() which in turn
> ends up calling posix_acl_create() which _may_ strip the umask.
> 
> Iow, you end up with two possible setgid removal paths:
> * strip setgid first, apply umask (no POSIX ACLs supported)
> * apply umask, strip setgid (POSIX ACLs supported)

Ugh, that's reversed:
* POSIX ACLs supported:
  1. strip setgid first
  2. (possibly) strip umask
* POSIX ACLS unsupported:
  1. apply umask
  2. strip setgid

> with possibly different results.
> 
> Mandating inode_init_owner() being used doesn't solve that and I think
> it's still brittle overall.
> 
> If we can hoist all of this into vfs_*() before we call into the
> filesystem we're better off in the long run. It's worth the imho
> negible regression risk. 
> 
> > 
> > Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> > path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> > ext2 doesn't bother with either in such case...
> 
> Using inode_fs*id_set() is only relevant when the inode is initialized
> based on the caller's fs*id. If you request to inherit the parent
> directories gid then the caller's gid doesn't matter.
> 
> ext2 doesn't need to care at all about this because it doesn't raise
> FS_ALLOW_IDMAP.
> 
> > 
> > Let's try to separate the issues here.  Jann, could you explain what makes
> > empty sgid files dangerous?
> 
> I see that's answered in a later mail.
Yang Xu (Fujitsu) May 19, 2022, 1:03 a.m. UTC | #15
on 2022/4/28 17:34, Christian Brauner wrote:
> On Thu, Apr 28, 2022 at 03:46:59AM +0000, Al Viro wrote:
>> On Thu, Apr 28, 2022 at 03:12:30AM +0000, Al Viro wrote:
>>
>>>> Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
>>>> path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
>>>> ext2 doesn't bother with either in such case...
>>>>
>>>> Let's try to separate the issues here.  Jann, could you explain what makes
>>>> empty sgid files dangerous?
>>>
>>> Found the original thread in old mailbox, and the method of avoiding the
>>> SGID removal on modification is usable.  Which answers the question above...
>>
>> OK, what do we want for grpid mounts?  Aside of "don't forget inode_fsuid_set(),
>> please", that is.  We don't want inode_fsgid_set() there (whatever went for
>> the parent directory should be the right value for the child).  Same "strip
>
> Exactly. You sounded puzzled why we don't call that in an earlier mail.
>
>> SGID from non-directory child, unless we are in the resulting group"?
>
> Honestly, I think we should try and see if we can't use the same setgid
> inheritance enforcement of the new mode_strip_sgid() helper for the
> grpid mount option as well. Iow, just don't give the grpid mount option
> a separate setgid treatment and try it with the current approach.
>
> It'll allow us to move things into vfs proper which I think is a robust
> solution with clear semantics. It also gives us a uniform ordering wrt
> to umask stripping and POSIX ACLs.
>
> Yes, as we've pointed out in the thread this carries a non-zero
> regression risk. But so does the whole patch series. But this might end
> up being a big win security wise and makes maintenance way easier going
> forward.
>
> The current setgid situation is thoroughly messy though and we keep
> plugging holes. Even writing tests for the current situation is an
> almost herculean task let alone reviewing it.

Sorry for the late reply.
I am agree with these. So what should I do in next step?

ps:I also think I may send test case to xfstests for posix acl and umask 
ASAP, then xfstests can merge these test and so more people will notice 
this problem.
Christian Brauner May 19, 2022, 9:14 a.m. UTC | #16
On Thu, May 19, 2022 at 01:03:12AM +0000, xuyang2018.jy@fujitsu.com wrote:
> on 2022/4/28 17:34, Christian Brauner wrote:
> > On Thu, Apr 28, 2022 at 03:46:59AM +0000, Al Viro wrote:
> >> On Thu, Apr 28, 2022 at 03:12:30AM +0000, Al Viro wrote:
> >>
> >>>> Note, BTW, that while XFS has inode_fsuid_set() on the non-inode_init_owner()
> >>>> path, it doesn't have inode_fsgid_set() there.  Same goes for ext4, while
> >>>> ext2 doesn't bother with either in such case...
> >>>>
> >>>> Let's try to separate the issues here.  Jann, could you explain what makes
> >>>> empty sgid files dangerous?
> >>>
> >>> Found the original thread in old mailbox, and the method of avoiding the
> >>> SGID removal on modification is usable.  Which answers the question above...
> >>
> >> OK, what do we want for grpid mounts?  Aside of "don't forget inode_fsuid_set(),
> >> please", that is.  We don't want inode_fsgid_set() there (whatever went for
> >> the parent directory should be the right value for the child).  Same "strip
> >
> > Exactly. You sounded puzzled why we don't call that in an earlier mail.
> >
> >> SGID from non-directory child, unless we are in the resulting group"?
> >
> > Honestly, I think we should try and see if we can't use the same setgid
> > inheritance enforcement of the new mode_strip_sgid() helper for the
> > grpid mount option as well. Iow, just don't give the grpid mount option
> > a separate setgid treatment and try it with the current approach.
> >
> > It'll allow us to move things into vfs proper which I think is a robust
> > solution with clear semantics. It also gives us a uniform ordering wrt
> > to umask stripping and POSIX ACLs.
> >
> > Yes, as we've pointed out in the thread this carries a non-zero
> > regression risk. But so does the whole patch series. But this might end
> > up being a big win security wise and makes maintenance way easier going
> > forward.
> >
> > The current setgid situation is thoroughly messy though and we keep
> > plugging holes. Even writing tests for the current situation is an
> > almost herculean task let alone reviewing it.
> 
> Sorry for the late reply.
> I am agree with these. So what should I do in next step?

I'd say we try to go forward with the original approach and not
special-casing the grpid option.

> 
> ps:I also think I may send test case to xfstests for posix acl and umask 
> ASAP, then xfstests can merge these test and so more people will notice 
> this problem.

The big refactoring of the idmapped mounts testsuite into a generic
vfstest refactoring is sitting in for-next of xfstests so make sure to
base your patches on that because you really won't enjoy having to
rebase them...
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index 9d9b422504d1..e9a5f2ec2f89 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2246,10 +2246,8 @@  void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		/* Directories are special, and always inherit S_ISGID */
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
-		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
-			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
-			mode &= ~S_ISGID;
+		else
+			mode = mode_strip_sgid(mnt_userns, dir, mode);
 	} else
 		inode_fsgid_set(inode, mnt_userns);
 	inode->i_mode = mode;
@@ -2405,3 +2403,34 @@  struct timespec64 current_time(struct inode *inode)
 	return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
+
+/**
+ * mode_strip_sgid - handle the sgid bit for non-directories
+ * @mnt_userns: User namespace of the mount the inode was created from
+ * @dir: parent directory inode
+ * @mode: mode of the file to be created in @dir
+ *
+ * If the @mode of the new file has both the S_ISGID and S_IXGRP bit
+ * raised and @dir has the S_ISGID bit raised ensure that the caller is
+ * either in the group of the parent directory or they have CAP_FSETID
+ * in their user namespace and are privileged over the parent directory.
+ * In all other cases, strip the S_ISGID bit from @mode.
+ *
+ * Return: the new mode to use for the file
+ */
+umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
+			 const struct inode *dir, umode_t mode)
+{
+	if (S_ISDIR(mode) || !dir || !(dir->i_mode & S_ISGID))
+		return mode;
+	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
+		return mode;
+	if (in_group_p(i_gid_into_mnt(mnt_userns, dir)))
+		return mode;
+	if (capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
+		return mode;
+
+	mode &= ~S_ISGID;
+	return mode;
+}
+EXPORT_SYMBOL(mode_strip_sgid);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..98b44a2732f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1897,6 +1897,8 @@  extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
 void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 		      const struct inode *dir, umode_t mode);
 extern bool may_open_dev(const struct path *path);
+umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
+			 const struct inode *dir, umode_t mode);
 
 /*
  * This is the "filldir" function type, used by readdir() to let