mbox series

[GIT,PULL] setgid inheritance for v5.20/v6.0

Message ID 20220809103957.1851931-1-brauner@kernel.org (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] setgid inheritance for v5.20/v6.0 | expand

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/fs.setgid.v6.0

Message

Christian Brauner Aug. 9, 2022, 10:39 a.m. UTC
Hey Linus,

/* Summary */
This contains the work to move setgid stripping out of individual filesystems
and into the VFS itself. (I think you might've been Cced on this briefly a few
weeks ago.)

Creating files that have both the S_IXGRP and S_ISGID bit raised in directories
that themselves have the S_ISGID bit set requires additional privileges to
avoid security issues.

When a filesystem creates a new inode it needs to take care that the caller is
either in the group of the newly created inode or they have CAP_FSETID in their
current user namespace and are privileged over the parent directory of the new
inode. If any of these two conditions is true then the S_ISGID bit can be
raised for an S_IXGRP file and if not it needs to be stripped.

However, there are several key issues with the current implementation:

* S_ISGID stripping logic is entangled with umask stripping.

  For example, if the umask removes the S_IXGRP bit from the file about to be
  created then the S_ISGID bit will be kept.

  The inode_init_owner() helper is responsible for S_ISGID stripping and is
  called before posix_acl_create(). So we can end up with two different
  orderings:

  1. FS without POSIX ACL support
     First strip umask then strip S_ISGID in inode_init_owner().

     In other words, if a filesystem doesn't support or enable POSIX ACLs then
     umask stripping is done directly in the vfs before calling into the
     filesystem:

  2. FS with POSIX ACL support
     First strip S_ISGID in inode_init_owner() then strip umask in
     posix_acl_create().

     In other words, if the filesystem does support POSIX ACLs then unmask
     stripping may be done in the filesystem itself when calling
     posix_acl_create().

  Note that technically filesystems are free to impose their own ordering
  between posix_acl_create() and inode_init_owner() meaning that there's
  additional ordering issues that influence S_ISGID inheritance.

  (Note that the commit message of commit 1639a49ccdce ("fs: move S_ISGID
   stripping into the vfs_*() helpers") gets the ordering between
   inode_init_owner() and posix_acl_create() the wrong way around. I realized
   this too late.)

* Filesystems that don't rely on inode_init_owner() don't get S_ISGID
  stripping logic.

  While that may be intentional (e.g. network filesystems might just
  defer setgid stripping to a server) it is often just a security issue.

  Note that mandating the use of inode_init_owner() was proposed as an
  alternative solution but that wouldn't fix the ordering issues and there are
  examples such as afs where the use of inode_init_owner() isn't possible. In
  any case, we should also try the cleaner and generalized solution first
  before resorting to this approach.

* We still have S_ISGID inheritance bugs years after the initial round of
  S_ISGID inheritance fixes:
  e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes")
  01ea173e103e ("xfs: fix up non-directory creation in SGID directories")
  fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories")

All of us led us to conclude that the current state is too messy. While we
won't be able to make it completely clean as posix_acl_create() is still a
filesystem specific call we can improve the S_SIGD stripping situation quite a
bit by hoisting it out of inode_init_owner() and into the respective vfs
creation operations.

The obvious advantage is that we don't need to rely on individual filesystems
getting S_ISGID stripping right and instead can standardize the ordering
between S_ISGID and umask stripping directly in the VFS.

A few short implementation notes:
* The stripping logic needs to happen in vfs_*() helpers for the sake of
  stacking filesystems such as overlayfs that rely on these helpers taking care
  of S_ISGID stripping.
* Security hooks have never seen the mode as it is ultimately seen by the
  filesystem because of the ordering issue we mentioned. Nothing is changed for
  them. We simply continue to strip the umask before passing the mode down to
  the security hooks.
* The following filesystems use inode_init_owner() and thus relied on
   S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
   hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
   reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. We've audited all
   callchains as best as we could. More details can be found in the commit
   message to 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers").

Finally a note on __regression potential__. I want to be very clear and open
that this carries a non-zero regression risk which is also why I defered the
pull request for this until this week because I was out to get married last
week and wouldn't have been around to deal with potential fallout:

  The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount option
  is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used, and on xfs
  when the XFS_FEAT_GRPID mount option is used.

  When any of these filesystems are mounted with their respective GRPID option
  then newly created files inherit the parent directory's group
  unconditionally. In these cases none of the filesystems call
  inode_init_owner() and thus they did never strip the S_ISGID bit for newly
  created files.

  Moving this logic into the VFS means that they now get the S_ISGID bit
  stripped. This is a potential user visible change. If this leads to
  regressions we will either need to figure out a better way or we need to
  revert. However, given the various setgid bugs that we found just in the last
  two years this is a regression risk we should take.

Associated with this change is a new set of fstests and LTP tests that are
currently reviewed and waiting for this series to be upstreamed.

/* Testing */
All patches are based on v5.19-rc7 and have been sitting in linux-next. No
build failures or warnings were observed and fstests and selftests have seen no
regressions. The series has existed for at least two kernel releases and
multiple eyes on it.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with current
mainline.

The following changes since commit ff6992735ade75aae3e35d16b17da1008d753d28:

  Linux 5.19-rc7 (2022-07-17 13:30:22 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/fs.setgid.v6.0

for you to fetch changes up to 5fadbd992996e9dda7ebcb62f5352866057bd619:

  ceph: rely on vfs for setgid stripping (2022-07-21 11:34:16 +0200)

Please consider pulling these changes from the signed fs.setgid.v6.0 tag.

Thanks!
Christian

----------------------------------------------------------------
fs.setgid.v6.0

----------------------------------------------------------------
Yang Xu (4):
      fs: add mode_strip_sgid() helper
      fs: Add missing umask strip in vfs_tmpfile
      fs: move S_ISGID stripping into the vfs_*() helpers
      ceph: rely on vfs for setgid stripping

 fs/ceph/file.c     |  4 ---
 fs/inode.c         | 34 ++++++++++++++++++++---
 fs/namei.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++--------
 fs/ocfs2/namei.c   |  1 +
 include/linux/fs.h |  2 ++
 5 files changed, 102 insertions(+), 19 deletions(-)

Comments

Linus Torvalds Aug. 9, 2022, 4:58 p.m. UTC | #1
On Tue, Aug 9, 2022 at 3:40 AM Christian Brauner <brauner@kernel.org> wrote:
>
> Finally a note on __regression potential__. I want to be very clear and open
> that this carries a non-zero regression risk which is also why I defered the
> pull request for this until this week because I was out to get married last
> week and wouldn't have been around to deal with potential fallout:

.. excuses, excuses.

Congratulations.

              Linus
Christian Brauner Aug. 9, 2022, 5:02 p.m. UTC | #2
On Tue, Aug 09, 2022 at 09:58:56AM -0700, Linus Torvalds wrote:
> On Tue, Aug 9, 2022 at 3:40 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > Finally a note on __regression potential__. I want to be very clear and open
> > that this carries a non-zero regression risk which is also why I defered the
> > pull request for this until this week because I was out to get married last
> > week and wouldn't have been around to deal with potential fallout:
> 
> .. excuses, excuses.

I had to choose whether I'll be physically slapped for working or
virtually slapped for sending in the PR late. This time I took the
virtual beating. ;)

> 
> Congratulations.

Thank you!
Christian
pr-tracker-bot@kernel.org Aug. 9, 2022, 5:02 p.m. UTC | #3
The pull request you sent on Tue,  9 Aug 2022 12:39:57 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/fs.setgid.v6.0

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/426b4ca2d6a5ab51f6b6175d06e4f8ddea434cdf

Thank you!