mbox series

[v19,0/4] overlayfs override_creds=off & nested get xattr fix

Message ID 20211117015806.2192263-1-dvander@google.com (mailing list archive)
Headers show
Series overlayfs override_creds=off & nested get xattr fix | expand

Message

David Anderson Nov. 17, 2021, 1:58 a.m. UTC
Mark Salyzyn (3):
  Add flags option to get xattr method paired to __vfs_getxattr
  overlayfs: handle XATTR_NOSECURITY flag for get xattr method
  overlayfs: override_creds=off option bypass creator_cred

Mark Salyzyn + John Stultz (1):
  overlayfs: inode_owner_or_capable called during execv

The first three patches address fundamental security issues that should
be solved regardless of the override_creds=off feature.

The fourth adds the feature depends on these other fixes.

By default, all access to the upper, lower and work directories is the
recorded mounter's MAC and DAC credentials.  The incoming accesses are
checked against the caller's credentials.

If the principles of least privilege are applied for sepolicy, the
mounter's credentials might not overlap the credentials of the caller's
when accessing the overlayfs filesystem.  For example, a file that a
lower DAC privileged caller can execute, is MAC denied to the
generally higher DAC privileged mounter, to prevent an attack vector.

We add the option to turn off override_creds in the mount options; all
subsequent operations after mount on the filesystem will be only the
caller's credentials.  The module boolean parameter and mount option
override_creds is also added as a presence check for this "feature",
existence of /sys/module/overlay/parameters/overlay_creds

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: David Anderson <dvander@google.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: selinux@vger.kernel.org
Cc: paulmoore@microsoft.com
Cc: Luca.Boccassi@microsoft.com

---

v19
- rebase.

v18
- rebase + fix minor cut and paste error for inode argument in __vfs_getxattr

v17
- correct some zero-day build failures.
- fix up documentation

v16
- rebase and merge of two patches.
- add adjustment to deal with execv when overrides is off.

v15
- Revert back to v4 with fixes from on the way from v5-v14. The single
  structure argument passing to address the complaints about too many
  arguments was rejected by the community.
- Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
  check. Can address that independently.
- ToDo: upstream test frame for thes security fixes (currently testing
  is all in Android).

v14:
- Rejoin, rebase and a few adjustments.

v13:
- Pull out first patch and try to get it in alone feedback, some
  Acks, and then <crickets> because people forgot why we were doing i.

v12:
- Restore squished out patch 2 and 3 in the series,
  then change algorithm to add flags argument.
  Per-thread flag is a large security surface.

v11:
- Squish out v10 introduced patch 2 and 3 in the series,
  then and use per-thread flag instead for nesting.
- Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
- Add sb argument to ovl_revert_creds to match future work.

v10:
- Return NULL on CAP_DAC_READ_SEARCH
- Add __get xattr method to solve sepolicy logging issue
- Drop unnecessary sys_admin sepolicy checking for administrative
  driver internal xattr functions.

v6:
- Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
- Do better with the documentation, drop rationalizations.
- pr_warn message adjusted to report consequences.

v5:
- beefed up the caveats in the Documentation
- Is dependent on
  "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
  "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
- Added prwarn when override_creds=off

v4:
- spelling and grammar errors in text

v3:
- Change name from caller_credentials / creator_credentials to the
  boolean override_creds.
- Changed from creator to mounter credentials.
- Updated and fortified the documentation.
- Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS

v2:
- Forward port changed attr to stat, resulting in a build error.
- altered commit message.

David Anderson (4):
  Add flags option to get xattr method paired to __vfs_getxattr
  overlayfs: handle XATTR_NOSECURITY flag for get xattr method
  overlayfs: override_creds=off option bypass creator_cred
  overlayfs: inode_owner_or_capable called during execv

 Documentation/filesystems/locking.rst   |  2 +-
 Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
 fs/9p/acl.c                             |  3 +-
 fs/9p/xattr.c                           |  3 +-
 fs/afs/xattr.c                          | 10 +++---
 fs/attr.c                               |  2 +-
 fs/btrfs/xattr.c                        |  3 +-
 fs/ceph/xattr.c                         |  3 +-
 fs/cifs/xattr.c                         |  2 +-
 fs/ecryptfs/inode.c                     |  6 ++--
 fs/ecryptfs/mmap.c                      |  5 +--
 fs/erofs/xattr.c                        |  3 +-
 fs/ext2/xattr_security.c                |  2 +-
 fs/ext2/xattr_trusted.c                 |  2 +-
 fs/ext2/xattr_user.c                    |  2 +-
 fs/ext4/xattr_hurd.c                    |  2 +-
 fs/ext4/xattr_security.c                |  2 +-
 fs/ext4/xattr_trusted.c                 |  2 +-
 fs/ext4/xattr_user.c                    |  2 +-
 fs/f2fs/xattr.c                         |  4 +--
 fs/fuse/xattr.c                         |  4 +--
 fs/gfs2/xattr.c                         |  3 +-
 fs/hfs/attr.c                           |  2 +-
 fs/hfsplus/xattr.c                      |  3 +-
 fs/hfsplus/xattr_security.c             |  3 +-
 fs/hfsplus/xattr_trusted.c              |  3 +-
 fs/hfsplus/xattr_user.c                 |  3 +-
 fs/inode.c                              |  7 +++--
 fs/internal.h                           |  3 +-
 fs/jffs2/security.c                     |  3 +-
 fs/jffs2/xattr_trusted.c                |  3 +-
 fs/jffs2/xattr_user.c                   |  3 +-
 fs/jfs/xattr.c                          |  5 +--
 fs/kernfs/inode.c                       |  3 +-
 fs/nfs/nfs4proc.c                       |  9 ++++--
 fs/ntfs3/xattr.c                        |  2 +-
 fs/ocfs2/xattr.c                        |  9 ++++--
 fs/open.c                               |  2 +-
 fs/orangefs/xattr.c                     |  3 +-
 fs/overlayfs/copy_up.c                  |  2 +-
 fs/overlayfs/dir.c                      | 17 +++++-----
 fs/overlayfs/file.c                     | 25 ++++++++-------
 fs/overlayfs/inode.c                    | 29 ++++++++---------
 fs/overlayfs/namei.c                    |  6 ++--
 fs/overlayfs/overlayfs.h                |  7 +++--
 fs/overlayfs/ovl_entry.h                |  1 +
 fs/overlayfs/readdir.c                  |  8 ++---
 fs/overlayfs/super.c                    | 34 ++++++++++++++++----
 fs/overlayfs/util.c                     | 13 ++++++--
 fs/posix_acl.c                          |  2 +-
 fs/reiserfs/xattr_security.c            |  3 +-
 fs/reiserfs/xattr_trusted.c             |  3 +-
 fs/reiserfs/xattr_user.c                |  3 +-
 fs/squashfs/xattr.c                     |  2 +-
 fs/ubifs/xattr.c                        |  3 +-
 fs/xattr.c                              | 42 +++++++++++++------------
 fs/xfs/xfs_xattr.c                      |  3 +-
 include/linux/lsm_hook_defs.h           |  3 +-
 include/linux/security.h                |  6 ++--
 include/linux/xattr.h                   |  6 ++--
 include/uapi/linux/xattr.h              |  7 +++--
 mm/shmem.c                              |  3 +-
 net/socket.c                            |  3 +-
 security/commoncap.c                    | 11 ++++---
 security/integrity/evm/evm_main.c       | 13 +++++---
 security/security.c                     |  5 +--
 security/selinux/hooks.c                | 19 ++++++-----
 security/smack/smack_lsm.c              | 18 ++++++-----
 68 files changed, 289 insertions(+), 167 deletions(-)

Comments

Casey Schaufler Nov. 17, 2021, 2:18 a.m. UTC | #1
On 11/16/2021 5:58 PM, David Anderson wrote:
> Mark Salyzyn (3):
>    Add flags option to get xattr method paired to __vfs_getxattr
>    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>    overlayfs: override_creds=off option bypass creator_cred
>
> Mark Salyzyn + John Stultz (1):
>    overlayfs: inode_owner_or_capable called during execv
>
> The first three patches address fundamental security issues that should
> be solved regardless of the override_creds=off feature.
>
> The fourth adds the feature depends on these other fixes.
>
> By default, all access to the upper, lower and work directories is the
> recorded mounter's MAC and DAC credentials.  The incoming accesses are
> checked against the caller's credentials.

This isn't very clear. Are you saying that the security attributes
of the upper, lower, and work directories are determined by the
attributes of the process that mounted the filesystem? What is an
"incoming access"? I'm sure that means something if you're steeped
in the lore of overlayfs, but it isn't obvious to me.

> If the principles of least privilege are applied for sepolicy, the
> mounter's credentials might not overlap the credentials of the caller's
> when accessing the overlayfs filesystem.

I'm sorry, but I've tried pretty hard, and can't puzzle that one out.

>    For example, a file that a
> lower DAC privileged caller can execute, is MAC denied to the
> generally higher DAC privileged mounter, to prevent an attack vector.

DAC privileges are not hierarchical. This doesn't make any sense.

> We add the option to turn off override_creds in the mount options; all
> subsequent operations after mount on the filesystem will be only the
> caller's credentials.

I think I might have figured that one out, but in order to do so
I have to make way too many assumptions about the earlier paragraph.
Could you please try to explain what you're doing with more context?

>    The module boolean parameter and mount option
> override_creds is also added as a presence check for this "feature",
> existence of /sys/module/overlay/parameters/overlay_creds
>
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> Signed-off-by: David Anderson <dvander@google.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-team@android.com
> Cc: selinux@vger.kernel.org
> Cc: paulmoore@microsoft.com
> Cc: Luca.Boccassi@microsoft.com
>
> ---
>
> v19
> - rebase.
>
> v18
> - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr
>
> v17
> - correct some zero-day build failures.
> - fix up documentation
>
> v16
> - rebase and merge of two patches.
> - add adjustment to deal with execv when overrides is off.
>
> v15
> - Revert back to v4 with fixes from on the way from v5-v14. The single
>    structure argument passing to address the complaints about too many
>    arguments was rejected by the community.
> - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
>    check. Can address that independently.
> - ToDo: upstream test frame for thes security fixes (currently testing
>    is all in Android).
>
> v14:
> - Rejoin, rebase and a few adjustments.
>
> v13:
> - Pull out first patch and try to get it in alone feedback, some
>    Acks, and then <crickets> because people forgot why we were doing i.
>
> v12:
> - Restore squished out patch 2 and 3 in the series,
>    then change algorithm to add flags argument.
>    Per-thread flag is a large security surface.
>
> v11:
> - Squish out v10 introduced patch 2 and 3 in the series,
>    then and use per-thread flag instead for nesting.
> - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
> - Add sb argument to ovl_revert_creds to match future work.
>
> v10:
> - Return NULL on CAP_DAC_READ_SEARCH
> - Add __get xattr method to solve sepolicy logging issue
> - Drop unnecessary sys_admin sepolicy checking for administrative
>    driver internal xattr functions.
>
> v6:
> - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
> - Do better with the documentation, drop rationalizations.
> - pr_warn message adjusted to report consequences.
>
> v5:
> - beefed up the caveats in the Documentation
> - Is dependent on
>    "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
>    "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
> - Added prwarn when override_creds=off
>
> v4:
> - spelling and grammar errors in text
>
> v3:
> - Change name from caller_credentials / creator_credentials to the
>    boolean override_creds.
> - Changed from creator to mounter credentials.
> - Updated and fortified the documentation.
> - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS
>
> v2:
> - Forward port changed attr to stat, resulting in a build error.
> - altered commit message.
>
> David Anderson (4):
>    Add flags option to get xattr method paired to __vfs_getxattr
>    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>    overlayfs: override_creds=off option bypass creator_cred
>    overlayfs: inode_owner_or_capable called during execv
>
>   Documentation/filesystems/locking.rst   |  2 +-
>   Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
>   fs/9p/acl.c                             |  3 +-
>   fs/9p/xattr.c                           |  3 +-
>   fs/afs/xattr.c                          | 10 +++---
>   fs/attr.c                               |  2 +-
>   fs/btrfs/xattr.c                        |  3 +-
>   fs/ceph/xattr.c                         |  3 +-
>   fs/cifs/xattr.c                         |  2 +-
>   fs/ecryptfs/inode.c                     |  6 ++--
>   fs/ecryptfs/mmap.c                      |  5 +--
>   fs/erofs/xattr.c                        |  3 +-
>   fs/ext2/xattr_security.c                |  2 +-
>   fs/ext2/xattr_trusted.c                 |  2 +-
>   fs/ext2/xattr_user.c                    |  2 +-
>   fs/ext4/xattr_hurd.c                    |  2 +-
>   fs/ext4/xattr_security.c                |  2 +-
>   fs/ext4/xattr_trusted.c                 |  2 +-
>   fs/ext4/xattr_user.c                    |  2 +-
>   fs/f2fs/xattr.c                         |  4 +--
>   fs/fuse/xattr.c                         |  4 +--
>   fs/gfs2/xattr.c                         |  3 +-
>   fs/hfs/attr.c                           |  2 +-
>   fs/hfsplus/xattr.c                      |  3 +-
>   fs/hfsplus/xattr_security.c             |  3 +-
>   fs/hfsplus/xattr_trusted.c              |  3 +-
>   fs/hfsplus/xattr_user.c                 |  3 +-
>   fs/inode.c                              |  7 +++--
>   fs/internal.h                           |  3 +-
>   fs/jffs2/security.c                     |  3 +-
>   fs/jffs2/xattr_trusted.c                |  3 +-
>   fs/jffs2/xattr_user.c                   |  3 +-
>   fs/jfs/xattr.c                          |  5 +--
>   fs/kernfs/inode.c                       |  3 +-
>   fs/nfs/nfs4proc.c                       |  9 ++++--
>   fs/ntfs3/xattr.c                        |  2 +-
>   fs/ocfs2/xattr.c                        |  9 ++++--
>   fs/open.c                               |  2 +-
>   fs/orangefs/xattr.c                     |  3 +-
>   fs/overlayfs/copy_up.c                  |  2 +-
>   fs/overlayfs/dir.c                      | 17 +++++-----
>   fs/overlayfs/file.c                     | 25 ++++++++-------
>   fs/overlayfs/inode.c                    | 29 ++++++++---------
>   fs/overlayfs/namei.c                    |  6 ++--
>   fs/overlayfs/overlayfs.h                |  7 +++--
>   fs/overlayfs/ovl_entry.h                |  1 +
>   fs/overlayfs/readdir.c                  |  8 ++---
>   fs/overlayfs/super.c                    | 34 ++++++++++++++++----
>   fs/overlayfs/util.c                     | 13 ++++++--
>   fs/posix_acl.c                          |  2 +-
>   fs/reiserfs/xattr_security.c            |  3 +-
>   fs/reiserfs/xattr_trusted.c             |  3 +-
>   fs/reiserfs/xattr_user.c                |  3 +-
>   fs/squashfs/xattr.c                     |  2 +-
>   fs/ubifs/xattr.c                        |  3 +-
>   fs/xattr.c                              | 42 +++++++++++++------------
>   fs/xfs/xfs_xattr.c                      |  3 +-
>   include/linux/lsm_hook_defs.h           |  3 +-
>   include/linux/security.h                |  6 ++--
>   include/linux/xattr.h                   |  6 ++--
>   include/uapi/linux/xattr.h              |  7 +++--
>   mm/shmem.c                              |  3 +-
>   net/socket.c                            |  3 +-
>   security/commoncap.c                    | 11 ++++---
>   security/integrity/evm/evm_main.c       | 13 +++++---
>   security/security.c                     |  5 +--
>   security/selinux/hooks.c                | 19 ++++++-----
>   security/smack/smack_lsm.c              | 18 ++++++-----
>   68 files changed, 289 insertions(+), 167 deletions(-)
>
Amir Goldstein Nov. 17, 2021, 7:36 a.m. UTC | #2
On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
>
> Mark Salyzyn (3):
>   Add flags option to get xattr method paired to __vfs_getxattr
>   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>   overlayfs: override_creds=off option bypass creator_cred
>
> Mark Salyzyn + John Stultz (1):
>   overlayfs: inode_owner_or_capable called during execv
>
> The first three patches address fundamental security issues that should
> be solved regardless of the override_creds=off feature.
>
> The fourth adds the feature depends on these other fixes.
>
> By default, all access to the upper, lower and work directories is the
> recorded mounter's MAC and DAC credentials.  The incoming accesses are
> checked against the caller's credentials.
>
> If the principles of least privilege are applied for sepolicy, the
> mounter's credentials might not overlap the credentials of the caller's
> when accessing the overlayfs filesystem.  For example, a file that a
> lower DAC privileged caller can execute, is MAC denied to the
> generally higher DAC privileged mounter, to prevent an attack vector.
>
> We add the option to turn off override_creds in the mount options; all
> subsequent operations after mount on the filesystem will be only the
> caller's credentials.  The module boolean parameter and mount option
> override_creds is also added as a presence check for this "feature",
> existence of /sys/module/overlay/parameters/overlay_creds
>
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> Signed-off-by: David Anderson <dvander@google.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Stephen Smalley <sds@tycho.nsa.gov>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-unionfs@vger.kernel.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-team@android.com
> Cc: selinux@vger.kernel.org
> Cc: paulmoore@microsoft.com
> Cc: Luca.Boccassi@microsoft.com
>
> ---
>
> v19
> - rebase.
>

Hi David,

I see that the patch set has changed hands (presumably to Android upstreaming
team), but you just rebased v18 without addressing the maintainers concerns [1].

Specifically, the patch 2/4 is very wrong for unprivileged mount and
I think that the very noisy patch 1/4 could be completely avoided:
Can't you use -o userxattr mount option for Android use case and limit
the manipulation of user.ovrelay.* xattr based on sepolicy for actors
that are allowed
to make changes in overlayfs mount? or not limit at all?
The access to those xattr is forbidden via "incoming" xattr ops on
overlay inodes.

Also, IMO, the Documentation section about "Non overlapping credentials" does
not hold the same standards as the section about "Permission model" -
it does not
state the requirements clear enough for my non-security-oriented brain to be
able to understand if the "Ignore mounter's credentials" model can be exploited.

Can an unprivileged user create an overlay over a directory that they have
access to and redirect an innocent looking file name to an underlying file that
said the mounting user has no access to and by doing that, tricking a privileged
user to modify the innocent looking file on the  mounter's behalf?
Of course this could be avoided by forbidding unprivileged mount with
override_creds=off, but there could be other scenarios, so a clear model
would help to understand the risks.

For example:
If user 1 was able to read in lower dir A, now the content of overlay dir A
is cached and user 2, that has permissions to read upper dir A and does
not have read permissions on lower dir A will see the content of lower dir A.

I think that the core problem with the approach is using Non-uniform
credentials to access underlying layers. I don't see a simple way around
a big audit that checks all those cases, but maybe I'm missing some quick
shortcut or maybe your use case can add some restrictions about the
users that could access this overlay that would simplify the generic problem.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
David Anderson Nov. 18, 2021, 7:59 a.m. UTC | #3
On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/16/2021 5:58 PM, David Anderson wrote:
> > Mark Salyzyn (3):
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
>
> This isn't very clear. Are you saying that the security attributes
> of the upper, lower, and work directories are determined by the
> attributes of the process that mounted the filesystem? What is an
> "incoming access"? I'm sure that means something if you're steeped
> in the lore of overlayfs, but it isn't obvious to me.

(Sorry, hitting "Reply All" this time...)

Thanks for taking a look - Yes. An "incoming access" is the user
application security context accessing the filesystem.

> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.
>
> I'm sorry, but I've tried pretty hard, and can't puzzle that one out.

If your sepolicy is designed to give processes minimal privileges (as ours is),
then "init" might lack privileges even though other processes have them. For
example, init can mount /x but not access /x/y/z. But, process XYZ can access
/x/y/z. In our system processes have no privileges to anything by default,
and permissions are granted as needed, as narrowly as possible.

> DAC privileges are not hierarchical. This doesn't make any sense.

Sorry, that was probably not the right word. The intent was to say that a
process with minimal DAC privileges might be able to access a file, but
a process with expansive DAC privileges might be denied access to the
same file due to MAC restrictions.

> I think I might have figured that one out, but in order to do so
> I have to make way too many assumptions about the earlier paragraph.
> Could you please try to explain what you're doing with more context?

Hopefully the above helps explain: overlayfs uses the mounter's privileges,
which does not work on a system where the mounter does not have a
superset of child processes' privileges. That's the crux of the issue and
I'll keep working on how it's communicated in the patch description.

-David

On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/16/2021 5:58 PM, David Anderson wrote:
> > Mark Salyzyn (3):
> >    Add flags option to get xattr method paired to __vfs_getxattr
> >    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >    overlayfs: override_creds=off option bypass creator_cred
> >
> > Mark Salyzyn + John Stultz (1):
> >    overlayfs: inode_owner_or_capable called during execv
> >
> > The first three patches address fundamental security issues that should
> > be solved regardless of the override_creds=off feature.
> >
> > The fourth adds the feature depends on these other fixes.
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
>
> This isn't very clear. Are you saying that the security attributes
> of the upper, lower, and work directories are determined by the
> attributes of the process that mounted the filesystem? What is an
> "incoming access"? I'm sure that means something if you're steeped
> in the lore of overlayfs, but it isn't obvious to me.
>
> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.
>
> I'm sorry, but I've tried pretty hard, and can't puzzle that one out.
>
> >    For example, a file that a
> > lower DAC privileged caller can execute, is MAC denied to the
> > generally higher DAC privileged mounter, to prevent an attack vector.
>
> DAC privileges are not hierarchical. This doesn't make any sense.
>
> > We add the option to turn off override_creds in the mount options; all
> > subsequent operations after mount on the filesystem will be only the
> > caller's credentials.
>
> I think I might have figured that one out, but in order to do so
> I have to make way too many assumptions about the earlier paragraph.
> Could you please try to explain what you're doing with more context?
>
> >    The module boolean parameter and mount option
> > override_creds is also added as a presence check for this "feature",
> > existence of /sys/module/overlay/parameters/overlay_creds
> >
> > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > Signed-off-by: David Anderson <dvander@google.com>
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-team@android.com
> > Cc: selinux@vger.kernel.org
> > Cc: paulmoore@microsoft.com
> > Cc: Luca.Boccassi@microsoft.com
> >
> > ---
> >
> > v19
> > - rebase.
> >
> > v18
> > - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr
> >
> > v17
> > - correct some zero-day build failures.
> > - fix up documentation
> >
> > v16
> > - rebase and merge of two patches.
> > - add adjustment to deal with execv when overrides is off.
> >
> > v15
> > - Revert back to v4 with fixes from on the way from v5-v14. The single
> >    structure argument passing to address the complaints about too many
> >    arguments was rejected by the community.
> > - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH
> >    check. Can address that independently.
> > - ToDo: upstream test frame for thes security fixes (currently testing
> >    is all in Android).
> >
> > v14:
> > - Rejoin, rebase and a few adjustments.
> >
> > v13:
> > - Pull out first patch and try to get it in alone feedback, some
> >    Acks, and then <crickets> because people forgot why we were doing i.
> >
> > v12:
> > - Restore squished out patch 2 and 3 in the series,
> >    then change algorithm to add flags argument.
> >    Per-thread flag is a large security surface.
> >
> > v11:
> > - Squish out v10 introduced patch 2 and 3 in the series,
> >    then and use per-thread flag instead for nesting.
> > - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper.
> > - Add sb argument to ovl_revert_creds to match future work.
> >
> > v10:
> > - Return NULL on CAP_DAC_READ_SEARCH
> > - Add __get xattr method to solve sepolicy logging issue
> > - Drop unnecessary sys_admin sepolicy checking for administrative
> >    driver internal xattr functions.
> >
> > v6:
> > - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
> > - Do better with the documentation, drop rationalizations.
> > - pr_warn message adjusted to report consequences.
> >
> > v5:
> > - beefed up the caveats in the Documentation
> > - Is dependent on
> >    "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
> >    "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
> > - Added prwarn when override_creds=off
> >
> > v4:
> > - spelling and grammar errors in text
> >
> > v3:
> > - Change name from caller_credentials / creator_credentials to the
> >    boolean override_creds.
> > - Changed from creator to mounter credentials.
> > - Updated and fortified the documentation.
> > - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS
> >
> > v2:
> > - Forward port changed attr to stat, resulting in a build error.
> > - altered commit message.
> >
> > David Anderson (4):
> >    Add flags option to get xattr method paired to __vfs_getxattr
> >    overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >    overlayfs: override_creds=off option bypass creator_cred
> >    overlayfs: inode_owner_or_capable called during execv
> >
> >   Documentation/filesystems/locking.rst   |  2 +-
> >   Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++-
> >   fs/9p/acl.c                             |  3 +-
> >   fs/9p/xattr.c                           |  3 +-
> >   fs/afs/xattr.c                          | 10 +++---
> >   fs/attr.c                               |  2 +-
> >   fs/btrfs/xattr.c                        |  3 +-
> >   fs/ceph/xattr.c                         |  3 +-
> >   fs/cifs/xattr.c                         |  2 +-
> >   fs/ecryptfs/inode.c                     |  6 ++--
> >   fs/ecryptfs/mmap.c                      |  5 +--
> >   fs/erofs/xattr.c                        |  3 +-
> >   fs/ext2/xattr_security.c                |  2 +-
> >   fs/ext2/xattr_trusted.c                 |  2 +-
> >   fs/ext2/xattr_user.c                    |  2 +-
> >   fs/ext4/xattr_hurd.c                    |  2 +-
> >   fs/ext4/xattr_security.c                |  2 +-
> >   fs/ext4/xattr_trusted.c                 |  2 +-
> >   fs/ext4/xattr_user.c                    |  2 +-
> >   fs/f2fs/xattr.c                         |  4 +--
> >   fs/fuse/xattr.c                         |  4 +--
> >   fs/gfs2/xattr.c                         |  3 +-
> >   fs/hfs/attr.c                           |  2 +-
> >   fs/hfsplus/xattr.c                      |  3 +-
> >   fs/hfsplus/xattr_security.c             |  3 +-
> >   fs/hfsplus/xattr_trusted.c              |  3 +-
> >   fs/hfsplus/xattr_user.c                 |  3 +-
> >   fs/inode.c                              |  7 +++--
> >   fs/internal.h                           |  3 +-
> >   fs/jffs2/security.c                     |  3 +-
> >   fs/jffs2/xattr_trusted.c                |  3 +-
> >   fs/jffs2/xattr_user.c                   |  3 +-
> >   fs/jfs/xattr.c                          |  5 +--
> >   fs/kernfs/inode.c                       |  3 +-
> >   fs/nfs/nfs4proc.c                       |  9 ++++--
> >   fs/ntfs3/xattr.c                        |  2 +-
> >   fs/ocfs2/xattr.c                        |  9 ++++--
> >   fs/open.c                               |  2 +-
> >   fs/orangefs/xattr.c                     |  3 +-
> >   fs/overlayfs/copy_up.c                  |  2 +-
> >   fs/overlayfs/dir.c                      | 17 +++++-----
> >   fs/overlayfs/file.c                     | 25 ++++++++-------
> >   fs/overlayfs/inode.c                    | 29 ++++++++---------
> >   fs/overlayfs/namei.c                    |  6 ++--
> >   fs/overlayfs/overlayfs.h                |  7 +++--
> >   fs/overlayfs/ovl_entry.h                |  1 +
> >   fs/overlayfs/readdir.c                  |  8 ++---
> >   fs/overlayfs/super.c                    | 34 ++++++++++++++++----
> >   fs/overlayfs/util.c                     | 13 ++++++--
> >   fs/posix_acl.c                          |  2 +-
> >   fs/reiserfs/xattr_security.c            |  3 +-
> >   fs/reiserfs/xattr_trusted.c             |  3 +-
> >   fs/reiserfs/xattr_user.c                |  3 +-
> >   fs/squashfs/xattr.c                     |  2 +-
> >   fs/ubifs/xattr.c                        |  3 +-
> >   fs/xattr.c                              | 42 +++++++++++++------------
> >   fs/xfs/xfs_xattr.c                      |  3 +-
> >   include/linux/lsm_hook_defs.h           |  3 +-
> >   include/linux/security.h                |  6 ++--
> >   include/linux/xattr.h                   |  6 ++--
> >   include/uapi/linux/xattr.h              |  7 +++--
> >   mm/shmem.c                              |  3 +-
> >   net/socket.c                            |  3 +-
> >   security/commoncap.c                    | 11 ++++---
> >   security/integrity/evm/evm_main.c       | 13 +++++---
> >   security/security.c                     |  5 +--
> >   security/selinux/hooks.c                | 19 ++++++-----
> >   security/smack/smack_lsm.c              | 18 ++++++-----
> >   68 files changed, 289 insertions(+), 167 deletions(-)
> >
David Anderson Nov. 18, 2021, 9:53 a.m. UTC | #4
On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> Hi David,
>
> I see that the patch set has changed hands (presumably to Android upstreaming
> team), but you just rebased v18 without addressing the maintainers concerns [1].

Indeed I'm carrying this forward as Mark is no longer working on it.
My apologies for
missing those comments!

> Specifically, the patch 2/4 is very wrong for unprivileged mount and
> I think that the very noisy patch 1/4 could be completely avoided:
> Can't you use -o userxattr mount option for Android use case and limit
> the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> that are allowed
> to make changes in overlayfs mount? or not limit at all?
> The access to those xattr is forbidden via "incoming" xattr ops on
> overlay inodes.

Can you clarify a bit more? The patch is definitely super noisy and I'd love
to have a better solution. The problem it's trying to solve is:
 1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
 2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
 2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
 3. Kernel-privileged init tries to execute /blah/init to initiate a
domain transition.
 4. exec() fails because the overlayfs mounter creds (kernel domain) does
     not have getxattr permission to /blah/init.

Eg, we're hitting this problem without even making changes to the mount, and
without anything being written to /mnt/blah-upper.

> Can an unprivileged user create an overlay over a directory that they have
> access to and redirect an innocent looking file name to an underlying file that
> said the mounting user has no access to and by doing that, tricking a privileged
> user to modify the innocent looking file on the  mounter's behalf?
> Of course this could be avoided by forbidding unprivileged mount with
> override_creds=off, but there could be other scenarios, so a clear model
> would help to understand the risks.
>
> For example:
> If user 1 was able to read in lower dir A, now the content of overlay dir A
> is cached and user 2, that has permissions to read upper dir A and does
> not have read permissions on lower dir A will see the content of lower dir A.

I'll need to think about this more and test to verify. It's not a scenario that
would come up in our use case (both dirs effectively have the same permissions).

If the answer is "yes, that can happen" - do you see this as a problem of
clarifying the model, or a problem of fixing that loophole?

>> I think that the core problem with the approach is using Non-uniform
> credentials to access underlying layers. I don't see a simple way around
> a big audit that checks all those cases, but maybe I'm missing some quick
> shortcut or maybe your use case can add some restrictions about the
> users that could access this overlay that would simplify the generic problem.

In a security model like ours, I think there's no way around it, that
we really need
accesses to be from the caller's credentials and not the mounter's. It's even
worse than earlier iterations of this patch perhaps let on: we mount
before sepolicy
is loaded (so we can overlay the policy itself), and thus the
mounter's creds are
effectively "the kernel". This domain is highly restricted in our
sepolicy for obvious
reasons. There's no way our security team will let us unrestrict it.

Best,

-David




>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
Amir Goldstein Nov. 18, 2021, 10:20 a.m. UTC | #5
On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote:
>
> On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Hi David,
> >
> > I see that the patch set has changed hands (presumably to Android upstreaming
> > team), but you just rebased v18 without addressing the maintainers concerns [1].
>
> Indeed I'm carrying this forward as Mark is no longer working on it.
> My apologies for
> missing those comments!
>
> > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> > I think that the very noisy patch 1/4 could be completely avoided:
> > Can't you use -o userxattr mount option for Android use case and limit
> > the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> > that are allowed
> > to make changes in overlayfs mount? or not limit at all?
> > The access to those xattr is forbidden via "incoming" xattr ops on
> > overlay inodes.
>
> Can you clarify a bit more? The patch is definitely super noisy and I'd love
> to have a better solution. The problem it's trying to solve is:
>  1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
>  2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
>  2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
>  3. Kernel-privileged init tries to execute /blah/init to initiate a
> domain transition.
>  4. exec() fails because the overlayfs mounter creds (kernel domain) does
>      not have getxattr permission to /blah/init.
>
> Eg, we're hitting this problem without even making changes to the mount, and
> without anything being written to /mnt/blah-upper.
>

So what is your solution?
Remove the security check from overlayfs setting xattr?
How does that work for the person who composed the security policy?
You will need to grant some basic privileges to the mounter.
If you do not want to grant the mounter privileges to set trusted.overlay xattr
you may use mount option -o userxattr and grant it permissions to set
user.overlay xattrs.

> > Can an unprivileged user create an overlay over a directory that they have
> > access to and redirect an innocent looking file name to an underlying file that
> > said the mounting user has no access to and by doing that, tricking a privileged
> > user to modify the innocent looking file on the  mounter's behalf?
> > Of course this could be avoided by forbidding unprivileged mount with
> > override_creds=off, but there could be other scenarios, so a clear model
> > would help to understand the risks.
> >
> > For example:
> > If user 1 was able to read in lower dir A, now the content of overlay dir A
> > is cached and user 2, that has permissions to read upper dir A and does
> > not have read permissions on lower dir A will see the content of lower dir A.
>
> I'll need to think about this more and test to verify. It's not a scenario that
> would come up in our use case (both dirs effectively have the same permissions).
>

Your argument is taking the wrong POV.
The reason that previous posts of this patch set have been rejected
is not because it doesn't work for your use case.
It is because other players can exploit the feature to bypass security
policies, so the feature cannot be merged as is.

> If the answer is "yes, that can happen" - do you see this as a problem of
> clarifying the model, or a problem of fixing that loophole?
>

It is something that is not at all easy to fix.
In the example above, instead of checking permissions against the
overlay inode (on "incoming" readdir) will need to check permissions of every
accessing user against all layers, before allowing access to the merged
directory content (which is cached).
A lot more work - and this is just for this one example.

> >> I think that the core problem with the approach is using Non-uniform
> > credentials to access underlying layers. I don't see a simple way around
> > a big audit that checks all those cases, but maybe I'm missing some quick
> > shortcut or maybe your use case can add some restrictions about the
> > users that could access this overlay that would simplify the generic problem.
>
> In a security model like ours, I think there's no way around it, that
> we really need
> accesses to be from the caller's credentials and not the mounter's. It's even
> worse than earlier iterations of this patch perhaps let on: we mount
> before sepolicy
> is loaded (so we can overlay the policy itself), and thus the
> mounter's creds are
> effectively "the kernel". This domain is highly restricted in our
> sepolicy for obvious
> reasons. There's no way our security team will let us unrestrict it.
>

Not sure what that means or what I can do with this information.
If I had a simple suggestion on how to solve your problem I would have
suggested it, but I cannot think of any right now.
The best I can do is to try to make you understand the problems that your
patch causes to others, so you can figure out a way that meets your goals
without breaking other use cases.

Thanks,
Amir.
David Anderson Nov. 18, 2021, 8:15 p.m. UTC | #6
On Thu, Nov 18, 2021 at 2:20 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote:
> >
> > On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Hi David,
> > >
> > > I see that the patch set has changed hands (presumably to Android upstreaming
> > > team), but you just rebased v18 without addressing the maintainers concerns [1].
> >
> > Indeed I'm carrying this forward as Mark is no longer working on it.
> > My apologies for
> > missing those comments!
> >
> > > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> > > I think that the very noisy patch 1/4 could be completely avoided:
> > > Can't you use -o userxattr mount option for Android use case and limit
> > > the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> > > that are allowed
> > > to make changes in overlayfs mount? or not limit at all?
> > > The access to those xattr is forbidden via "incoming" xattr ops on
> > > overlay inodes.
> >
> > Can you clarify a bit more? The patch is definitely super noisy and I'd love
> > to have a better solution. The problem it's trying to solve is:
> >  1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper.
> >  2. Kernel-privileged init mounts /blah with overlayfs using the above dirs.
> >  2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins.
> >  3. Kernel-privileged init tries to execute /blah/init to initiate a
> > domain transition.
> >  4. exec() fails because the overlayfs mounter creds (kernel domain) does
> >      not have getxattr permission to /blah/init.
> >
> > Eg, we're hitting this problem without even making changes to the mount, and
> > without anything being written to /mnt/blah-upper.
> >
>
> So what is your solution?
> Remove the security check from overlayfs setting xattr?
> How does that work for the person who composed the security policy?
> You will need to grant some basic privileges to the mounter.
> If you do not want to grant the mounter privileges to set trusted.overlay xattr
> you may use mount option -o userxattr and grant it permissions to set
> user.overlay xattrs.

Thanks for the tips. I think it might be possible for us to work
around this one in
userspace, either by granting the mounter this one permission across
all contexts,
or with a much narrower kernel fix that doesn't require us keeping the big noisy
patch in our tree.

> > > Can an unprivileged user create an overlay over a directory that they have
> > > access to and redirect an innocent looking file name to an underlying file that
> > > said the mounting user has no access to and by doing that, tricking a privileged
> > > user to modify the innocent looking file on the  mounter's behalf?
> > > Of course this could be avoided by forbidding unprivileged mount with
> > > override_creds=off, but there could be other scenarios, so a clear model
> > > would help to understand the risks.
> > >
> > > For example:
> > > If user 1 was able to read in lower dir A, now the content of overlay dir A
> > > is cached and user 2, that has permissions to read upper dir A and does
> > > not have read permissions on lower dir A will see the content of lower dir A.
> >
> > I'll need to think about this more and test to verify. It's not a scenario that
> > would come up in our use case (both dirs effectively have the same permissions).
> >
>
> Your argument is taking the wrong POV.
> The reason that previous posts of this patch set have been rejected
> is not because it doesn't work for your use case.
> It is because other players can exploit the feature to bypass security
> policies, so the feature cannot be merged as is.

Ack, understood.

> > If the answer is "yes, that can happen" - do you see this as a problem of
> > clarifying the model, or a problem of fixing that loophole?
> >
>
> It is something that is not at all easy to fix.
> In the example above, instead of checking permissions against the
> overlay inode (on "incoming" readdir) will need to check permissions of every
> accessing user against all layers, before allowing access to the merged
> directory content (which is cached).
> A lot more work - and this is just for this one example.

I see your point. If we could implement that, behind a mount flag, would that be
an acceptable solution?

> > >> I think that the core problem with the approach is using Non-uniform
> > > credentials to access underlying layers. I don't see a simple way around
> > > a big audit that checks all those cases, but maybe I'm missing some quick
> > > shortcut or maybe your use case can add some restrictions about the
> > > users that could access this overlay that would simplify the generic problem.
> >
> > In a security model like ours, I think there's no way around it, that
> > we really need
> > accesses to be from the caller's credentials and not the mounter's. It's even
> > worse than earlier iterations of this patch perhaps let on: we mount
> > before sepolicy
> > is loaded (so we can overlay the policy itself), and thus the
> > mounter's creds are
> > effectively "the kernel". This domain is highly restricted in our
> > sepolicy for obvious
> > reasons. There's no way our security team will let us unrestrict it.
> >
>
> Not sure what that means or what I can do with this information.
> If I had a simple suggestion on how to solve your problem I would have
> suggested it, but I cannot think of any right now.
> The best I can do is to try to make you understand the problems that your
> patch causes to others, so you can figure out a way that meets your goals
> without breaking other use cases.

To help clarify, we do not have any processes which have access to everything.
Almost every file in the system has a specific selabel and processes are only
granted access to the labels they need.

To give the mounter (init in kernel domain) access to every single label doesn't
make sense. It only makes sense to have the access be "passthrough", making
overlayfs as transparent as possible.

Best,

-David

>
> Thanks,
> Amir.
Amir Goldstein Nov. 18, 2021, 8:32 p.m. UTC | #7
> > It is something that is not at all easy to fix.
> > In the example above, instead of checking permissions against the
> > overlay inode (on "incoming" readdir) will need to check permissions of every
> > accessing user against all layers, before allowing access to the merged
> > directory content (which is cached).
> > A lot more work - and this is just for this one example.
>
> I see your point. If we could implement that, behind a mount flag, would that be
> an acceptable solution?
>

As I wrote, this is one specific problem that I identified.
If you propose a different behavior base on mount flag you should
be able to argue that is cannot be exploited to circumvent security
access policies, by peaking into cached copies of objects that the user
has no access to, or by any other way.

I have no idea how to implement what you want and prove that
it is safe.
Maybe if you explained the use case in greater details with some
examples someone could help you reach a possible solution.

Thanks,
Amir.
Vivek Goyal Dec. 3, 2021, 3:37 p.m. UTC | #8
On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> >
> > Mark Salyzyn (3):
> >   Add flags option to get xattr method paired to __vfs_getxattr
> >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> >   overlayfs: override_creds=off option bypass creator_cred
> >
> > Mark Salyzyn + John Stultz (1):
> >   overlayfs: inode_owner_or_capable called during execv
> >
> > The first three patches address fundamental security issues that should
> > be solved regardless of the override_creds=off feature.
> >
> > The fourth adds the feature depends on these other fixes.
> >
> > By default, all access to the upper, lower and work directories is the
> > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > checked against the caller's credentials.
> >
> > If the principles of least privilege are applied for sepolicy, the
> > mounter's credentials might not overlap the credentials of the caller's
> > when accessing the overlayfs filesystem.  For example, a file that a
> > lower DAC privileged caller can execute, is MAC denied to the
> > generally higher DAC privileged mounter, to prevent an attack vector.
> >
> > We add the option to turn off override_creds in the mount options; all
> > subsequent operations after mount on the filesystem will be only the
> > caller's credentials.  The module boolean parameter and mount option
> > override_creds is also added as a presence check for this "feature",
> > existence of /sys/module/overlay/parameters/overlay_creds
> >
> > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > Signed-off-by: David Anderson <dvander@google.com>
> > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: linux-unionfs@vger.kernel.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-team@android.com
> > Cc: selinux@vger.kernel.org
> > Cc: paulmoore@microsoft.com
> > Cc: Luca.Boccassi@microsoft.com
> >
> > ---
> >
> > v19
> > - rebase.
> >
> 
> Hi David,
> 
> I see that the patch set has changed hands (presumably to Android upstreaming
> team), but you just rebased v18 without addressing the maintainers concerns [1].
> 

BTW, where is patch 1 of the series. I can't seem to find it.

I think I was running into issues with getxattr() on underlying filesystem
as well (if mounter did not have sufficient privileges) and tried to fix
it. But did not find a good solution at that point of time.

https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/

So basically when overlay inode is being initialized, code will try to
query "security.selinux" xattr on underlying file to initialize selinux
label on the overlay inode. For regular filesystems, they bypass the
security check by calling __vfs_getxattr() when trying to initialize
this selinux security label. But with layered filesystem, it still
ends up calling vfs_getxattr() on underlying filesyste. Which means
it checks for caller's creds and if caller is not priviliged enough,
access will be denied.

To solve this problem, looks like this patch set is passing a flag
XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
path in underlying filesystem. As long as this information is
not leaked to user space (and remains in overlayfs), it probably is
fine? And if information is not going to user space, then it probably
is fine for unprivileged overlayfs mounts as well?

I see a comment from Miklos as well as you that it is not safe to
do for unprivileged mounts. Can you help me understand why that's
the case.


> Specifically, the patch 2/4 is very wrong for unprivileged mount and

Can you help me understand why it is wrong. (/me should spend more
time reading the patch. But I am taking easy route of asking you. :-)).

> I think that the very noisy patch 1/4 could be completely avoided:

How can it completely avoided. If mounter is not privileged then 
vfs_getxattr() on underlying filesystem will fail. Or if
override_creds=off, then caller might not be privileged enough to
do getxattr() but we still should be able to initialize overlay
inode security label. 

> Can't you use -o userxattr mount option

user xattrs done't work for device nodes and symlinks.

BTW, how will userxattr solve the problem completely. It can be used
to store overlay specific xattrs but accessing security xattrs on
underlying filesystem will still be a problem?

Thanks
Vivek

> for Android use case and limit
> the manipulation of user.ovrelay.* xattr based on sepolicy for actors
> that are allowed
> to make changes in overlayfs mount? or not limit at all?
> The access to those xattr is forbidden via "incoming" xattr ops on
> overlay inodes.
> 
> Also, IMO, the Documentation section about "Non overlapping credentials" does
> not hold the same standards as the section about "Permission model" -
> it does not
> state the requirements clear enough for my non-security-oriented brain to be
> able to understand if the "Ignore mounter's credentials" model can be exploited.
> 
> Can an unprivileged user create an overlay over a directory that they have
> access to and redirect an innocent looking file name to an underlying file that
> said the mounting user has no access to and by doing that, tricking a privileged
> user to modify the innocent looking file on the  mounter's behalf?
> Of course this could be avoided by forbidding unprivileged mount with
> override_creds=off, but there could be other scenarios, so a clear model
> would help to understand the risks.
> 
> For example:
> If user 1 was able to read in lower dir A, now the content of overlay dir A
> is cached and user 2, that has permissions to read upper dir A and does
> not have read permissions on lower dir A will see the content of lower dir A.
> 
> I think that the core problem with the approach is using Non-uniform
> credentials to access underlying layers. I don't see a simple way around
> a big audit that checks all those cases, but maybe I'm missing some quick
> shortcut or maybe your use case can add some restrictions about the
> users that could access this overlay that would simplify the generic problem.
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
>
Paul Moore Dec. 3, 2021, 4:04 p.m. UTC | #9
On Fri, Dec 3, 2021 at 10:38 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > >
> > > Mark Salyzyn (3):
> > >   Add flags option to get xattr method paired to __vfs_getxattr
> > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > >   overlayfs: override_creds=off option bypass creator_cred
> > >
> > > Mark Salyzyn + John Stultz (1):
> > >   overlayfs: inode_owner_or_capable called during execv
> > >
> > > The first three patches address fundamental security issues that should
> > > be solved regardless of the override_creds=off feature.
> > >
> > > The fourth adds the feature depends on these other fixes.
> > >
> > > By default, all access to the upper, lower and work directories is the
> > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > checked against the caller's credentials.
> > >
> > > If the principles of least privilege are applied for sepolicy, the
> > > mounter's credentials might not overlap the credentials of the caller's
> > > when accessing the overlayfs filesystem.  For example, a file that a
> > > lower DAC privileged caller can execute, is MAC denied to the
> > > generally higher DAC privileged mounter, to prevent an attack vector.
> > >
> > > We add the option to turn off override_creds in the mount options; all
> > > subsequent operations after mount on the filesystem will be only the
> > > caller's credentials.  The module boolean parameter and mount option
> > > override_creds is also added as a presence check for this "feature",
> > > existence of /sys/module/overlay/parameters/overlay_creds
>
> BTW, where is patch 1 of the series. I can't seem to find it.

Lore to the rescue ...

https://lore.kernel.org/selinux/20211117015806.2192263-2-dvander@google.com/
Amir Goldstein Dec. 3, 2021, 4:31 p.m. UTC | #10
On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > >
> > > Mark Salyzyn (3):
> > >   Add flags option to get xattr method paired to __vfs_getxattr
> > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > >   overlayfs: override_creds=off option bypass creator_cred
> > >
> > > Mark Salyzyn + John Stultz (1):
> > >   overlayfs: inode_owner_or_capable called during execv
> > >
> > > The first three patches address fundamental security issues that should
> > > be solved regardless of the override_creds=off feature.
> > >
> > > The fourth adds the feature depends on these other fixes.
> > >
> > > By default, all access to the upper, lower and work directories is the
> > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > checked against the caller's credentials.
> > >
> > > If the principles of least privilege are applied for sepolicy, the
> > > mounter's credentials might not overlap the credentials of the caller's
> > > when accessing the overlayfs filesystem.  For example, a file that a
> > > lower DAC privileged caller can execute, is MAC denied to the
> > > generally higher DAC privileged mounter, to prevent an attack vector.
> > >
> > > We add the option to turn off override_creds in the mount options; all
> > > subsequent operations after mount on the filesystem will be only the
> > > caller's credentials.  The module boolean parameter and mount option
> > > override_creds is also added as a presence check for this "feature",
> > > existence of /sys/module/overlay/parameters/overlay_creds
> > >
> > > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > > Signed-off-by: David Anderson <dvander@google.com>
> > > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > Cc: Amir Goldstein <amir73il@gmail.com>
> > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > > Cc: John Stultz <john.stultz@linaro.org>
> > > Cc: linux-doc@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: linux-fsdevel@vger.kernel.org
> > > Cc: linux-unionfs@vger.kernel.org
> > > Cc: linux-security-module@vger.kernel.org
> > > Cc: kernel-team@android.com
> > > Cc: selinux@vger.kernel.org
> > > Cc: paulmoore@microsoft.com
> > > Cc: Luca.Boccassi@microsoft.com
> > >
> > > ---
> > >
> > > v19
> > > - rebase.
> > >
> >
> > Hi David,
> >
> > I see that the patch set has changed hands (presumably to Android upstreaming
> > team), but you just rebased v18 without addressing the maintainers concerns [1].
> >
>
> BTW, where is patch 1 of the series. I can't seem to find it.
>
> I think I was running into issues with getxattr() on underlying filesystem
> as well (if mounter did not have sufficient privileges) and tried to fix
> it. But did not find a good solution at that point of time.
>
> https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/
>
> So basically when overlay inode is being initialized, code will try to
> query "security.selinux" xattr on underlying file to initialize selinux
> label on the overlay inode. For regular filesystems, they bypass the
> security check by calling __vfs_getxattr() when trying to initialize
> this selinux security label. But with layered filesystem, it still
> ends up calling vfs_getxattr() on underlying filesyste. Which means
> it checks for caller's creds and if caller is not priviliged enough,
> access will be denied.
>
> To solve this problem, looks like this patch set is passing a flag
> XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
> path in underlying filesystem. As long as this information is
> not leaked to user space (and remains in overlayfs), it probably is
> fine? And if information is not going to user space, then it probably
> is fine for unprivileged overlayfs mounts as well?
>
> I see a comment from Miklos as well as you that it is not safe to
> do for unprivileged mounts. Can you help me understand why that's
> the case.
>
>
> > Specifically, the patch 2/4 is very wrong for unprivileged mount and
>
> Can you help me understand why it is wrong. (/me should spend more
> time reading the patch. But I am taking easy route of asking you. :-)).
>

I should have spent more time reading the patch too :-)
I was not referring to the selinux part. That looks fine I guess.

I was referring to the part of:
"Check impure, opaque, origin & meta xattr with no sepolicy audit
(using __vfs_getxattr) since these operations are internal to
overlayfs operations and do not disclose any data."
I don't know how safe that really is to ignore the security checks
for reading trusted xattr and allow non-privileged mounts to do that.
Certainly since non privileged mounts are likely to use userxattr
anyway, so what's the reason to bypass security?

> > I think that the very noisy patch 1/4 could be completely avoided:
>
> How can it completely avoided. If mounter is not privileged then
> vfs_getxattr() on underlying filesystem will fail. Or if
> override_creds=off, then caller might not be privileged enough to
> do getxattr() but we still should be able to initialize overlay
> inode security label.
>

My bad. I didn't read the description of the selinux problem
with the re-post and forgot about it.

> > Can't you use -o userxattr mount option
>
> user xattrs done't work for device nodes and symlinks.
>
> BTW, how will userxattr solve the problem completely. It can be used
> to store overlay specific xattrs but accessing security xattrs on
> underlying filesystem will still be a problem?

It cannot.
As long as the patch sticks with passing through the
getxattr flags, it looks fine to me.
passing security for trusted.overlay seems dodgy.

Thanks,
Amir.
Vivek Goyal Dec. 3, 2021, 6:34 p.m. UTC | #11
On Fri, Dec 03, 2021 at 06:31:01PM +0200, Amir Goldstein wrote:
> On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote:
> > > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote:
> > > >
> > > > Mark Salyzyn (3):
> > > >   Add flags option to get xattr method paired to __vfs_getxattr
> > > >   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
> > > >   overlayfs: override_creds=off option bypass creator_cred
> > > >
> > > > Mark Salyzyn + John Stultz (1):
> > > >   overlayfs: inode_owner_or_capable called during execv
> > > >
> > > > The first three patches address fundamental security issues that should
> > > > be solved regardless of the override_creds=off feature.
> > > >
> > > > The fourth adds the feature depends on these other fixes.
> > > >
> > > > By default, all access to the upper, lower and work directories is the
> > > > recorded mounter's MAC and DAC credentials.  The incoming accesses are
> > > > checked against the caller's credentials.
> > > >
> > > > If the principles of least privilege are applied for sepolicy, the
> > > > mounter's credentials might not overlap the credentials of the caller's
> > > > when accessing the overlayfs filesystem.  For example, a file that a
> > > > lower DAC privileged caller can execute, is MAC denied to the
> > > > generally higher DAC privileged mounter, to prevent an attack vector.
> > > >
> > > > We add the option to turn off override_creds in the mount options; all
> > > > subsequent operations after mount on the filesystem will be only the
> > > > caller's credentials.  The module boolean parameter and mount option
> > > > override_creds is also added as a presence check for this "feature",
> > > > existence of /sys/module/overlay/parameters/overlay_creds
> > > >
> > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> > > > Signed-off-by: David Anderson <dvander@google.com>
> > > > Cc: Miklos Szeredi <miklos@szeredi.hu>
> > > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > > Cc: Vivek Goyal <vgoyal@redhat.com>
> > > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > > Cc: Amir Goldstein <amir73il@gmail.com>
> > > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > > Cc: Stephen Smalley <sds@tycho.nsa.gov>
> > > > Cc: John Stultz <john.stultz@linaro.org>
> > > > Cc: linux-doc@vger.kernel.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: linux-fsdevel@vger.kernel.org
> > > > Cc: linux-unionfs@vger.kernel.org
> > > > Cc: linux-security-module@vger.kernel.org
> > > > Cc: kernel-team@android.com
> > > > Cc: selinux@vger.kernel.org
> > > > Cc: paulmoore@microsoft.com
> > > > Cc: Luca.Boccassi@microsoft.com
> > > >
> > > > ---
> > > >
> > > > v19
> > > > - rebase.
> > > >
> > >
> > > Hi David,
> > >
> > > I see that the patch set has changed hands (presumably to Android upstreaming
> > > team), but you just rebased v18 without addressing the maintainers concerns [1].
> > >
> >
> > BTW, where is patch 1 of the series. I can't seem to find it.
> >
> > I think I was running into issues with getxattr() on underlying filesystem
> > as well (if mounter did not have sufficient privileges) and tried to fix
> > it. But did not find a good solution at that point of time.
> >
> > https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/
> >
> > So basically when overlay inode is being initialized, code will try to
> > query "security.selinux" xattr on underlying file to initialize selinux
> > label on the overlay inode. For regular filesystems, they bypass the
> > security check by calling __vfs_getxattr() when trying to initialize
> > this selinux security label. But with layered filesystem, it still
> > ends up calling vfs_getxattr() on underlying filesyste. Which means
> > it checks for caller's creds and if caller is not priviliged enough,
> > access will be denied.
> >
> > To solve this problem, looks like this patch set is passing a flag
> > XATTR_NOSECUROTY so that permission checks are skipped in getxattr()
> > path in underlying filesystem. As long as this information is
> > not leaked to user space (and remains in overlayfs), it probably is
> > fine? And if information is not going to user space, then it probably
> > is fine for unprivileged overlayfs mounts as well?
> >
> > I see a comment from Miklos as well as you that it is not safe to
> > do for unprivileged mounts. Can you help me understand why that's
> > the case.
> >
> >
> > > Specifically, the patch 2/4 is very wrong for unprivileged mount and
> >
> > Can you help me understand why it is wrong. (/me should spend more
> > time reading the patch. But I am taking easy route of asking you. :-)).
> >
> 
> I should have spent more time reading the patch too :-)
> I was not referring to the selinux part. That looks fine I guess.
> 
> I was referring to the part of:
> "Check impure, opaque, origin & meta xattr with no sepolicy audit
> (using __vfs_getxattr) since these operations are internal to
> overlayfs operations and do not disclose any data."
> I don't know how safe that really is to ignore the security checks
> for reading trusted xattr and allow non-privileged mounts to do that.

I am also concerned about this.

> Certainly since non privileged mounts are likely to use userxattr
> anyway, so what's the reason to bypass security?

I am not sure. In the early version of patches I think argument was
that do not switch to mounter's creds and use caller's creds on 
underlying filesystem as well. And each caller will be privileged
enough to be able to perform the operation.

Our take was that how is this model better because in current model
only mounter needs to be privileged while in this new model each
caller will have to be privileged. But Android guys seemed to be ok
with that. So has this assumption changed since early days. If callers
are privileged, then vfs_getxattr() on underlying filesystem for
overaly internal xattrs should succeed and there is no need for this
change.

I suspect patches have evolved since then and callers are not as
privileged as we expect them to and that's why we are bypassing this
check on all overlayfs internal trusted xattrs? This definitely requires
much close scrutiny. My initial reaction is that this sounds very scary.

In general I would think overlayfs should not bypass the check on
underlying fs. Either checks should be done in mounter's context or
caller's context (depending on override_creds=on/off).

Thanks
Vivek

> 
> > > I think that the very noisy patch 1/4 could be completely avoided:
> >
> > How can it completely avoided. If mounter is not privileged then
> > vfs_getxattr() on underlying filesystem will fail. Or if
> > override_creds=off, then caller might not be privileged enough to
> > do getxattr() but we still should be able to initialize overlay
> > inode security label.
> >
> 
> My bad. I didn't read the description of the selinux problem
> with the re-post and forgot about it.
> 
> > > Can't you use -o userxattr mount option
> >
> > user xattrs done't work for device nodes and symlinks.
> >
> > BTW, how will userxattr solve the problem completely. It can be used
> > to store overlay specific xattrs but accessing security xattrs on
> > underlying filesystem will still be a problem?
> 
> It cannot.
> As long as the patch sticks with passing through the
> getxattr flags, it looks fine to me.
> passing security for trusted.overlay seems dodgy.
> 
> Thanks,
> Amir.
>
Paul Moore March 1, 2022, 1:09 a.m. UTC | #12
I wanted to try and bring this thread back from the dead (?) as I
believe the use-case is still valid and worth supporting.  Some more
brief comments below ...

On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> I am not sure. In the early version of patches I think argument was
> that do not switch to mounter's creds and use caller's creds on
> underlying filesystem as well. And each caller will be privileged
> enough to be able to perform the operation.

The basic idea is that we can now build Linux systems with enough
access control granularity such that a given process can have the
necessary privileges to mount a filesystem, but not necessarily access
all of the data on the filesystem, while other processes, with
different access rights, are allowed to read and write data on the
mounted filesystem.  Granted, this is a bit different from how things
are usually done, but in my opinion it's a valid and interesting use
case in that it allows us to remove unneeded access rights from
historically very privileged system startup services/scripts: the
service that runs to mount my homedir shouldn't be allowed to access
my files just to mount the directory.

Unfortunately, this idea falls apart when we attempt to use overlayfs
due to the clever/usual way it caches the mounting processes
credentials and uses that in place of the current process' credentials
when accessing certain parts of the underlying filesystems.  The
current overlayfs implementation assumes that the mounter will always
be more privileged than the processes accessing the filesystem, it
would be nice if we could build a mechanism that didn't have this
assumption baked into the implementation.

This patchset may not have been The Answer, but surely there is
something we can do to support this use-case.

> Our take was that how is this model better because in current model
> only mounter needs to be privileged while in this new model each
> caller will have to be privileged. But Android guys seemed to be ok
> with that. So has this assumption changed since early days. If callers
> are privileged, then vfs_getxattr() on underlying filesystem for
> overaly internal xattrs should succeed and there is no need for this
> change.
>
> I suspect patches have evolved since then and callers are not as
> privileged as we expect them to and that's why we are bypassing this
> check on all overlayfs internal trusted xattrs? This definitely requires
> much close scrutiny. My initial reaction is that this sounds very scary.
>
> In general I would think overlayfs should not bypass the check on
> underlying fs. Either checks should be done in mounter's context or
> caller's context (depending on override_creds=on/off).
>
> Thanks
> Vivek
Paul Moore March 9, 2022, 9:13 p.m. UTC | #13
On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
>>
>> I wanted to try and bring this thread back from the dead (?) as I
>> believe the use-case is still valid and worth supporting.  Some more
>> brief comments below ...
>>
>> On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>> > I am not sure. In the early version of patches I think argument was
>> > that do not switch to mounter's creds and use caller's creds on
>> > underlying filesystem as well. And each caller will be privileged
>> > enough to be able to perform the operation.
>
> Indeed that was the argument - though, "userxattr" eliminated the need for patches 1 & 2 completely for us, which is great. We're no longer carrying those in our 5.15 tree.
>
>> Unfortunately, this idea falls apart when we attempt to use overlayfs
>> due to the clever/usual way it caches the mounting processes
>> credentials and uses that in place of the current process' credentials
>> when accessing certain parts of the underlying filesystems.  The
>> current overlayfs implementation assumes that the mounter will always
>> be more privileged than the processes accessing the filesystem, it
>> would be nice if we could build a mechanism that didn't have this
>> assumption baked into the implementation.
>>
>> This patchset may not have been The Answer, but surely there is
>> something we can do to support this use-case.
>
> Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.

Can you elaborate a bit more on the coherency issues?  Is this the dir
cache issue that is alluded to in the patchset?  Anything else that
has come up on review?

Before I start looking at the dir cache in any detail, did you have
any thoughts on how to resolve the problems that have arisen?
Paul Moore March 10, 2022, 10:11 p.m. UTC | #14
On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:

...

> >> This patchset may not have been The Answer, but surely there is
> >> something we can do to support this use-case.
> >
> > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
>
> Can you elaborate a bit more on the coherency issues?  Is this the dir
> cache issue that is alluded to in the patchset?  Anything else that
> has come up on review?
>
> Before I start looking at the dir cache in any detail, did you have
> any thoughts on how to resolve the problems that have arisen?

David, Vivek, Amir, Miklos, or anyone for that matter, can you please
go into more detail on the cache issues?  I *think* I may have found a
potential solution for an issue that could arise when the credential
override is not in place, but I'm not certain it's the only issue :)

There is motivation on our part to try and get the
"override_creds=off" portion of the patchset working and suitable for
upstreaming, but I need some help in making sure I understand all the
objections/problems.
Amir Goldstein March 11, 2022, 4:09 a.m. UTC | #15
On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
>
> ...
>
> > >> This patchset may not have been The Answer, but surely there is
> > >> something we can do to support this use-case.
> > >
> > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
> >
> > Can you elaborate a bit more on the coherency issues?  Is this the dir
> > cache issue that is alluded to in the patchset?  Anything else that
> > has come up on review?
> >
> > Before I start looking at the dir cache in any detail, did you have
> > any thoughts on how to resolve the problems that have arisen?
>
> David, Vivek, Amir, Miklos, or anyone for that matter, can you please
> go into more detail on the cache issues?  I *think* I may have found a
> potential solution for an issue that could arise when the credential
> override is not in place, but I'm not certain it's the only issue :)
>

Hi Paul,

In this thread I claimed that the authors of the patches did not present
a security model for overlayfs, such as the one currently in overlayfs.rst.
If we had a model we could have debated its correctness and review its
implementation.

As a proof that there is no solid model, I gave an *example* regarding
the overlay readdir cache.

When listing a merged dir, meaning, a directory containing entries from
several overlay layers, ovl_permission() is called to check user's permission,
but ovl_permission() does not currently check permissions to read all layers,
because that is not the current overlayfs model.

Overlayfs has a readdir cache, so without override_cred, a user with high
credentials can populate the readdir cache and then a user will fewer
credentials, not enough to access the lower layers, but enough to access
the upper most layer, will pass ovl_permission() check and be allowed to
read from readdir cache.

This specific problem can be solved in several ways - disable readdir
cache with override_cred=off, check all layers in ovl_permission().
That's not my point. My point is that I provided a proof that the current
model of override_cred=off is flawed and it is up to the authors of the
patch to fix the model and provide the analysis of overlayfs code to
prove the model's correctness.

The core of the matter is there is no easy way to "merge" the permissions
from all layers into a single permission blob that could be checked once.

Maybe the example I gave is the only flaw in the model, maybe not
I am not sure. I will be happy to help you in review of a model and the
solution that you may have found.

Thanks,
Amir.
Vivek Goyal March 11, 2022, 2:01 p.m. UTC | #16
On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote:
> On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote:
> > > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > ...
> >
> > > >> This patchset may not have been The Answer, but surely there is
> > > >> something we can do to support this use-case.
> > > >
> > > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues.
> > >
> > > Can you elaborate a bit more on the coherency issues?  Is this the dir
> > > cache issue that is alluded to in the patchset?  Anything else that
> > > has come up on review?
> > >
> > > Before I start looking at the dir cache in any detail, did you have
> > > any thoughts on how to resolve the problems that have arisen?
> >
> > David, Vivek, Amir, Miklos, or anyone for that matter, can you please
> > go into more detail on the cache issues?  I *think* I may have found a
> > potential solution for an issue that could arise when the credential
> > override is not in place, but I'm not certain it's the only issue :)
> >
> 
> Hi Paul,
> 
> In this thread I claimed that the authors of the patches did not present
> a security model for overlayfs, such as the one currently in overlayfs.rst.
> If we had a model we could have debated its correctness and review its
> implementation.

Agreed. After going through the patch set, I was wondering what's the
overall security model and how to visualize that.

So probably there needs to be a documentation patch which explains
what's the new security model and how does it work.

Also think both in terms of DAC and MAC. (Instead of just focussing too
hard on SELinux).

My understanding is that in current model, some of the overlayfs
operations require priviliges. So mounter is supposed to be priviliged
and does the operation on underlying layers.

Now in this new model, there will be two levels of check. Both overlay
level and underlying layer checks will happen in the context of task
which is doing the operation. So first of all, all tasks will need
to have enough priviliges to be able to perform various operations
on lower layer. 

If we do checks at both the levels in with the creds of calling task,
I guess that probably is fine. (But will require a closer code inspection
to make sure there is no privilege escalation both for mounter as well
calling task).

> 
> As a proof that there is no solid model, I gave an *example* regarding
> the overlay readdir cache.
> 
> When listing a merged dir, meaning, a directory containing entries from
> several overlay layers, ovl_permission() is called to check user's permission,
> but ovl_permission() does not currently check permissions to read all layers,
> because that is not the current overlayfs model.
> 
> Overlayfs has a readdir cache, so without override_cred, a user with high
> credentials can populate the readdir cache and then a user will fewer
> credentials, not enough to access the lower layers, but enough to access
> the upper most layer, will pass ovl_permission() check and be allowed to
> read from readdir cache.

I am not very familiar with dir caching code. When I read through the
overlayfs.rst, it gave the impression that caching is per "struct file".

"This merged name list is cached in the
'struct file' and so remains as long as the file is kept open."

And I was wondering if that's the case, then one user should not be able
to access the cache built by another priviliged user (until and unless
privileged user passed fd).

But looks like we build this cache and store in ovl inode and that's
why this issue of cache built by higher privileged process will be
accessible to lower privileged process.

With current model this is not an issue because "mounter" is providing
those privileges to unprivileged process. So while unprivileged process
can't do "readdir" on an underlying lower dir, it might still be able
to do that through an overlay mount. But if we don't switch to mounter's
creds, then we probably can't rely on this dir caching. Agreed that
disabling dir caching seems simplest solution if we were to do
override_creds=off.

Thanks
Vivek

> 
> This specific problem can be solved in several ways - disable readdir
> cache with override_cred=off, check all layers in ovl_permission().
> That's not my point. My point is that I provided a proof that the current
> model of override_cred=off is flawed and it is up to the authors of the
> patch to fix the model and provide the analysis of overlayfs code to
> prove the model's correctness.
> 
> The core of the matter is there is no easy way to "merge" the permissions
> from all layers into a single permission blob that could be checked once.
> 
> Maybe the example I gave is the only flaw in the model, maybe not
> I am not sure. I will be happy to help you in review of a model and the
> solution that you may have found.
> 
> Thanks,
> Amir.
>
Paul Moore March 11, 2022, 8:52 p.m. UTC | #17
On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote:
> > Hi Paul,

Hi Amir, Vivek,

Thanks for the replies, I think I now have a better understanding of
the concerns which is starting to make the path forward a bit more
clear.  A few more comments below ...

> > In this thread I claimed that the authors of the patches did not present
> > a security model for overlayfs, such as the one currently in overlayfs.rst.
> > If we had a model we could have debated its correctness and review its
> > implementation.
>
> Agreed. After going through the patch set, I was wondering what's the
> overall security model and how to visualize that.
>
> So probably there needs to be a documentation patch which explains
> what's the new security model and how does it work.

Yes, of course.  I'll be sure to add a section to the existing docs.

> Also think both in terms of DAC and MAC. (Instead of just focussing too
> hard on SELinux).

Definitely.  Most of what I've been thinking about the past day or so
has been how to properly handle some of the DAC/capability issues; I
have yet to start playing with the code, but for the most part I think
the MAC/SELinux bits are already working properly.

> My understanding is that in current model, some of the overlayfs
> operations require priviliges. So mounter is supposed to be priviliged
> and does the operation on underlying layers.
>
> Now in this new model, there will be two levels of check. Both overlay
> level and underlying layer checks will happen in the context of task
> which is doing the operation. So first of all, all tasks will need
> to have enough priviliges to be able to perform various operations
> on lower layer.
>
> If we do checks at both the levels in with the creds of calling task,
> I guess that probably is fine. (But will require a closer code inspection
> to make sure there is no privilege escalation both for mounter as well
> calling task).

I have thoughts on this, but I don't think I'm yet in a position to
debate this in depth just yet; I still need to finish poking around
the code and playing with a few things :)

It may take some time before I'm back with patches, but I appreciate
all of the tips and insight - thank you!
Johannes Segitz March 22, 2023, 7:28 a.m. UTC | #18
On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > Agreed. After going through the patch set, I was wondering what's the
> > overall security model and how to visualize that.
> >
> > So probably there needs to be a documentation patch which explains
> > what's the new security model and how does it work.
> 
> Yes, of course.  I'll be sure to add a section to the existing docs.
> 
> > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > hard on SELinux).
> 
> Definitely.  Most of what I've been thinking about the past day or so
> has been how to properly handle some of the DAC/capability issues; I
> have yet to start playing with the code, but for the most part I think
> the MAC/SELinux bits are already working properly.
> 
> > My understanding is that in current model, some of the overlayfs
> > operations require priviliges. So mounter is supposed to be priviliged
> > and does the operation on underlying layers.
> >
> > Now in this new model, there will be two levels of check. Both overlay
> > level and underlying layer checks will happen in the context of task
> > which is doing the operation. So first of all, all tasks will need
> > to have enough priviliges to be able to perform various operations
> > on lower layer.
> >
> > If we do checks at both the levels in with the creds of calling task,
> > I guess that probably is fine. (But will require a closer code inspection
> > to make sure there is no privilege escalation both for mounter as well
> > calling task).
> 
> I have thoughts on this, but I don't think I'm yet in a position to
> debate this in depth just yet; I still need to finish poking around
> the code and playing with a few things :)
> 
> It may take some time before I'm back with patches, but I appreciate
> all of the tips and insight - thank you!

Let me resurrect this discussion. With 
https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
the Fedora policy changed kernel_t to a confined domain. This means that
many overlayfs setups that are created in initrd will now run into issues,
as it will have kernel_t as part of the saved credentials. So while the
original use case that inspired the patch set was probably not very common
that now changed.

It's tricky to work around this. Loading a policy in initrd causes a lot of
issues now that kernel_t isn't unconfined anymore. Once the policy is
loaded by systemd changing the mounts is tough since we use it for /etc and
at this time systemd already has open file handles for policy files in
/etc.

Johannes
Amir Goldstein March 22, 2023, 12:48 p.m. UTC | #19
On Wed, Mar 22, 2023 at 9:28 AM Johannes Segitz <jsegitz@suse.com> wrote:
>
> On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Agreed. After going through the patch set, I was wondering what's the
> > > overall security model and how to visualize that.
> > >
> > > So probably there needs to be a documentation patch which explains
> > > what's the new security model and how does it work.
> >
> > Yes, of course.  I'll be sure to add a section to the existing docs.
> >
> > > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > > hard on SELinux).
> >
> > Definitely.  Most of what I've been thinking about the past day or so
> > has been how to properly handle some of the DAC/capability issues; I
> > have yet to start playing with the code, but for the most part I think
> > the MAC/SELinux bits are already working properly.
> >
> > > My understanding is that in current model, some of the overlayfs
> > > operations require priviliges. So mounter is supposed to be priviliged
> > > and does the operation on underlying layers.
> > >
> > > Now in this new model, there will be two levels of check. Both overlay
> > > level and underlying layer checks will happen in the context of task
> > > which is doing the operation. So first of all, all tasks will need
> > > to have enough priviliges to be able to perform various operations
> > > on lower layer.
> > >
> > > If we do checks at both the levels in with the creds of calling task,
> > > I guess that probably is fine. (But will require a closer code inspection
> > > to make sure there is no privilege escalation both for mounter as well
> > > calling task).
> >
> > I have thoughts on this, but I don't think I'm yet in a position to
> > debate this in depth just yet; I still need to finish poking around
> > the code and playing with a few things :)
> >
> > It may take some time before I'm back with patches, but I appreciate
> > all of the tips and insight - thank you!
>
> Let me resurrect this discussion. With
> https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
> the Fedora policy changed kernel_t to a confined domain. This means that
> many overlayfs setups that are created in initrd will now run into issues,
> as it will have kernel_t as part of the saved credentials. So while the
> original use case that inspired the patch set was probably not very common
> that now changed.

I don't remember anyone rejecting the patches on the account that
the Android use case is not important. It was never the issue.

>
> It's tricky to work around this. Loading a policy in initrd causes a lot of
> issues now that kernel_t isn't unconfined anymore. Once the policy is
> loaded by systemd changing the mounts is tough since we use it for /etc and
> at this time systemd already has open file handles for policy files in
> /etc.
>

I've already explained several times on this thread what needs to be
done in order to move forward - express the security model and
explain why it is safe.

If the security guys are going to be in LSS in Vancouver, perhaps
we can have a meetup with overlayfs developers on the overlap
day with LSFMM (May 10) to try and figure out a path forward.

Thanks,
Amir.
Paul Moore March 22, 2023, 2:05 p.m. UTC | #20
On Wed, Mar 22, 2023 at 3:28 AM Johannes Segitz <jsegitz@suse.com> wrote:
> On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote:
> > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Agreed. After going through the patch set, I was wondering what's the
> > > overall security model and how to visualize that.
> > >
> > > So probably there needs to be a documentation patch which explains
> > > what's the new security model and how does it work.
> >
> > Yes, of course.  I'll be sure to add a section to the existing docs.
> >
> > > Also think both in terms of DAC and MAC. (Instead of just focussing too
> > > hard on SELinux).
> >
> > Definitely.  Most of what I've been thinking about the past day or so
> > has been how to properly handle some of the DAC/capability issues; I
> > have yet to start playing with the code, but for the most part I think
> > the MAC/SELinux bits are already working properly.
> >
> > > My understanding is that in current model, some of the overlayfs
> > > operations require priviliges. So mounter is supposed to be priviliged
> > > and does the operation on underlying layers.
> > >
> > > Now in this new model, there will be two levels of check. Both overlay
> > > level and underlying layer checks will happen in the context of task
> > > which is doing the operation. So first of all, all tasks will need
> > > to have enough priviliges to be able to perform various operations
> > > on lower layer.
> > >
> > > If we do checks at both the levels in with the creds of calling task,
> > > I guess that probably is fine. (But will require a closer code inspection
> > > to make sure there is no privilege escalation both for mounter as well
> > > calling task).
> >
> > I have thoughts on this, but I don't think I'm yet in a position to
> > debate this in depth just yet; I still need to finish poking around
> > the code and playing with a few things :)
> >
> > It may take some time before I'm back with patches, but I appreciate
> > all of the tips and insight - thank you!
>
> Let me resurrect this discussion. With
> https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792
> the Fedora policy changed kernel_t to a confined domain. This means that
> many overlayfs setups that are created in initrd will now run into issues,
> as it will have kernel_t as part of the saved credentials.

Regardless of any overlayfs cred work, it seems like it would also be
worth spending some time to see if the kernel_t mounter creds
situation can also be improved.  I'm guessing this is due to mounts
happening before the SELinux policy is loaded?  Has anyone looked into
mounting the SELinux policy even earlier in these cases (may not be
possible) and/or umount/mount/remounting the affected overlayfs-based
filesystems after the policy has been loaded?

I can't say I'm the best person to comment on how the Fedora SELinux
policy is structured, but I do know a *little* about SELinux and I
think that accepting kernel_t as an overlayfs mounter cred is a
mistake.

> So while the
> original use case that inspired the patch set was probably not very common
> that now changed.
>
> It's tricky to work around this. Loading a policy in initrd causes a lot of
> issues now that kernel_t isn't unconfined anymore. Once the policy is
> loaded by systemd changing the mounts is tough since we use it for /etc and
> at this time systemd already has open file handles for policy files in
> /etc.

It's been a while since I worked on this, but I pretty much had to
give up on the read-write case, the overlayfs copy-up/work-dir
approach made this impractical, or at least I couldn't think of a sane
way to handle this without some sort of credential override.  However,
I did have a quick-and-dirty prototype that appeared to work well in
the read-only/no-work-dir case; I think I still have it in a
development branch somewhere, I can dig it back up and get it ported
to a modern kernel if there is any interest.

However, when discussing the prototype with Christian Brauner off-list
(added to the CC line) he still objected to the no-cred-override
approach and said it wasn't something he could support, so I dropped
it and focused on the other piles of fire lying about my desk (my
apologies to Christian if I'm mis-remembering/understanding the
conversation).  I still think there is value in supporting a
no-creds-override option, and if there is basic support for getting
this upstream I'm happy to pick the work back up, but I can't invest a
lot more time in this if there isn't an agreement from the
overlayfs/VFS maintainers that this is something that would consider.

--
paul-moore.com
Paul Moore March 22, 2023, 2:07 p.m. UTC | #21
On Wed, Mar 22, 2023 at 8:48 AM Amir Goldstein <amir73il@gmail.com> wrote:
> If the security guys are going to be in LSS in Vancouver, perhaps
> we can have a meetup with overlayfs developers on the overlap
> day with LSFMM (May 10) to try and figure out a path forward.

At the very least I currently plan to be at LSS-NA in Vancouver and
would be very happy to discuss this with anyone who wants to talk
about it.  I'm also happy to continue to conversation here too, for
the sake of those who might not be able to travel to LSS-NA and/or
LSFMM.