Message ID | 20211117015806.2192263-1-dvander@google.com (mailing list archive) |
---|---|
Headers | show |
Series | overlayfs override_creds=off & nested get xattr fix | expand |
On 11/16/2021 5:58 PM, David Anderson wrote: > Mark Salyzyn (3): > Add flags option to get xattr method paired to __vfs_getxattr > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > overlayfs: override_creds=off option bypass creator_cred > > Mark Salyzyn + John Stultz (1): > overlayfs: inode_owner_or_capable called during execv > > The first three patches address fundamental security issues that should > be solved regardless of the override_creds=off feature. > > The fourth adds the feature depends on these other fixes. > > By default, all access to the upper, lower and work directories is the > recorded mounter's MAC and DAC credentials. The incoming accesses are > checked against the caller's credentials. This isn't very clear. Are you saying that the security attributes of the upper, lower, and work directories are determined by the attributes of the process that mounted the filesystem? What is an "incoming access"? I'm sure that means something if you're steeped in the lore of overlayfs, but it isn't obvious to me. > If the principles of least privilege are applied for sepolicy, the > mounter's credentials might not overlap the credentials of the caller's > when accessing the overlayfs filesystem. I'm sorry, but I've tried pretty hard, and can't puzzle that one out. > For example, a file that a > lower DAC privileged caller can execute, is MAC denied to the > generally higher DAC privileged mounter, to prevent an attack vector. DAC privileges are not hierarchical. This doesn't make any sense. > We add the option to turn off override_creds in the mount options; all > subsequent operations after mount on the filesystem will be only the > caller's credentials. I think I might have figured that one out, but in order to do so I have to make way too many assumptions about the earlier paragraph. Could you please try to explain what you're doing with more context? > The module boolean parameter and mount option > override_creds is also added as a presence check for this "feature", > existence of /sys/module/overlay/parameters/overlay_creds > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > Signed-off-by: David Anderson <dvander@google.com> > Cc: Miklos Szeredi <miklos@szeredi.hu> > Cc: Jonathan Corbet <corbet@lwn.net> > Cc: Vivek Goyal <vgoyal@redhat.com> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Amir Goldstein <amir73il@gmail.com> > Cc: Randy Dunlap <rdunlap@infradead.org> > Cc: Stephen Smalley <sds@tycho.nsa.gov> > Cc: John Stultz <john.stultz@linaro.org> > Cc: linux-doc@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-unionfs@vger.kernel.org > Cc: linux-security-module@vger.kernel.org > Cc: kernel-team@android.com > Cc: selinux@vger.kernel.org > Cc: paulmoore@microsoft.com > Cc: Luca.Boccassi@microsoft.com > > --- > > v19 > - rebase. > > v18 > - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr > > v17 > - correct some zero-day build failures. > - fix up documentation > > v16 > - rebase and merge of two patches. > - add adjustment to deal with execv when overrides is off. > > v15 > - Revert back to v4 with fixes from on the way from v5-v14. The single > structure argument passing to address the complaints about too many > arguments was rejected by the community. > - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH > check. Can address that independently. > - ToDo: upstream test frame for thes security fixes (currently testing > is all in Android). > > v14: > - Rejoin, rebase and a few adjustments. > > v13: > - Pull out first patch and try to get it in alone feedback, some > Acks, and then <crickets> because people forgot why we were doing i. > > v12: > - Restore squished out patch 2 and 3 in the series, > then change algorithm to add flags argument. > Per-thread flag is a large security surface. > > v11: > - Squish out v10 introduced patch 2 and 3 in the series, > then and use per-thread flag instead for nesting. > - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper. > - Add sb argument to ovl_revert_creds to match future work. > > v10: > - Return NULL on CAP_DAC_READ_SEARCH > - Add __get xattr method to solve sepolicy logging issue > - Drop unnecessary sys_admin sepolicy checking for administrative > driver internal xattr functions. > > v6: > - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS. > - Do better with the documentation, drop rationalizations. > - pr_warn message adjusted to report consequences. > > v5: > - beefed up the caveats in the Documentation > - Is dependent on > "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh" > "overlayfs: check CAP_MKNOD before issuing vfs_whiteout" > - Added prwarn when override_creds=off > > v4: > - spelling and grammar errors in text > > v3: > - Change name from caller_credentials / creator_credentials to the > boolean override_creds. > - Changed from creator to mounter credentials. > - Updated and fortified the documentation. > - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS > > v2: > - Forward port changed attr to stat, resulting in a build error. > - altered commit message. > > David Anderson (4): > Add flags option to get xattr method paired to __vfs_getxattr > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > overlayfs: override_creds=off option bypass creator_cred > overlayfs: inode_owner_or_capable called during execv > > Documentation/filesystems/locking.rst | 2 +- > Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++- > fs/9p/acl.c | 3 +- > fs/9p/xattr.c | 3 +- > fs/afs/xattr.c | 10 +++--- > fs/attr.c | 2 +- > fs/btrfs/xattr.c | 3 +- > fs/ceph/xattr.c | 3 +- > fs/cifs/xattr.c | 2 +- > fs/ecryptfs/inode.c | 6 ++-- > fs/ecryptfs/mmap.c | 5 +-- > fs/erofs/xattr.c | 3 +- > fs/ext2/xattr_security.c | 2 +- > fs/ext2/xattr_trusted.c | 2 +- > fs/ext2/xattr_user.c | 2 +- > fs/ext4/xattr_hurd.c | 2 +- > fs/ext4/xattr_security.c | 2 +- > fs/ext4/xattr_trusted.c | 2 +- > fs/ext4/xattr_user.c | 2 +- > fs/f2fs/xattr.c | 4 +-- > fs/fuse/xattr.c | 4 +-- > fs/gfs2/xattr.c | 3 +- > fs/hfs/attr.c | 2 +- > fs/hfsplus/xattr.c | 3 +- > fs/hfsplus/xattr_security.c | 3 +- > fs/hfsplus/xattr_trusted.c | 3 +- > fs/hfsplus/xattr_user.c | 3 +- > fs/inode.c | 7 +++-- > fs/internal.h | 3 +- > fs/jffs2/security.c | 3 +- > fs/jffs2/xattr_trusted.c | 3 +- > fs/jffs2/xattr_user.c | 3 +- > fs/jfs/xattr.c | 5 +-- > fs/kernfs/inode.c | 3 +- > fs/nfs/nfs4proc.c | 9 ++++-- > fs/ntfs3/xattr.c | 2 +- > fs/ocfs2/xattr.c | 9 ++++-- > fs/open.c | 2 +- > fs/orangefs/xattr.c | 3 +- > fs/overlayfs/copy_up.c | 2 +- > fs/overlayfs/dir.c | 17 +++++----- > fs/overlayfs/file.c | 25 ++++++++------- > fs/overlayfs/inode.c | 29 ++++++++--------- > fs/overlayfs/namei.c | 6 ++-- > fs/overlayfs/overlayfs.h | 7 +++-- > fs/overlayfs/ovl_entry.h | 1 + > fs/overlayfs/readdir.c | 8 ++--- > fs/overlayfs/super.c | 34 ++++++++++++++++---- > fs/overlayfs/util.c | 13 ++++++-- > fs/posix_acl.c | 2 +- > fs/reiserfs/xattr_security.c | 3 +- > fs/reiserfs/xattr_trusted.c | 3 +- > fs/reiserfs/xattr_user.c | 3 +- > fs/squashfs/xattr.c | 2 +- > fs/ubifs/xattr.c | 3 +- > fs/xattr.c | 42 +++++++++++++------------ > fs/xfs/xfs_xattr.c | 3 +- > include/linux/lsm_hook_defs.h | 3 +- > include/linux/security.h | 6 ++-- > include/linux/xattr.h | 6 ++-- > include/uapi/linux/xattr.h | 7 +++-- > mm/shmem.c | 3 +- > net/socket.c | 3 +- > security/commoncap.c | 11 ++++--- > security/integrity/evm/evm_main.c | 13 +++++--- > security/security.c | 5 +-- > security/selinux/hooks.c | 19 ++++++----- > security/smack/smack_lsm.c | 18 ++++++----- > 68 files changed, 289 insertions(+), 167 deletions(-) >
On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote: > > Mark Salyzyn (3): > Add flags option to get xattr method paired to __vfs_getxattr > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > overlayfs: override_creds=off option bypass creator_cred > > Mark Salyzyn + John Stultz (1): > overlayfs: inode_owner_or_capable called during execv > > The first three patches address fundamental security issues that should > be solved regardless of the override_creds=off feature. > > The fourth adds the feature depends on these other fixes. > > By default, all access to the upper, lower and work directories is the > recorded mounter's MAC and DAC credentials. The incoming accesses are > checked against the caller's credentials. > > If the principles of least privilege are applied for sepolicy, the > mounter's credentials might not overlap the credentials of the caller's > when accessing the overlayfs filesystem. For example, a file that a > lower DAC privileged caller can execute, is MAC denied to the > generally higher DAC privileged mounter, to prevent an attack vector. > > We add the option to turn off override_creds in the mount options; all > subsequent operations after mount on the filesystem will be only the > caller's credentials. The module boolean parameter and mount option > override_creds is also added as a presence check for this "feature", > existence of /sys/module/overlay/parameters/overlay_creds > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > Signed-off-by: David Anderson <dvander@google.com> > Cc: Miklos Szeredi <miklos@szeredi.hu> > Cc: Jonathan Corbet <corbet@lwn.net> > Cc: Vivek Goyal <vgoyal@redhat.com> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Amir Goldstein <amir73il@gmail.com> > Cc: Randy Dunlap <rdunlap@infradead.org> > Cc: Stephen Smalley <sds@tycho.nsa.gov> > Cc: John Stultz <john.stultz@linaro.org> > Cc: linux-doc@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-unionfs@vger.kernel.org > Cc: linux-security-module@vger.kernel.org > Cc: kernel-team@android.com > Cc: selinux@vger.kernel.org > Cc: paulmoore@microsoft.com > Cc: Luca.Boccassi@microsoft.com > > --- > > v19 > - rebase. > Hi David, I see that the patch set has changed hands (presumably to Android upstreaming team), but you just rebased v18 without addressing the maintainers concerns [1]. Specifically, the patch 2/4 is very wrong for unprivileged mount and I think that the very noisy patch 1/4 could be completely avoided: Can't you use -o userxattr mount option for Android use case and limit the manipulation of user.ovrelay.* xattr based on sepolicy for actors that are allowed to make changes in overlayfs mount? or not limit at all? The access to those xattr is forbidden via "incoming" xattr ops on overlay inodes. Also, IMO, the Documentation section about "Non overlapping credentials" does not hold the same standards as the section about "Permission model" - it does not state the requirements clear enough for my non-security-oriented brain to be able to understand if the "Ignore mounter's credentials" model can be exploited. Can an unprivileged user create an overlay over a directory that they have access to and redirect an innocent looking file name to an underlying file that said the mounting user has no access to and by doing that, tricking a privileged user to modify the innocent looking file on the mounter's behalf? Of course this could be avoided by forbidding unprivileged mount with override_creds=off, but there could be other scenarios, so a clear model would help to understand the risks. For example: If user 1 was able to read in lower dir A, now the content of overlay dir A is cached and user 2, that has permissions to read upper dir A and does not have read permissions on lower dir A will see the content of lower dir A. I think that the core problem with the approach is using Non-uniform credentials to access underlying layers. I don't see a simple way around a big audit that checks all those cases, but maybe I'm missing some quick shortcut or maybe your use case can add some restrictions about the users that could access this overlay that would simplify the generic problem. Thanks, Amir. [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 11/16/2021 5:58 PM, David Anderson wrote: > > Mark Salyzyn (3): > > > > By default, all access to the upper, lower and work directories is the > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > checked against the caller's credentials. > > This isn't very clear. Are you saying that the security attributes > of the upper, lower, and work directories are determined by the > attributes of the process that mounted the filesystem? What is an > "incoming access"? I'm sure that means something if you're steeped > in the lore of overlayfs, but it isn't obvious to me. (Sorry, hitting "Reply All" this time...) Thanks for taking a look - Yes. An "incoming access" is the user application security context accessing the filesystem. > > If the principles of least privilege are applied for sepolicy, the > > mounter's credentials might not overlap the credentials of the caller's > > when accessing the overlayfs filesystem. > > I'm sorry, but I've tried pretty hard, and can't puzzle that one out. If your sepolicy is designed to give processes minimal privileges (as ours is), then "init" might lack privileges even though other processes have them. For example, init can mount /x but not access /x/y/z. But, process XYZ can access /x/y/z. In our system processes have no privileges to anything by default, and permissions are granted as needed, as narrowly as possible. > DAC privileges are not hierarchical. This doesn't make any sense. Sorry, that was probably not the right word. The intent was to say that a process with minimal DAC privileges might be able to access a file, but a process with expansive DAC privileges might be denied access to the same file due to MAC restrictions. > I think I might have figured that one out, but in order to do so > I have to make way too many assumptions about the earlier paragraph. > Could you please try to explain what you're doing with more context? Hopefully the above helps explain: overlayfs uses the mounter's privileges, which does not work on a system where the mounter does not have a superset of child processes' privileges. That's the crux of the issue and I'll keep working on how it's communicated in the patch description. -David On Tue, Nov 16, 2021 at 6:18 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 11/16/2021 5:58 PM, David Anderson wrote: > > Mark Salyzyn (3): > > Add flags option to get xattr method paired to __vfs_getxattr > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > overlayfs: override_creds=off option bypass creator_cred > > > > Mark Salyzyn + John Stultz (1): > > overlayfs: inode_owner_or_capable called during execv > > > > The first three patches address fundamental security issues that should > > be solved regardless of the override_creds=off feature. > > > > The fourth adds the feature depends on these other fixes. > > > > By default, all access to the upper, lower and work directories is the > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > checked against the caller's credentials. > > This isn't very clear. Are you saying that the security attributes > of the upper, lower, and work directories are determined by the > attributes of the process that mounted the filesystem? What is an > "incoming access"? I'm sure that means something if you're steeped > in the lore of overlayfs, but it isn't obvious to me. > > > If the principles of least privilege are applied for sepolicy, the > > mounter's credentials might not overlap the credentials of the caller's > > when accessing the overlayfs filesystem. > > I'm sorry, but I've tried pretty hard, and can't puzzle that one out. > > > For example, a file that a > > lower DAC privileged caller can execute, is MAC denied to the > > generally higher DAC privileged mounter, to prevent an attack vector. > > DAC privileges are not hierarchical. This doesn't make any sense. > > > We add the option to turn off override_creds in the mount options; all > > subsequent operations after mount on the filesystem will be only the > > caller's credentials. > > I think I might have figured that one out, but in order to do so > I have to make way too many assumptions about the earlier paragraph. > Could you please try to explain what you're doing with more context? > > > The module boolean parameter and mount option > > override_creds is also added as a presence check for this "feature", > > existence of /sys/module/overlay/parameters/overlay_creds > > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > > Signed-off-by: David Anderson <dvander@google.com> > > Cc: Miklos Szeredi <miklos@szeredi.hu> > > Cc: Jonathan Corbet <corbet@lwn.net> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > Cc: Amir Goldstein <amir73il@gmail.com> > > Cc: Randy Dunlap <rdunlap@infradead.org> > > Cc: Stephen Smalley <sds@tycho.nsa.gov> > > Cc: John Stultz <john.stultz@linaro.org> > > Cc: linux-doc@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Cc: linux-fsdevel@vger.kernel.org > > Cc: linux-unionfs@vger.kernel.org > > Cc: linux-security-module@vger.kernel.org > > Cc: kernel-team@android.com > > Cc: selinux@vger.kernel.org > > Cc: paulmoore@microsoft.com > > Cc: Luca.Boccassi@microsoft.com > > > > --- > > > > v19 > > - rebase. > > > > v18 > > - rebase + fix minor cut and paste error for inode argument in __vfs_getxattr > > > > v17 > > - correct some zero-day build failures. > > - fix up documentation > > > > v16 > > - rebase and merge of two patches. > > - add adjustment to deal with execv when overrides is off. > > > > v15 > > - Revert back to v4 with fixes from on the way from v5-v14. The single > > structure argument passing to address the complaints about too many > > arguments was rejected by the community. > > - Drop the udner discussion fix for an additional CAP_DAC_READ_SEARCH > > check. Can address that independently. > > - ToDo: upstream test frame for thes security fixes (currently testing > > is all in Android). > > > > v14: > > - Rejoin, rebase and a few adjustments. > > > > v13: > > - Pull out first patch and try to get it in alone feedback, some > > Acks, and then <crickets> because people forgot why we were doing i. > > > > v12: > > - Restore squished out patch 2 and 3 in the series, > > then change algorithm to add flags argument. > > Per-thread flag is a large security surface. > > > > v11: > > - Squish out v10 introduced patch 2 and 3 in the series, > > then and use per-thread flag instead for nesting. > > - Switch name to ovl_do_vds_getxattr for __vds_getxattr wrapper. > > - Add sb argument to ovl_revert_creds to match future work. > > > > v10: > > - Return NULL on CAP_DAC_READ_SEARCH > > - Add __get xattr method to solve sepolicy logging issue > > - Drop unnecessary sys_admin sepolicy checking for administrative > > driver internal xattr functions. > > > > v6: > > - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS. > > - Do better with the documentation, drop rationalizations. > > - pr_warn message adjusted to report consequences. > > > > v5: > > - beefed up the caveats in the Documentation > > - Is dependent on > > "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh" > > "overlayfs: check CAP_MKNOD before issuing vfs_whiteout" > > - Added prwarn when override_creds=off > > > > v4: > > - spelling and grammar errors in text > > > > v3: > > - Change name from caller_credentials / creator_credentials to the > > boolean override_creds. > > - Changed from creator to mounter credentials. > > - Updated and fortified the documentation. > > - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS > > > > v2: > > - Forward port changed attr to stat, resulting in a build error. > > - altered commit message. > > > > David Anderson (4): > > Add flags option to get xattr method paired to __vfs_getxattr > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > overlayfs: override_creds=off option bypass creator_cred > > overlayfs: inode_owner_or_capable called during execv > > > > Documentation/filesystems/locking.rst | 2 +- > > Documentation/filesystems/overlayfs.rst | 26 ++++++++++++++- > > fs/9p/acl.c | 3 +- > > fs/9p/xattr.c | 3 +- > > fs/afs/xattr.c | 10 +++--- > > fs/attr.c | 2 +- > > fs/btrfs/xattr.c | 3 +- > > fs/ceph/xattr.c | 3 +- > > fs/cifs/xattr.c | 2 +- > > fs/ecryptfs/inode.c | 6 ++-- > > fs/ecryptfs/mmap.c | 5 +-- > > fs/erofs/xattr.c | 3 +- > > fs/ext2/xattr_security.c | 2 +- > > fs/ext2/xattr_trusted.c | 2 +- > > fs/ext2/xattr_user.c | 2 +- > > fs/ext4/xattr_hurd.c | 2 +- > > fs/ext4/xattr_security.c | 2 +- > > fs/ext4/xattr_trusted.c | 2 +- > > fs/ext4/xattr_user.c | 2 +- > > fs/f2fs/xattr.c | 4 +-- > > fs/fuse/xattr.c | 4 +-- > > fs/gfs2/xattr.c | 3 +- > > fs/hfs/attr.c | 2 +- > > fs/hfsplus/xattr.c | 3 +- > > fs/hfsplus/xattr_security.c | 3 +- > > fs/hfsplus/xattr_trusted.c | 3 +- > > fs/hfsplus/xattr_user.c | 3 +- > > fs/inode.c | 7 +++-- > > fs/internal.h | 3 +- > > fs/jffs2/security.c | 3 +- > > fs/jffs2/xattr_trusted.c | 3 +- > > fs/jffs2/xattr_user.c | 3 +- > > fs/jfs/xattr.c | 5 +-- > > fs/kernfs/inode.c | 3 +- > > fs/nfs/nfs4proc.c | 9 ++++-- > > fs/ntfs3/xattr.c | 2 +- > > fs/ocfs2/xattr.c | 9 ++++-- > > fs/open.c | 2 +- > > fs/orangefs/xattr.c | 3 +- > > fs/overlayfs/copy_up.c | 2 +- > > fs/overlayfs/dir.c | 17 +++++----- > > fs/overlayfs/file.c | 25 ++++++++------- > > fs/overlayfs/inode.c | 29 ++++++++--------- > > fs/overlayfs/namei.c | 6 ++-- > > fs/overlayfs/overlayfs.h | 7 +++-- > > fs/overlayfs/ovl_entry.h | 1 + > > fs/overlayfs/readdir.c | 8 ++--- > > fs/overlayfs/super.c | 34 ++++++++++++++++---- > > fs/overlayfs/util.c | 13 ++++++-- > > fs/posix_acl.c | 2 +- > > fs/reiserfs/xattr_security.c | 3 +- > > fs/reiserfs/xattr_trusted.c | 3 +- > > fs/reiserfs/xattr_user.c | 3 +- > > fs/squashfs/xattr.c | 2 +- > > fs/ubifs/xattr.c | 3 +- > > fs/xattr.c | 42 +++++++++++++------------ > > fs/xfs/xfs_xattr.c | 3 +- > > include/linux/lsm_hook_defs.h | 3 +- > > include/linux/security.h | 6 ++-- > > include/linux/xattr.h | 6 ++-- > > include/uapi/linux/xattr.h | 7 +++-- > > mm/shmem.c | 3 +- > > net/socket.c | 3 +- > > security/commoncap.c | 11 ++++--- > > security/integrity/evm/evm_main.c | 13 +++++--- > > security/security.c | 5 +-- > > security/selinux/hooks.c | 19 ++++++----- > > security/smack/smack_lsm.c | 18 ++++++----- > > 68 files changed, 289 insertions(+), 167 deletions(-) > >
On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote: > Hi David, > > I see that the patch set has changed hands (presumably to Android upstreaming > team), but you just rebased v18 without addressing the maintainers concerns [1]. Indeed I'm carrying this forward as Mark is no longer working on it. My apologies for missing those comments! > Specifically, the patch 2/4 is very wrong for unprivileged mount and > I think that the very noisy patch 1/4 could be completely avoided: > Can't you use -o userxattr mount option for Android use case and limit > the manipulation of user.ovrelay.* xattr based on sepolicy for actors > that are allowed > to make changes in overlayfs mount? or not limit at all? > The access to those xattr is forbidden via "incoming" xattr ops on > overlay inodes. Can you clarify a bit more? The patch is definitely super noisy and I'd love to have a better solution. The problem it's trying to solve is: 1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper. 2. Kernel-privileged init mounts /blah with overlayfs using the above dirs. 2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins. 3. Kernel-privileged init tries to execute /blah/init to initiate a domain transition. 4. exec() fails because the overlayfs mounter creds (kernel domain) does not have getxattr permission to /blah/init. Eg, we're hitting this problem without even making changes to the mount, and without anything being written to /mnt/blah-upper. > Can an unprivileged user create an overlay over a directory that they have > access to and redirect an innocent looking file name to an underlying file that > said the mounting user has no access to and by doing that, tricking a privileged > user to modify the innocent looking file on the mounter's behalf? > Of course this could be avoided by forbidding unprivileged mount with > override_creds=off, but there could be other scenarios, so a clear model > would help to understand the risks. > > For example: > If user 1 was able to read in lower dir A, now the content of overlay dir A > is cached and user 2, that has permissions to read upper dir A and does > not have read permissions on lower dir A will see the content of lower dir A. I'll need to think about this more and test to verify. It's not a scenario that would come up in our use case (both dirs effectively have the same permissions). If the answer is "yes, that can happen" - do you see this as a problem of clarifying the model, or a problem of fixing that loophole? >> I think that the core problem with the approach is using Non-uniform > credentials to access underlying layers. I don't see a simple way around > a big audit that checks all those cases, but maybe I'm missing some quick > shortcut or maybe your use case can add some restrictions about the > users that could access this overlay that would simplify the generic problem. In a security model like ours, I think there's no way around it, that we really need accesses to be from the caller's credentials and not the mounter's. It's even worse than earlier iterations of this patch perhaps let on: we mount before sepolicy is loaded (so we can overlay the policy itself), and thus the mounter's creds are effectively "the kernel". This domain is highly restricted in our sepolicy for obvious reasons. There's no way our security team will let us unrestrict it. Best, -David > > Thanks, > Amir. > > [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/
On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote: > > On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Hi David, > > > > I see that the patch set has changed hands (presumably to Android upstreaming > > team), but you just rebased v18 without addressing the maintainers concerns [1]. > > Indeed I'm carrying this forward as Mark is no longer working on it. > My apologies for > missing those comments! > > > Specifically, the patch 2/4 is very wrong for unprivileged mount and > > I think that the very noisy patch 1/4 could be completely avoided: > > Can't you use -o userxattr mount option for Android use case and limit > > the manipulation of user.ovrelay.* xattr based on sepolicy for actors > > that are allowed > > to make changes in overlayfs mount? or not limit at all? > > The access to those xattr is forbidden via "incoming" xattr ops on > > overlay inodes. > > Can you clarify a bit more? The patch is definitely super noisy and I'd love > to have a better solution. The problem it's trying to solve is: > 1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper. > 2. Kernel-privileged init mounts /blah with overlayfs using the above dirs. > 2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins. > 3. Kernel-privileged init tries to execute /blah/init to initiate a > domain transition. > 4. exec() fails because the overlayfs mounter creds (kernel domain) does > not have getxattr permission to /blah/init. > > Eg, we're hitting this problem without even making changes to the mount, and > without anything being written to /mnt/blah-upper. > So what is your solution? Remove the security check from overlayfs setting xattr? How does that work for the person who composed the security policy? You will need to grant some basic privileges to the mounter. If you do not want to grant the mounter privileges to set trusted.overlay xattr you may use mount option -o userxattr and grant it permissions to set user.overlay xattrs. > > Can an unprivileged user create an overlay over a directory that they have > > access to and redirect an innocent looking file name to an underlying file that > > said the mounting user has no access to and by doing that, tricking a privileged > > user to modify the innocent looking file on the mounter's behalf? > > Of course this could be avoided by forbidding unprivileged mount with > > override_creds=off, but there could be other scenarios, so a clear model > > would help to understand the risks. > > > > For example: > > If user 1 was able to read in lower dir A, now the content of overlay dir A > > is cached and user 2, that has permissions to read upper dir A and does > > not have read permissions on lower dir A will see the content of lower dir A. > > I'll need to think about this more and test to verify. It's not a scenario that > would come up in our use case (both dirs effectively have the same permissions). > Your argument is taking the wrong POV. The reason that previous posts of this patch set have been rejected is not because it doesn't work for your use case. It is because other players can exploit the feature to bypass security policies, so the feature cannot be merged as is. > If the answer is "yes, that can happen" - do you see this as a problem of > clarifying the model, or a problem of fixing that loophole? > It is something that is not at all easy to fix. In the example above, instead of checking permissions against the overlay inode (on "incoming" readdir) will need to check permissions of every accessing user against all layers, before allowing access to the merged directory content (which is cached). A lot more work - and this is just for this one example. > >> I think that the core problem with the approach is using Non-uniform > > credentials to access underlying layers. I don't see a simple way around > > a big audit that checks all those cases, but maybe I'm missing some quick > > shortcut or maybe your use case can add some restrictions about the > > users that could access this overlay that would simplify the generic problem. > > In a security model like ours, I think there's no way around it, that > we really need > accesses to be from the caller's credentials and not the mounter's. It's even > worse than earlier iterations of this patch perhaps let on: we mount > before sepolicy > is loaded (so we can overlay the policy itself), and thus the > mounter's creds are > effectively "the kernel". This domain is highly restricted in our > sepolicy for obvious > reasons. There's no way our security team will let us unrestrict it. > Not sure what that means or what I can do with this information. If I had a simple suggestion on how to solve your problem I would have suggested it, but I cannot think of any right now. The best I can do is to try to make you understand the problems that your patch causes to others, so you can figure out a way that meets your goals without breaking other use cases. Thanks, Amir.
On Thu, Nov 18, 2021 at 2:20 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Nov 18, 2021 at 11:53 AM David Anderson <dvander@google.com> wrote: > > > > On Tue, Nov 16, 2021 at 11:36 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > Hi David, > > > > > > I see that the patch set has changed hands (presumably to Android upstreaming > > > team), but you just rebased v18 without addressing the maintainers concerns [1]. > > > > Indeed I'm carrying this forward as Mark is no longer working on it. > > My apologies for > > missing those comments! > > > > > Specifically, the patch 2/4 is very wrong for unprivileged mount and > > > I think that the very noisy patch 1/4 could be completely avoided: > > > Can't you use -o userxattr mount option for Android use case and limit > > > the manipulation of user.ovrelay.* xattr based on sepolicy for actors > > > that are allowed > > > to make changes in overlayfs mount? or not limit at all? > > > The access to those xattr is forbidden via "incoming" xattr ops on > > > overlay inodes. > > > > Can you clarify a bit more? The patch is definitely super noisy and I'd love > > to have a better solution. The problem it's trying to solve is: > > 1. Kernel-privileged init mounts /mnt/blah-lower and /mnt/blah-upper. > > 2. Kernel-privileged init mounts /blah with overlayfs using the above dirs. > > 2. Kernel-privileged init loads sepolicy off /blah/policy. Enforcing begins. > > 3. Kernel-privileged init tries to execute /blah/init to initiate a > > domain transition. > > 4. exec() fails because the overlayfs mounter creds (kernel domain) does > > not have getxattr permission to /blah/init. > > > > Eg, we're hitting this problem without even making changes to the mount, and > > without anything being written to /mnt/blah-upper. > > > > So what is your solution? > Remove the security check from overlayfs setting xattr? > How does that work for the person who composed the security policy? > You will need to grant some basic privileges to the mounter. > If you do not want to grant the mounter privileges to set trusted.overlay xattr > you may use mount option -o userxattr and grant it permissions to set > user.overlay xattrs. Thanks for the tips. I think it might be possible for us to work around this one in userspace, either by granting the mounter this one permission across all contexts, or with a much narrower kernel fix that doesn't require us keeping the big noisy patch in our tree. > > > Can an unprivileged user create an overlay over a directory that they have > > > access to and redirect an innocent looking file name to an underlying file that > > > said the mounting user has no access to and by doing that, tricking a privileged > > > user to modify the innocent looking file on the mounter's behalf? > > > Of course this could be avoided by forbidding unprivileged mount with > > > override_creds=off, but there could be other scenarios, so a clear model > > > would help to understand the risks. > > > > > > For example: > > > If user 1 was able to read in lower dir A, now the content of overlay dir A > > > is cached and user 2, that has permissions to read upper dir A and does > > > not have read permissions on lower dir A will see the content of lower dir A. > > > > I'll need to think about this more and test to verify. It's not a scenario that > > would come up in our use case (both dirs effectively have the same permissions). > > > > Your argument is taking the wrong POV. > The reason that previous posts of this patch set have been rejected > is not because it doesn't work for your use case. > It is because other players can exploit the feature to bypass security > policies, so the feature cannot be merged as is. Ack, understood. > > If the answer is "yes, that can happen" - do you see this as a problem of > > clarifying the model, or a problem of fixing that loophole? > > > > It is something that is not at all easy to fix. > In the example above, instead of checking permissions against the > overlay inode (on "incoming" readdir) will need to check permissions of every > accessing user against all layers, before allowing access to the merged > directory content (which is cached). > A lot more work - and this is just for this one example. I see your point. If we could implement that, behind a mount flag, would that be an acceptable solution? > > >> I think that the core problem with the approach is using Non-uniform > > > credentials to access underlying layers. I don't see a simple way around > > > a big audit that checks all those cases, but maybe I'm missing some quick > > > shortcut or maybe your use case can add some restrictions about the > > > users that could access this overlay that would simplify the generic problem. > > > > In a security model like ours, I think there's no way around it, that > > we really need > > accesses to be from the caller's credentials and not the mounter's. It's even > > worse than earlier iterations of this patch perhaps let on: we mount > > before sepolicy > > is loaded (so we can overlay the policy itself), and thus the > > mounter's creds are > > effectively "the kernel". This domain is highly restricted in our > > sepolicy for obvious > > reasons. There's no way our security team will let us unrestrict it. > > > > Not sure what that means or what I can do with this information. > If I had a simple suggestion on how to solve your problem I would have > suggested it, but I cannot think of any right now. > The best I can do is to try to make you understand the problems that your > patch causes to others, so you can figure out a way that meets your goals > without breaking other use cases. To help clarify, we do not have any processes which have access to everything. Almost every file in the system has a specific selabel and processes are only granted access to the labels they need. To give the mounter (init in kernel domain) access to every single label doesn't make sense. It only makes sense to have the access be "passthrough", making overlayfs as transparent as possible. Best, -David > > Thanks, > Amir.
> > It is something that is not at all easy to fix. > > In the example above, instead of checking permissions against the > > overlay inode (on "incoming" readdir) will need to check permissions of every > > accessing user against all layers, before allowing access to the merged > > directory content (which is cached). > > A lot more work - and this is just for this one example. > > I see your point. If we could implement that, behind a mount flag, would that be > an acceptable solution? > As I wrote, this is one specific problem that I identified. If you propose a different behavior base on mount flag you should be able to argue that is cannot be exploited to circumvent security access policies, by peaking into cached copies of objects that the user has no access to, or by any other way. I have no idea how to implement what you want and prove that it is safe. Maybe if you explained the use case in greater details with some examples someone could help you reach a possible solution. Thanks, Amir.
On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote: > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote: > > > > Mark Salyzyn (3): > > Add flags option to get xattr method paired to __vfs_getxattr > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > overlayfs: override_creds=off option bypass creator_cred > > > > Mark Salyzyn + John Stultz (1): > > overlayfs: inode_owner_or_capable called during execv > > > > The first three patches address fundamental security issues that should > > be solved regardless of the override_creds=off feature. > > > > The fourth adds the feature depends on these other fixes. > > > > By default, all access to the upper, lower and work directories is the > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > checked against the caller's credentials. > > > > If the principles of least privilege are applied for sepolicy, the > > mounter's credentials might not overlap the credentials of the caller's > > when accessing the overlayfs filesystem. For example, a file that a > > lower DAC privileged caller can execute, is MAC denied to the > > generally higher DAC privileged mounter, to prevent an attack vector. > > > > We add the option to turn off override_creds in the mount options; all > > subsequent operations after mount on the filesystem will be only the > > caller's credentials. The module boolean parameter and mount option > > override_creds is also added as a presence check for this "feature", > > existence of /sys/module/overlay/parameters/overlay_creds > > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > > Signed-off-by: David Anderson <dvander@google.com> > > Cc: Miklos Szeredi <miklos@szeredi.hu> > > Cc: Jonathan Corbet <corbet@lwn.net> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > Cc: Amir Goldstein <amir73il@gmail.com> > > Cc: Randy Dunlap <rdunlap@infradead.org> > > Cc: Stephen Smalley <sds@tycho.nsa.gov> > > Cc: John Stultz <john.stultz@linaro.org> > > Cc: linux-doc@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Cc: linux-fsdevel@vger.kernel.org > > Cc: linux-unionfs@vger.kernel.org > > Cc: linux-security-module@vger.kernel.org > > Cc: kernel-team@android.com > > Cc: selinux@vger.kernel.org > > Cc: paulmoore@microsoft.com > > Cc: Luca.Boccassi@microsoft.com > > > > --- > > > > v19 > > - rebase. > > > > Hi David, > > I see that the patch set has changed hands (presumably to Android upstreaming > team), but you just rebased v18 without addressing the maintainers concerns [1]. > BTW, where is patch 1 of the series. I can't seem to find it. I think I was running into issues with getxattr() on underlying filesystem as well (if mounter did not have sufficient privileges) and tried to fix it. But did not find a good solution at that point of time. https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/ So basically when overlay inode is being initialized, code will try to query "security.selinux" xattr on underlying file to initialize selinux label on the overlay inode. For regular filesystems, they bypass the security check by calling __vfs_getxattr() when trying to initialize this selinux security label. But with layered filesystem, it still ends up calling vfs_getxattr() on underlying filesyste. Which means it checks for caller's creds and if caller is not priviliged enough, access will be denied. To solve this problem, looks like this patch set is passing a flag XATTR_NOSECUROTY so that permission checks are skipped in getxattr() path in underlying filesystem. As long as this information is not leaked to user space (and remains in overlayfs), it probably is fine? And if information is not going to user space, then it probably is fine for unprivileged overlayfs mounts as well? I see a comment from Miklos as well as you that it is not safe to do for unprivileged mounts. Can you help me understand why that's the case. > Specifically, the patch 2/4 is very wrong for unprivileged mount and Can you help me understand why it is wrong. (/me should spend more time reading the patch. But I am taking easy route of asking you. :-)). > I think that the very noisy patch 1/4 could be completely avoided: How can it completely avoided. If mounter is not privileged then vfs_getxattr() on underlying filesystem will fail. Or if override_creds=off, then caller might not be privileged enough to do getxattr() but we still should be able to initialize overlay inode security label. > Can't you use -o userxattr mount option user xattrs done't work for device nodes and symlinks. BTW, how will userxattr solve the problem completely. It can be used to store overlay specific xattrs but accessing security xattrs on underlying filesystem will still be a problem? Thanks Vivek > for Android use case and limit > the manipulation of user.ovrelay.* xattr based on sepolicy for actors > that are allowed > to make changes in overlayfs mount? or not limit at all? > The access to those xattr is forbidden via "incoming" xattr ops on > overlay inodes. > > Also, IMO, the Documentation section about "Non overlapping credentials" does > not hold the same standards as the section about "Permission model" - > it does not > state the requirements clear enough for my non-security-oriented brain to be > able to understand if the "Ignore mounter's credentials" model can be exploited. > > Can an unprivileged user create an overlay over a directory that they have > access to and redirect an innocent looking file name to an underlying file that > said the mounting user has no access to and by doing that, tricking a privileged > user to modify the innocent looking file on the mounter's behalf? > Of course this could be avoided by forbidding unprivileged mount with > override_creds=off, but there could be other scenarios, so a clear model > would help to understand the risks. > > For example: > If user 1 was able to read in lower dir A, now the content of overlay dir A > is cached and user 2, that has permissions to read upper dir A and does > not have read permissions on lower dir A will see the content of lower dir A. > > I think that the core problem with the approach is using Non-uniform > credentials to access underlying layers. I don't see a simple way around > a big audit that checks all those cases, but maybe I'm missing some quick > shortcut or maybe your use case can add some restrictions about the > users that could access this overlay that would simplify the generic problem. > > Thanks, > Amir. > > [1] https://lore.kernel.org/linux-unionfs/CAJfpegtMoD85j5namV592sJD23QeUMD=+tq4SvFDqjVxsAszYQ@mail.gmail.com/ >
On Fri, Dec 3, 2021 at 10:38 AM Vivek Goyal <vgoyal@redhat.com> wrote: > On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote: > > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote: > > > > > > Mark Salyzyn (3): > > > Add flags option to get xattr method paired to __vfs_getxattr > > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > > overlayfs: override_creds=off option bypass creator_cred > > > > > > Mark Salyzyn + John Stultz (1): > > > overlayfs: inode_owner_or_capable called during execv > > > > > > The first three patches address fundamental security issues that should > > > be solved regardless of the override_creds=off feature. > > > > > > The fourth adds the feature depends on these other fixes. > > > > > > By default, all access to the upper, lower and work directories is the > > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > > checked against the caller's credentials. > > > > > > If the principles of least privilege are applied for sepolicy, the > > > mounter's credentials might not overlap the credentials of the caller's > > > when accessing the overlayfs filesystem. For example, a file that a > > > lower DAC privileged caller can execute, is MAC denied to the > > > generally higher DAC privileged mounter, to prevent an attack vector. > > > > > > We add the option to turn off override_creds in the mount options; all > > > subsequent operations after mount on the filesystem will be only the > > > caller's credentials. The module boolean parameter and mount option > > > override_creds is also added as a presence check for this "feature", > > > existence of /sys/module/overlay/parameters/overlay_creds > > BTW, where is patch 1 of the series. I can't seem to find it. Lore to the rescue ... https://lore.kernel.org/selinux/20211117015806.2192263-2-dvander@google.com/
On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote: > > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote: > > > > > > Mark Salyzyn (3): > > > Add flags option to get xattr method paired to __vfs_getxattr > > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > > overlayfs: override_creds=off option bypass creator_cred > > > > > > Mark Salyzyn + John Stultz (1): > > > overlayfs: inode_owner_or_capable called during execv > > > > > > The first three patches address fundamental security issues that should > > > be solved regardless of the override_creds=off feature. > > > > > > The fourth adds the feature depends on these other fixes. > > > > > > By default, all access to the upper, lower and work directories is the > > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > > checked against the caller's credentials. > > > > > > If the principles of least privilege are applied for sepolicy, the > > > mounter's credentials might not overlap the credentials of the caller's > > > when accessing the overlayfs filesystem. For example, a file that a > > > lower DAC privileged caller can execute, is MAC denied to the > > > generally higher DAC privileged mounter, to prevent an attack vector. > > > > > > We add the option to turn off override_creds in the mount options; all > > > subsequent operations after mount on the filesystem will be only the > > > caller's credentials. The module boolean parameter and mount option > > > override_creds is also added as a presence check for this "feature", > > > existence of /sys/module/overlay/parameters/overlay_creds > > > > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > > > Signed-off-by: David Anderson <dvander@google.com> > > > Cc: Miklos Szeredi <miklos@szeredi.hu> > > > Cc: Jonathan Corbet <corbet@lwn.net> > > > Cc: Vivek Goyal <vgoyal@redhat.com> > > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > > Cc: Amir Goldstein <amir73il@gmail.com> > > > Cc: Randy Dunlap <rdunlap@infradead.org> > > > Cc: Stephen Smalley <sds@tycho.nsa.gov> > > > Cc: John Stultz <john.stultz@linaro.org> > > > Cc: linux-doc@vger.kernel.org > > > Cc: linux-kernel@vger.kernel.org > > > Cc: linux-fsdevel@vger.kernel.org > > > Cc: linux-unionfs@vger.kernel.org > > > Cc: linux-security-module@vger.kernel.org > > > Cc: kernel-team@android.com > > > Cc: selinux@vger.kernel.org > > > Cc: paulmoore@microsoft.com > > > Cc: Luca.Boccassi@microsoft.com > > > > > > --- > > > > > > v19 > > > - rebase. > > > > > > > Hi David, > > > > I see that the patch set has changed hands (presumably to Android upstreaming > > team), but you just rebased v18 without addressing the maintainers concerns [1]. > > > > BTW, where is patch 1 of the series. I can't seem to find it. > > I think I was running into issues with getxattr() on underlying filesystem > as well (if mounter did not have sufficient privileges) and tried to fix > it. But did not find a good solution at that point of time. > > https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/ > > So basically when overlay inode is being initialized, code will try to > query "security.selinux" xattr on underlying file to initialize selinux > label on the overlay inode. For regular filesystems, they bypass the > security check by calling __vfs_getxattr() when trying to initialize > this selinux security label. But with layered filesystem, it still > ends up calling vfs_getxattr() on underlying filesyste. Which means > it checks for caller's creds and if caller is not priviliged enough, > access will be denied. > > To solve this problem, looks like this patch set is passing a flag > XATTR_NOSECUROTY so that permission checks are skipped in getxattr() > path in underlying filesystem. As long as this information is > not leaked to user space (and remains in overlayfs), it probably is > fine? And if information is not going to user space, then it probably > is fine for unprivileged overlayfs mounts as well? > > I see a comment from Miklos as well as you that it is not safe to > do for unprivileged mounts. Can you help me understand why that's > the case. > > > > Specifically, the patch 2/4 is very wrong for unprivileged mount and > > Can you help me understand why it is wrong. (/me should spend more > time reading the patch. But I am taking easy route of asking you. :-)). > I should have spent more time reading the patch too :-) I was not referring to the selinux part. That looks fine I guess. I was referring to the part of: "Check impure, opaque, origin & meta xattr with no sepolicy audit (using __vfs_getxattr) since these operations are internal to overlayfs operations and do not disclose any data." I don't know how safe that really is to ignore the security checks for reading trusted xattr and allow non-privileged mounts to do that. Certainly since non privileged mounts are likely to use userxattr anyway, so what's the reason to bypass security? > > I think that the very noisy patch 1/4 could be completely avoided: > > How can it completely avoided. If mounter is not privileged then > vfs_getxattr() on underlying filesystem will fail. Or if > override_creds=off, then caller might not be privileged enough to > do getxattr() but we still should be able to initialize overlay > inode security label. > My bad. I didn't read the description of the selinux problem with the re-post and forgot about it. > > Can't you use -o userxattr mount option > > user xattrs done't work for device nodes and symlinks. > > BTW, how will userxattr solve the problem completely. It can be used > to store overlay specific xattrs but accessing security xattrs on > underlying filesystem will still be a problem? It cannot. As long as the patch sticks with passing through the getxattr flags, it looks fine to me. passing security for trusted.overlay seems dodgy. Thanks, Amir.
On Fri, Dec 03, 2021 at 06:31:01PM +0200, Amir Goldstein wrote: > On Fri, Dec 3, 2021 at 5:38 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > On Wed, Nov 17, 2021 at 09:36:42AM +0200, Amir Goldstein wrote: > > > On Wed, Nov 17, 2021 at 3:58 AM David Anderson <dvander@google.com> wrote: > > > > > > > > Mark Salyzyn (3): > > > > Add flags option to get xattr method paired to __vfs_getxattr > > > > overlayfs: handle XATTR_NOSECURITY flag for get xattr method > > > > overlayfs: override_creds=off option bypass creator_cred > > > > > > > > Mark Salyzyn + John Stultz (1): > > > > overlayfs: inode_owner_or_capable called during execv > > > > > > > > The first three patches address fundamental security issues that should > > > > be solved regardless of the override_creds=off feature. > > > > > > > > The fourth adds the feature depends on these other fixes. > > > > > > > > By default, all access to the upper, lower and work directories is the > > > > recorded mounter's MAC and DAC credentials. The incoming accesses are > > > > checked against the caller's credentials. > > > > > > > > If the principles of least privilege are applied for sepolicy, the > > > > mounter's credentials might not overlap the credentials of the caller's > > > > when accessing the overlayfs filesystem. For example, a file that a > > > > lower DAC privileged caller can execute, is MAC denied to the > > > > generally higher DAC privileged mounter, to prevent an attack vector. > > > > > > > > We add the option to turn off override_creds in the mount options; all > > > > subsequent operations after mount on the filesystem will be only the > > > > caller's credentials. The module boolean parameter and mount option > > > > override_creds is also added as a presence check for this "feature", > > > > existence of /sys/module/overlay/parameters/overlay_creds > > > > > > > > Signed-off-by: Mark Salyzyn <salyzyn@android.com> > > > > Signed-off-by: David Anderson <dvander@google.com> > > > > Cc: Miklos Szeredi <miklos@szeredi.hu> > > > > Cc: Jonathan Corbet <corbet@lwn.net> > > > > Cc: Vivek Goyal <vgoyal@redhat.com> > > > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > > > Cc: Amir Goldstein <amir73il@gmail.com> > > > > Cc: Randy Dunlap <rdunlap@infradead.org> > > > > Cc: Stephen Smalley <sds@tycho.nsa.gov> > > > > Cc: John Stultz <john.stultz@linaro.org> > > > > Cc: linux-doc@vger.kernel.org > > > > Cc: linux-kernel@vger.kernel.org > > > > Cc: linux-fsdevel@vger.kernel.org > > > > Cc: linux-unionfs@vger.kernel.org > > > > Cc: linux-security-module@vger.kernel.org > > > > Cc: kernel-team@android.com > > > > Cc: selinux@vger.kernel.org > > > > Cc: paulmoore@microsoft.com > > > > Cc: Luca.Boccassi@microsoft.com > > > > > > > > --- > > > > > > > > v19 > > > > - rebase. > > > > > > > > > > Hi David, > > > > > > I see that the patch set has changed hands (presumably to Android upstreaming > > > team), but you just rebased v18 without addressing the maintainers concerns [1]. > > > > > > > BTW, where is patch 1 of the series. I can't seem to find it. > > > > I think I was running into issues with getxattr() on underlying filesystem > > as well (if mounter did not have sufficient privileges) and tried to fix > > it. But did not find a good solution at that point of time. > > > > https://lore.kernel.org/linux-unionfs/1467733854-6314-6-git-send-email-vgoyal@redhat.com/ > > > > So basically when overlay inode is being initialized, code will try to > > query "security.selinux" xattr on underlying file to initialize selinux > > label on the overlay inode. For regular filesystems, they bypass the > > security check by calling __vfs_getxattr() when trying to initialize > > this selinux security label. But with layered filesystem, it still > > ends up calling vfs_getxattr() on underlying filesyste. Which means > > it checks for caller's creds and if caller is not priviliged enough, > > access will be denied. > > > > To solve this problem, looks like this patch set is passing a flag > > XATTR_NOSECUROTY so that permission checks are skipped in getxattr() > > path in underlying filesystem. As long as this information is > > not leaked to user space (and remains in overlayfs), it probably is > > fine? And if information is not going to user space, then it probably > > is fine for unprivileged overlayfs mounts as well? > > > > I see a comment from Miklos as well as you that it is not safe to > > do for unprivileged mounts. Can you help me understand why that's > > the case. > > > > > > > Specifically, the patch 2/4 is very wrong for unprivileged mount and > > > > Can you help me understand why it is wrong. (/me should spend more > > time reading the patch. But I am taking easy route of asking you. :-)). > > > > I should have spent more time reading the patch too :-) > I was not referring to the selinux part. That looks fine I guess. > > I was referring to the part of: > "Check impure, opaque, origin & meta xattr with no sepolicy audit > (using __vfs_getxattr) since these operations are internal to > overlayfs operations and do not disclose any data." > I don't know how safe that really is to ignore the security checks > for reading trusted xattr and allow non-privileged mounts to do that. I am also concerned about this. > Certainly since non privileged mounts are likely to use userxattr > anyway, so what's the reason to bypass security? I am not sure. In the early version of patches I think argument was that do not switch to mounter's creds and use caller's creds on underlying filesystem as well. And each caller will be privileged enough to be able to perform the operation. Our take was that how is this model better because in current model only mounter needs to be privileged while in this new model each caller will have to be privileged. But Android guys seemed to be ok with that. So has this assumption changed since early days. If callers are privileged, then vfs_getxattr() on underlying filesystem for overaly internal xattrs should succeed and there is no need for this change. I suspect patches have evolved since then and callers are not as privileged as we expect them to and that's why we are bypassing this check on all overlayfs internal trusted xattrs? This definitely requires much close scrutiny. My initial reaction is that this sounds very scary. In general I would think overlayfs should not bypass the check on underlying fs. Either checks should be done in mounter's context or caller's context (depending on override_creds=on/off). Thanks Vivek > > > > I think that the very noisy patch 1/4 could be completely avoided: > > > > How can it completely avoided. If mounter is not privileged then > > vfs_getxattr() on underlying filesystem will fail. Or if > > override_creds=off, then caller might not be privileged enough to > > do getxattr() but we still should be able to initialize overlay > > inode security label. > > > > My bad. I didn't read the description of the selinux problem > with the re-post and forgot about it. > > > > Can't you use -o userxattr mount option > > > > user xattrs done't work for device nodes and symlinks. > > > > BTW, how will userxattr solve the problem completely. It can be used > > to store overlay specific xattrs but accessing security xattrs on > > underlying filesystem will still be a problem? > > It cannot. > As long as the patch sticks with passing through the > getxattr flags, it looks fine to me. > passing security for trusted.overlay seems dodgy. > > Thanks, > Amir. >
I wanted to try and bring this thread back from the dead (?) as I believe the use-case is still valid and worth supporting. Some more brief comments below ... On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote: > I am not sure. In the early version of patches I think argument was > that do not switch to mounter's creds and use caller's creds on > underlying filesystem as well. And each caller will be privileged > enough to be able to perform the operation. The basic idea is that we can now build Linux systems with enough access control granularity such that a given process can have the necessary privileges to mount a filesystem, but not necessarily access all of the data on the filesystem, while other processes, with different access rights, are allowed to read and write data on the mounted filesystem. Granted, this is a bit different from how things are usually done, but in my opinion it's a valid and interesting use case in that it allows us to remove unneeded access rights from historically very privileged system startup services/scripts: the service that runs to mount my homedir shouldn't be allowed to access my files just to mount the directory. Unfortunately, this idea falls apart when we attempt to use overlayfs due to the clever/usual way it caches the mounting processes credentials and uses that in place of the current process' credentials when accessing certain parts of the underlying filesystems. The current overlayfs implementation assumes that the mounter will always be more privileged than the processes accessing the filesystem, it would be nice if we could build a mechanism that didn't have this assumption baked into the implementation. This patchset may not have been The Answer, but surely there is something we can do to support this use-case. > Our take was that how is this model better because in current model > only mounter needs to be privileged while in this new model each > caller will have to be privileged. But Android guys seemed to be ok > with that. So has this assumption changed since early days. If callers > are privileged, then vfs_getxattr() on underlying filesystem for > overaly internal xattrs should succeed and there is no need for this > change. > > I suspect patches have evolved since then and callers are not as > privileged as we expect them to and that's why we are bypassing this > check on all overlayfs internal trusted xattrs? This definitely requires > much close scrutiny. My initial reaction is that this sounds very scary. > > In general I would think overlayfs should not bypass the check on > underlying fs. Either checks should be done in mounter's context or > caller's context (depending on override_creds=on/off). > > Thanks > Vivek
On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote: > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote: >> >> I wanted to try and bring this thread back from the dead (?) as I >> believe the use-case is still valid and worth supporting. Some more >> brief comments below ... >> >> On Fri, Dec 3, 2021 at 1:34 PM Vivek Goyal <vgoyal@redhat.com> wrote: >> > I am not sure. In the early version of patches I think argument was >> > that do not switch to mounter's creds and use caller's creds on >> > underlying filesystem as well. And each caller will be privileged >> > enough to be able to perform the operation. > > Indeed that was the argument - though, "userxattr" eliminated the need for patches 1 & 2 completely for us, which is great. We're no longer carrying those in our 5.15 tree. > >> Unfortunately, this idea falls apart when we attempt to use overlayfs >> due to the clever/usual way it caches the mounting processes >> credentials and uses that in place of the current process' credentials >> when accessing certain parts of the underlying filesystems. The >> current overlayfs implementation assumes that the mounter will always >> be more privileged than the processes accessing the filesystem, it >> would be nice if we could build a mechanism that didn't have this >> assumption baked into the implementation. >> >> This patchset may not have been The Answer, but surely there is >> something we can do to support this use-case. > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues. Can you elaborate a bit more on the coherency issues? Is this the dir cache issue that is alluded to in the patchset? Anything else that has come up on review? Before I start looking at the dir cache in any detail, did you have any thoughts on how to resolve the problems that have arisen?
On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote: > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote: > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote: ... > >> This patchset may not have been The Answer, but surely there is > >> something we can do to support this use-case. > > > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues. > > Can you elaborate a bit more on the coherency issues? Is this the dir > cache issue that is alluded to in the patchset? Anything else that > has come up on review? > > Before I start looking at the dir cache in any detail, did you have > any thoughts on how to resolve the problems that have arisen? David, Vivek, Amir, Miklos, or anyone for that matter, can you please go into more detail on the cache issues? I *think* I may have found a potential solution for an issue that could arise when the credential override is not in place, but I'm not certain it's the only issue :) There is motivation on our part to try and get the "override_creds=off" portion of the patchset working and suitable for upstreaming, but I need some help in making sure I understand all the objections/problems.
On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote: > > On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote: > > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote: > > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote: > > ... > > > >> This patchset may not have been The Answer, but surely there is > > >> something we can do to support this use-case. > > > > > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues. > > > > Can you elaborate a bit more on the coherency issues? Is this the dir > > cache issue that is alluded to in the patchset? Anything else that > > has come up on review? > > > > Before I start looking at the dir cache in any detail, did you have > > any thoughts on how to resolve the problems that have arisen? > > David, Vivek, Amir, Miklos, or anyone for that matter, can you please > go into more detail on the cache issues? I *think* I may have found a > potential solution for an issue that could arise when the credential > override is not in place, but I'm not certain it's the only issue :) > Hi Paul, In this thread I claimed that the authors of the patches did not present a security model for overlayfs, such as the one currently in overlayfs.rst. If we had a model we could have debated its correctness and review its implementation. As a proof that there is no solid model, I gave an *example* regarding the overlay readdir cache. When listing a merged dir, meaning, a directory containing entries from several overlay layers, ovl_permission() is called to check user's permission, but ovl_permission() does not currently check permissions to read all layers, because that is not the current overlayfs model. Overlayfs has a readdir cache, so without override_cred, a user with high credentials can populate the readdir cache and then a user will fewer credentials, not enough to access the lower layers, but enough to access the upper most layer, will pass ovl_permission() check and be allowed to read from readdir cache. This specific problem can be solved in several ways - disable readdir cache with override_cred=off, check all layers in ovl_permission(). That's not my point. My point is that I provided a proof that the current model of override_cred=off is flawed and it is up to the authors of the patch to fix the model and provide the analysis of overlayfs code to prove the model's correctness. The core of the matter is there is no easy way to "merge" the permissions from all layers into a single permission blob that could be checked once. Maybe the example I gave is the only flaw in the model, maybe not I am not sure. I will be happy to help you in review of a model and the solution that you may have found. Thanks, Amir.
On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote: > On Fri, Mar 11, 2022 at 12:11 AM Paul Moore <paul@paul-moore.com> wrote: > > > > On Wed, Mar 9, 2022 at 4:13 PM Paul Moore <paul@paul-moore.com> wrote: > > > On Tue, Mar 1, 2022 at 12:05 AM David Anderson <dvander@google.com> wrote: > > > > On Mon, Feb 28, 2022 at 5:09 PM Paul Moore <paul@paul-moore.com> wrote: > > > > ... > > > > > >> This patchset may not have been The Answer, but surely there is > > > >> something we can do to support this use-case. > > > > > > > > Yup exactly, and we still need patches 3 & 4 to deal with this. My current plan is to try and rework our sepolicy (we have some ideas on how it could be made compatible with how overlayfs works). If that doesn't pan out we'll revisit these patches and think harder about how to deal with the coherency issues. > > > > > > Can you elaborate a bit more on the coherency issues? Is this the dir > > > cache issue that is alluded to in the patchset? Anything else that > > > has come up on review? > > > > > > Before I start looking at the dir cache in any detail, did you have > > > any thoughts on how to resolve the problems that have arisen? > > > > David, Vivek, Amir, Miklos, or anyone for that matter, can you please > > go into more detail on the cache issues? I *think* I may have found a > > potential solution for an issue that could arise when the credential > > override is not in place, but I'm not certain it's the only issue :) > > > > Hi Paul, > > In this thread I claimed that the authors of the patches did not present > a security model for overlayfs, such as the one currently in overlayfs.rst. > If we had a model we could have debated its correctness and review its > implementation. Agreed. After going through the patch set, I was wondering what's the overall security model and how to visualize that. So probably there needs to be a documentation patch which explains what's the new security model and how does it work. Also think both in terms of DAC and MAC. (Instead of just focussing too hard on SELinux). My understanding is that in current model, some of the overlayfs operations require priviliges. So mounter is supposed to be priviliged and does the operation on underlying layers. Now in this new model, there will be two levels of check. Both overlay level and underlying layer checks will happen in the context of task which is doing the operation. So first of all, all tasks will need to have enough priviliges to be able to perform various operations on lower layer. If we do checks at both the levels in with the creds of calling task, I guess that probably is fine. (But will require a closer code inspection to make sure there is no privilege escalation both for mounter as well calling task). > > As a proof that there is no solid model, I gave an *example* regarding > the overlay readdir cache. > > When listing a merged dir, meaning, a directory containing entries from > several overlay layers, ovl_permission() is called to check user's permission, > but ovl_permission() does not currently check permissions to read all layers, > because that is not the current overlayfs model. > > Overlayfs has a readdir cache, so without override_cred, a user with high > credentials can populate the readdir cache and then a user will fewer > credentials, not enough to access the lower layers, but enough to access > the upper most layer, will pass ovl_permission() check and be allowed to > read from readdir cache. I am not very familiar with dir caching code. When I read through the overlayfs.rst, it gave the impression that caching is per "struct file". "This merged name list is cached in the 'struct file' and so remains as long as the file is kept open." And I was wondering if that's the case, then one user should not be able to access the cache built by another priviliged user (until and unless privileged user passed fd). But looks like we build this cache and store in ovl inode and that's why this issue of cache built by higher privileged process will be accessible to lower privileged process. With current model this is not an issue because "mounter" is providing those privileges to unprivileged process. So while unprivileged process can't do "readdir" on an underlying lower dir, it might still be able to do that through an overlay mount. But if we don't switch to mounter's creds, then we probably can't rely on this dir caching. Agreed that disabling dir caching seems simplest solution if we were to do override_creds=off. Thanks Vivek > > This specific problem can be solved in several ways - disable readdir > cache with override_cred=off, check all layers in ovl_permission(). > That's not my point. My point is that I provided a proof that the current > model of override_cred=off is flawed and it is up to the authors of the > patch to fix the model and provide the analysis of overlayfs code to > prove the model's correctness. > > The core of the matter is there is no easy way to "merge" the permissions > from all layers into a single permission blob that could be checked once. > > Maybe the example I gave is the only flaw in the model, maybe not > I am not sure. I will be happy to help you in review of a model and the > solution that you may have found. > > Thanks, > Amir. >
On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote: > On Fri, Mar 11, 2022 at 06:09:56AM +0200, Amir Goldstein wrote: > > Hi Paul, Hi Amir, Vivek, Thanks for the replies, I think I now have a better understanding of the concerns which is starting to make the path forward a bit more clear. A few more comments below ... > > In this thread I claimed that the authors of the patches did not present > > a security model for overlayfs, such as the one currently in overlayfs.rst. > > If we had a model we could have debated its correctness and review its > > implementation. > > Agreed. After going through the patch set, I was wondering what's the > overall security model and how to visualize that. > > So probably there needs to be a documentation patch which explains > what's the new security model and how does it work. Yes, of course. I'll be sure to add a section to the existing docs. > Also think both in terms of DAC and MAC. (Instead of just focussing too > hard on SELinux). Definitely. Most of what I've been thinking about the past day or so has been how to properly handle some of the DAC/capability issues; I have yet to start playing with the code, but for the most part I think the MAC/SELinux bits are already working properly. > My understanding is that in current model, some of the overlayfs > operations require priviliges. So mounter is supposed to be priviliged > and does the operation on underlying layers. > > Now in this new model, there will be two levels of check. Both overlay > level and underlying layer checks will happen in the context of task > which is doing the operation. So first of all, all tasks will need > to have enough priviliges to be able to perform various operations > on lower layer. > > If we do checks at both the levels in with the creds of calling task, > I guess that probably is fine. (But will require a closer code inspection > to make sure there is no privilege escalation both for mounter as well > calling task). I have thoughts on this, but I don't think I'm yet in a position to debate this in depth just yet; I still need to finish poking around the code and playing with a few things :) It may take some time before I'm back with patches, but I appreciate all of the tips and insight - thank you!
On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote: > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote: > > Agreed. After going through the patch set, I was wondering what's the > > overall security model and how to visualize that. > > > > So probably there needs to be a documentation patch which explains > > what's the new security model and how does it work. > > Yes, of course. I'll be sure to add a section to the existing docs. > > > Also think both in terms of DAC and MAC. (Instead of just focussing too > > hard on SELinux). > > Definitely. Most of what I've been thinking about the past day or so > has been how to properly handle some of the DAC/capability issues; I > have yet to start playing with the code, but for the most part I think > the MAC/SELinux bits are already working properly. > > > My understanding is that in current model, some of the overlayfs > > operations require priviliges. So mounter is supposed to be priviliged > > and does the operation on underlying layers. > > > > Now in this new model, there will be two levels of check. Both overlay > > level and underlying layer checks will happen in the context of task > > which is doing the operation. So first of all, all tasks will need > > to have enough priviliges to be able to perform various operations > > on lower layer. > > > > If we do checks at both the levels in with the creds of calling task, > > I guess that probably is fine. (But will require a closer code inspection > > to make sure there is no privilege escalation both for mounter as well > > calling task). > > I have thoughts on this, but I don't think I'm yet in a position to > debate this in depth just yet; I still need to finish poking around > the code and playing with a few things :) > > It may take some time before I'm back with patches, but I appreciate > all of the tips and insight - thank you! Let me resurrect this discussion. With https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792 the Fedora policy changed kernel_t to a confined domain. This means that many overlayfs setups that are created in initrd will now run into issues, as it will have kernel_t as part of the saved credentials. So while the original use case that inspired the patch set was probably not very common that now changed. It's tricky to work around this. Loading a policy in initrd causes a lot of issues now that kernel_t isn't unconfined anymore. Once the policy is loaded by systemd changing the mounts is tough since we use it for /etc and at this time systemd already has open file handles for policy files in /etc. Johannes
On Wed, Mar 22, 2023 at 9:28 AM Johannes Segitz <jsegitz@suse.com> wrote: > > On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote: > > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote: > > > Agreed. After going through the patch set, I was wondering what's the > > > overall security model and how to visualize that. > > > > > > So probably there needs to be a documentation patch which explains > > > what's the new security model and how does it work. > > > > Yes, of course. I'll be sure to add a section to the existing docs. > > > > > Also think both in terms of DAC and MAC. (Instead of just focussing too > > > hard on SELinux). > > > > Definitely. Most of what I've been thinking about the past day or so > > has been how to properly handle some of the DAC/capability issues; I > > have yet to start playing with the code, but for the most part I think > > the MAC/SELinux bits are already working properly. > > > > > My understanding is that in current model, some of the overlayfs > > > operations require priviliges. So mounter is supposed to be priviliged > > > and does the operation on underlying layers. > > > > > > Now in this new model, there will be two levels of check. Both overlay > > > level and underlying layer checks will happen in the context of task > > > which is doing the operation. So first of all, all tasks will need > > > to have enough priviliges to be able to perform various operations > > > on lower layer. > > > > > > If we do checks at both the levels in with the creds of calling task, > > > I guess that probably is fine. (But will require a closer code inspection > > > to make sure there is no privilege escalation both for mounter as well > > > calling task). > > > > I have thoughts on this, but I don't think I'm yet in a position to > > debate this in depth just yet; I still need to finish poking around > > the code and playing with a few things :) > > > > It may take some time before I'm back with patches, but I appreciate > > all of the tips and insight - thank you! > > Let me resurrect this discussion. With > https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792 > the Fedora policy changed kernel_t to a confined domain. This means that > many overlayfs setups that are created in initrd will now run into issues, > as it will have kernel_t as part of the saved credentials. So while the > original use case that inspired the patch set was probably not very common > that now changed. I don't remember anyone rejecting the patches on the account that the Android use case is not important. It was never the issue. > > It's tricky to work around this. Loading a policy in initrd causes a lot of > issues now that kernel_t isn't unconfined anymore. Once the policy is > loaded by systemd changing the mounts is tough since we use it for /etc and > at this time systemd already has open file handles for policy files in > /etc. > I've already explained several times on this thread what needs to be done in order to move forward - express the security model and explain why it is safe. If the security guys are going to be in LSS in Vancouver, perhaps we can have a meetup with overlayfs developers on the overlap day with LSFMM (May 10) to try and figure out a path forward. Thanks, Amir.
On Wed, Mar 22, 2023 at 3:28 AM Johannes Segitz <jsegitz@suse.com> wrote: > On Fri, Mar 11, 2022 at 03:52:54PM -0500, Paul Moore wrote: > > On Fri, Mar 11, 2022 at 9:01 AM Vivek Goyal <vgoyal@redhat.com> wrote: > > > Agreed. After going through the patch set, I was wondering what's the > > > overall security model and how to visualize that. > > > > > > So probably there needs to be a documentation patch which explains > > > what's the new security model and how does it work. > > > > Yes, of course. I'll be sure to add a section to the existing docs. > > > > > Also think both in terms of DAC and MAC. (Instead of just focussing too > > > hard on SELinux). > > > > Definitely. Most of what I've been thinking about the past day or so > > has been how to properly handle some of the DAC/capability issues; I > > have yet to start playing with the code, but for the most part I think > > the MAC/SELinux bits are already working properly. > > > > > My understanding is that in current model, some of the overlayfs > > > operations require priviliges. So mounter is supposed to be priviliged > > > and does the operation on underlying layers. > > > > > > Now in this new model, there will be two levels of check. Both overlay > > > level and underlying layer checks will happen in the context of task > > > which is doing the operation. So first of all, all tasks will need > > > to have enough priviliges to be able to perform various operations > > > on lower layer. > > > > > > If we do checks at both the levels in with the creds of calling task, > > > I guess that probably is fine. (But will require a closer code inspection > > > to make sure there is no privilege escalation both for mounter as well > > > calling task). > > > > I have thoughts on this, but I don't think I'm yet in a position to > > debate this in depth just yet; I still need to finish poking around > > the code and playing with a few things :) > > > > It may take some time before I'm back with patches, but I appreciate > > all of the tips and insight - thank you! > > Let me resurrect this discussion. With > https://github.com/fedora-selinux/selinux-policy/commit/1e8688ea694393c9d918939322b72dfb44a01792 > the Fedora policy changed kernel_t to a confined domain. This means that > many overlayfs setups that are created in initrd will now run into issues, > as it will have kernel_t as part of the saved credentials. Regardless of any overlayfs cred work, it seems like it would also be worth spending some time to see if the kernel_t mounter creds situation can also be improved. I'm guessing this is due to mounts happening before the SELinux policy is loaded? Has anyone looked into mounting the SELinux policy even earlier in these cases (may not be possible) and/or umount/mount/remounting the affected overlayfs-based filesystems after the policy has been loaded? I can't say I'm the best person to comment on how the Fedora SELinux policy is structured, but I do know a *little* about SELinux and I think that accepting kernel_t as an overlayfs mounter cred is a mistake. > So while the > original use case that inspired the patch set was probably not very common > that now changed. > > It's tricky to work around this. Loading a policy in initrd causes a lot of > issues now that kernel_t isn't unconfined anymore. Once the policy is > loaded by systemd changing the mounts is tough since we use it for /etc and > at this time systemd already has open file handles for policy files in > /etc. It's been a while since I worked on this, but I pretty much had to give up on the read-write case, the overlayfs copy-up/work-dir approach made this impractical, or at least I couldn't think of a sane way to handle this without some sort of credential override. However, I did have a quick-and-dirty prototype that appeared to work well in the read-only/no-work-dir case; I think I still have it in a development branch somewhere, I can dig it back up and get it ported to a modern kernel if there is any interest. However, when discussing the prototype with Christian Brauner off-list (added to the CC line) he still objected to the no-cred-override approach and said it wasn't something he could support, so I dropped it and focused on the other piles of fire lying about my desk (my apologies to Christian if I'm mis-remembering/understanding the conversation). I still think there is value in supporting a no-creds-override option, and if there is basic support for getting this upstream I'm happy to pick the work back up, but I can't invest a lot more time in this if there isn't an agreement from the overlayfs/VFS maintainers that this is something that would consider. -- paul-moore.com
On Wed, Mar 22, 2023 at 8:48 AM Amir Goldstein <amir73il@gmail.com> wrote: > If the security guys are going to be in LSS in Vancouver, perhaps > we can have a meetup with overlayfs developers on the overlap > day with LSFMM (May 10) to try and figure out a path forward. At the very least I currently plan to be at LSS-NA in Vancouver and would be very happy to discuss this with anyone who wants to talk about it. I'm also happy to continue to conversation here too, for the sake of those who might not be able to travel to LSS-NA and/or LSFMM.