Message ID | 1472625363-4850-1-git-send-email-ashishsangwan2@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote: > In case of a write call on any file, there is a xattr lookup call for > security.capablities type of xattr which is a scaling bottleneck. > In some of our use cases, just enabling the xattr support, we are > experiencing a performance drop of almost 20% even though the file does > not have any security xattr. > Fuse, by default, does not remember the presence of security attributes as > it clears the MS_NOSEC flag at the time of fill super and hence requires a > lookup of security xattr at each write. This makes sense in case of network > filesystems where multiple clients can change the state of xattr. > This patch adds a new mount option cache_security_xattr_presence > to avoid clearing MS_NOSEC flag. This could be use by the filesystem > implementations which supports xattr but are local in nature OR the > implementations which has its own security policies and > do not support security.capablities xattr. If I remember correctly, FUSE does not support LSMs at all, so even if there is a security.capabilities xattr it won't have the expected effect. So maybe it makes more sense to unconditionally catch both read and write of security.capabilites in kernel and never forward it to userspace? Best, -Nikolaus
*ping* On Tue, Sep 6, 2016 at 2:17 PM, Ashish Sangwan <ashishsangwan2@gmail.com> wrote: > > > On Wed, Aug 31, 2016 at 10:34 PM, Nikolaus Rath <Nikolaus@rath.org> wrote: >> >> On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote: >> > In case of a write call on any file, there is a xattr lookup call for >> > security.capablities type of xattr which is a scaling bottleneck. >> > In some of our use cases, just enabling the xattr support, we are >> > experiencing a performance drop of almost 20% even though the file does >> > not have any security xattr. >> > Fuse, by default, does not remember the presence of security attributes >> > as >> > it clears the MS_NOSEC flag at the time of fill super and hence requires >> > a >> > lookup of security xattr at each write. This makes sense in case of >> > network >> > filesystems where multiple clients can change the state of xattr. >> > This patch adds a new mount option cache_security_xattr_presence >> > to avoid clearing MS_NOSEC flag. This could be use by the filesystem >> > implementations which supports xattr but are local in nature OR the >> > implementations which has its own security policies and >> > do not support security.capablities xattr. >> >> >> If I remember correctly, FUSE does not support LSMs at all, so even if >> there is a security.capabilities xattr it won't have the expected >> effect. So maybe it makes more sense to unconditionally catch both read >> and write of security.capabilites in kernel and never forward it to >> userspace? > > > Hi Miklos, do you have any comment about the patch or Nikolaus's advise? > > Thanks, > Ashish >> >> >> Best, >> -Nikolaus >> >> -- >> GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F >> Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F >> >> »Time flies like an arrow, fruit flies like a Banana.« > > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 06, 2016 at 02:17:08PM +0530, Ashish Sangwan wrote: > On Wed, Aug 31, 2016 at 10:34 PM, Nikolaus Rath <Nikolaus@rath.org> wrote: > > > On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote: > > > In case of a write call on any file, there is a xattr lookup call for > > > security.capablities type of xattr which is a scaling bottleneck. > > > In some of our use cases, just enabling the xattr support, we are > > > experiencing a performance drop of almost 20% even though the file does > > > not have any security xattr. > > > Fuse, by default, does not remember the presence of security attributes > > as > > > it clears the MS_NOSEC flag at the time of fill super and hence requires > > a > > > lookup of security xattr at each write. This makes sense in case of > > network > > > filesystems where multiple clients can change the state of xattr. > > > This patch adds a new mount option cache_security_xattr_presence > > > to avoid clearing MS_NOSEC flag. This could be use by the filesystem > > > implementations which supports xattr but are local in nature OR the > > > implementations which has its own security policies and > > > do not support security.capablities xattr. > > > > > > If I remember correctly, FUSE does not support LSMs at all, so even if > > there is a security.capabilities xattr it won't have the expected > > effect. So maybe it makes more sense to unconditionally catch both read > > and write of security.capabilites in kernel and never forward it to > > userspace? > > > > Hi Miklos, do you have any comment about the patch or Nikolaus's advise? "security.capabilities" should work fine in fuse. Handling of that is in security/commoncaps.c, not part of any security module. I'm looking at handling of ATTR_KILL_* flags in fuse and it's a mess. It needs to be sorted out, but it's going to be more complicated than your patch. Fuse handles different models for filesystems, and unfortunately it doesn't make a clear distintion between the two, resulting in all sorts of "interesting" bugs. 1) Local filesystem (ntfs3g, etc). The VFS cares for this type of fs very well, since the cached state is always consistent with the actual filesystem state. In this case we can safely turn on MS_NOSEC caching. Using the "fuseblk" filesystem type should be a reliable indication of this mode. Non-fuseblk filesystems can also be local. They would be setting attribute and entry timeouts to large values and setting FOPEN_KEEP_CACHE in open to get maximal attribute and data caching. But there's no single, reliable indication of this mode, so (perhaps with the exception of "fuseblk") we have to assume a distributed filesystem. 2) Distributed filesystem (gluster, etc). The filesystem can be modified externally, hence the cached state can become out of date with the actual filesystem state. For this case we need to be careful with MS_NOSEC. Not only that, but fuse allowed VFS to set mode in setattr in order to clear suid/sgid on chown and truncate, and (since writeback_cache) write. The problem with this is that it'll potentially set a stale mode. E.g.: host1: chmod 4755 foo host2: stat foo > /dev/null host1: chmod 4700 foo host2: chown 1:1 foo host2: stat -c%a bar 755 See the problem? The poper fix would be to let the filesystems do the suid/sgid clearing on the relevant operations. Possibly some are already doing it (if the filesystem just forwards operations to a real underlying fs then it will just work). So we need a way to know if the filesystem is clearing privs (suid/sgid/cap) on chown, truncate and write. A) It is clearing privs. We can set MS_NOSEC and ignore ATTR_KILL_* in fuse_setxattr(). There's a special case, though: A*) writeback_cache: In this case WRITE requests won't be sent to the filesystem immediately on a write() so we still need to do getattr/getxattr in write to determine if privs need to be cleared. But we can use the attr timeout to limit the rate. B) It isn't clearing privs. This is the default, unless filesystem indicates otherwise, we must assmue this even though the filesystem may actually be clearing privs. In this case we must remove "security.capability" before doing chown, truncate or write (this is the current state). But we also need to manually kill suid/sgit in a less racy way. Solution: - If ATTR_KILL_* is set in fuse_setattr(), then update the attributes and recalculate the mode to set, to reduce the race window. This is a pretty simple and harmless change. - Refresh the attributes if timeout has expired and recheck suid/sgid. This will result in more correct operation, but also may cause performance regression in case the attribute timeout is zero or very small. If it does cause a performance regression, that will at least make the filesystem writers consider moving to model 1 or model 2/A. Still, this is a risky change, because we don't want to generally break working setups on new kernels, and this is a very obscure corner case, which probably not many care about. I'm not sure about the risk/benefit ratio here... So to conclude: - We need some fixes to the default behavior. - We need a way for filesystem to tell the kernel module if it is taking care of clearing privileges for write/truncate/chown (FUSE_HANDLE_KILLPRIV). - We need a way for filesystem to tell the kernel module if it is allowing caching of xattrs or lack thereof (timeout in fuse_getxattr_out). - Perhaps add a "local fs" mode where we can assume proper consistency between cache and backing. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 13af4a4..7245a40 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt @@ -115,6 +115,18 @@ Mount options Set the block size for the filesystem. The default is 512. This option is only valid for 'fuseblk' type mounts. +'cache_security_xattr_presence' + + If xattr support is enabled, in case of every write call on a file + fuse perform a xattr lookup call for security.capablities type as it does + not remember the presence of this xattr type. This is expected behavior in + case of network file system implementations where multiple clients can + modify the security related xattr state. + But in case of local file system implementations OR in case of network + file system implementations which does not support security.capablities + this option will prevent the security xattr lookup by caching its presence + in kernel. + Control filesystem ~~~~~~~~~~~~~~~~~~ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 4e05b51..bd670c8 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -66,6 +66,7 @@ struct fuse_mount_data { unsigned rootmode_present:1; unsigned user_id_present:1; unsigned group_id_present:1; + unsigned cache_security_xattr_presence:1; unsigned flags; unsigned max_read; unsigned blksize; @@ -454,6 +455,7 @@ enum { OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, + OPT_CACHE_SECURITY_XATTR_PRESENCE, OPT_ERR }; @@ -466,6 +468,7 @@ static const match_table_t tokens = { {OPT_ALLOW_OTHER, "allow_other"}, {OPT_MAX_READ, "max_read=%u"}, {OPT_BLKSIZE, "blksize=%u"}, + {OPT_CACHE_SECURITY_XATTR_PRESENCE, "cache_security_xattr_presence"}, {OPT_ERR, NULL} }; @@ -539,6 +542,10 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) d->flags |= FUSE_ALLOW_OTHER; break; + case OPT_CACHE_SECURITY_XATTR_PRESENCE: + d->cache_security_xattr_presence = 1; + break; + case OPT_MAX_READ: if (match_int(&args[0], &value)) return 0; @@ -1069,6 +1076,8 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) sb->s_blocksize = PAGE_SIZE; sb->s_blocksize_bits = PAGE_SHIFT; } + if (d.cache_security_xattr_presence) + sb->s_flags |= MS_NOSEC; sb->s_magic = FUSE_SUPER_MAGIC; sb->s_op = &fuse_super_operations; sb->s_maxbytes = MAX_LFS_FILESIZE;
In case of a write call on any file, there is a xattr lookup call for security.capablities type of xattr which is a scaling bottleneck. In some of our use cases, just enabling the xattr support, we are experiencing a performance drop of almost 20% even though the file does not have any security xattr. Fuse, by default, does not remember the presence of security attributes as it clears the MS_NOSEC flag at the time of fill super and hence requires a lookup of security xattr at each write. This makes sense in case of network filesystems where multiple clients can change the state of xattr. This patch adds a new mount option cache_security_xattr_presence to avoid clearing MS_NOSEC flag. This could be use by the filesystem implementations which supports xattr but are local in nature OR the implementations which has its own security policies and do not support security.capablities xattr. Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com> --- Documentation/filesystems/fuse.txt | 12 ++++++++++++ fs/fuse/inode.c | 9 +++++++++ 2 files changed, 21 insertions(+)