diff mbox

Fuse: Add mount option to cache presence of security related xattr

Message ID 1472625363-4850-1-git-send-email-ashishsangwan2@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ashish Sangwan Aug. 31, 2016, 6:36 a.m. UTC
In case of a write call on any file, there is a xattr lookup call for
security.capablities type of xattr which is a scaling bottleneck.
In some of our use cases, just enabling the xattr support, we are
experiencing a performance drop of almost 20% even though the file does
not have any security xattr.
Fuse, by default, does not remember the presence of security attributes as
it clears the MS_NOSEC flag at the time of fill super and hence requires a
lookup of security xattr at each write. This makes sense in case of network
filesystems where multiple clients can change the state of xattr.
This patch adds a new mount option cache_security_xattr_presence
to avoid clearing MS_NOSEC flag. This could be use by the filesystem
implementations which supports xattr but are local in nature OR the
implementations which has its own security policies and
do not support security.capablities xattr.

Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com>
---
 Documentation/filesystems/fuse.txt | 12 ++++++++++++
 fs/fuse/inode.c                    |  9 +++++++++
 2 files changed, 21 insertions(+)

Comments

Nikolaus Rath Aug. 31, 2016, 5:04 p.m. UTC | #1
On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote:
> In case of a write call on any file, there is a xattr lookup call for
> security.capablities type of xattr which is a scaling bottleneck.
> In some of our use cases, just enabling the xattr support, we are
> experiencing a performance drop of almost 20% even though the file does
> not have any security xattr.
> Fuse, by default, does not remember the presence of security attributes as
> it clears the MS_NOSEC flag at the time of fill super and hence requires a
> lookup of security xattr at each write. This makes sense in case of network
> filesystems where multiple clients can change the state of xattr.
> This patch adds a new mount option cache_security_xattr_presence
> to avoid clearing MS_NOSEC flag. This could be use by the filesystem
> implementations which supports xattr but are local in nature OR the
> implementations which has its own security policies and
> do not support security.capablities xattr.


If I remember correctly, FUSE does not support LSMs at all, so even if
there is a security.capabilities xattr it won't have the expected
effect. So maybe it makes more sense to unconditionally catch both read
and write of security.capabilites in kernel and never forward it to
userspace?

Best,
-Nikolaus
Ashish Sangwan Sept. 10, 2016, 4 a.m. UTC | #2
*ping*

On Tue, Sep 6, 2016 at 2:17 PM, Ashish Sangwan <ashishsangwan2@gmail.com> wrote:
>
>
> On Wed, Aug 31, 2016 at 10:34 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>
>> On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote:
>> > In case of a write call on any file, there is a xattr lookup call for
>> > security.capablities type of xattr which is a scaling bottleneck.
>> > In some of our use cases, just enabling the xattr support, we are
>> > experiencing a performance drop of almost 20% even though the file does
>> > not have any security xattr.
>> > Fuse, by default, does not remember the presence of security attributes
>> > as
>> > it clears the MS_NOSEC flag at the time of fill super and hence requires
>> > a
>> > lookup of security xattr at each write. This makes sense in case of
>> > network
>> > filesystems where multiple clients can change the state of xattr.
>> > This patch adds a new mount option cache_security_xattr_presence
>> > to avoid clearing MS_NOSEC flag. This could be use by the filesystem
>> > implementations which supports xattr but are local in nature OR the
>> > implementations which has its own security policies and
>> > do not support security.capablities xattr.
>>
>>
>> If I remember correctly, FUSE does not support LSMs at all, so even if
>> there is a security.capabilities xattr it won't have the expected
>> effect. So maybe it makes more sense to unconditionally catch both read
>> and write of security.capabilites in kernel and never forward it to
>> userspace?
>
>
> Hi Miklos, do you have any comment about the patch or Nikolaus's advise?
>
> Thanks,
> Ashish
>>
>>
>> Best,
>> -Nikolaus
>>
>> --
>> GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
>> Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>>
>>              »Time flies like an arrow, fruit flies like a Banana.«
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miklos Szeredi Sept. 15, 2016, 2:15 p.m. UTC | #3
On Tue, Sep 06, 2016 at 02:17:08PM +0530, Ashish Sangwan wrote:
> On Wed, Aug 31, 2016 at 10:34 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> 
> > On Aug 31 2016, Ashish Sangwan <ashishsangwan2@gmail.com> wrote:
> > > In case of a write call on any file, there is a xattr lookup call for
> > > security.capablities type of xattr which is a scaling bottleneck.
> > > In some of our use cases, just enabling the xattr support, we are
> > > experiencing a performance drop of almost 20% even though the file does
> > > not have any security xattr.
> > > Fuse, by default, does not remember the presence of security attributes
> > as
> > > it clears the MS_NOSEC flag at the time of fill super and hence requires
> > a
> > > lookup of security xattr at each write. This makes sense in case of
> > network
> > > filesystems where multiple clients can change the state of xattr.
> > > This patch adds a new mount option cache_security_xattr_presence
> > > to avoid clearing MS_NOSEC flag. This could be use by the filesystem
> > > implementations which supports xattr but are local in nature OR the
> > > implementations which has its own security policies and
> > > do not support security.capablities xattr.
> >
> >
> > If I remember correctly, FUSE does not support LSMs at all, so even if
> > there is a security.capabilities xattr it won't have the expected
> > effect. So maybe it makes more sense to unconditionally catch both read
> > and write of security.capabilites in kernel and never forward it to
> > userspace?
> >
> 
> Hi Miklos, do you have any comment about the patch or Nikolaus's advise?

"security.capabilities" should work fine in fuse.  Handling of that is in
security/commoncaps.c, not part of any security module.

I'm looking at handling of ATTR_KILL_* flags in fuse and it's a mess.  It needs
to be sorted out, but it's going to be more complicated than your patch.

Fuse handles different models for filesystems, and unfortunately it doesn't make
a clear distintion between the two, resulting in all sorts of "interesting"
bugs.

1) Local filesystem (ntfs3g, etc).  The VFS cares for this type of fs very well,
since the cached state is always consistent with the actual filesystem state.
In this case we can safely turn on MS_NOSEC caching.  Using the "fuseblk"
filesystem type should be a reliable indication of this mode.  Non-fuseblk
filesystems can also be local.  They would be setting attribute and entry
timeouts to large values and setting FOPEN_KEEP_CACHE in open to get maximal
attribute and data caching.  But there's no single, reliable indication of this
mode, so (perhaps with the exception of "fuseblk") we have to assume a
distributed filesystem.

2) Distributed filesystem (gluster, etc).  The filesystem can be modified
externally, hence the cached state can become out of date with the actual
filesystem state.  For this case we need to be careful with MS_NOSEC.  Not only
that, but fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
chown and truncate, and (since writeback_cache) write.  The problem with this is
that it'll potentially set a stale mode.  E.g.:

host1: chmod 4755 foo
host2: stat foo > /dev/null
host1: chmod 4700 foo
host2: chown 1:1 foo
host2: stat -c%a bar
755

See the problem?

The poper fix would be to let the filesystems do the suid/sgid clearing on the
relevant operations.  Possibly some are already doing it (if the filesystem just
forwards operations to a real underlying fs then it will just work).

So we need a way to know if the filesystem is clearing privs (suid/sgid/cap) on
chown, truncate and write.

A) It is clearing privs. We can set MS_NOSEC and ignore ATTR_KILL_* in
fuse_setxattr().  There's a special case, though:

A*) writeback_cache: In this case WRITE requests won't be sent to the
filesystem immediately on a write() so we still need to do getattr/getxattr in
write to determine if privs need to be cleared.  But we can use the attr timeout
to limit the rate.

B) It isn't clearing privs.  This is the default, unless filesystem indicates
otherwise, we must assmue this even though the filesystem may actually be
clearing privs.  In this case we must remove "security.capability" before doing
chown, truncate or write (this is the current state).  But we also need to
manually kill suid/sgit in a less racy way.  Solution:

- If ATTR_KILL_* is set in fuse_setattr(), then update the attributes and
  recalculate the mode to set, to reduce the race window.  This is a pretty
  simple and harmless change.

- Refresh the attributes if timeout has expired and recheck suid/sgid.  This
  will result in more correct operation, but also may cause performance
  regression in case the attribute timeout is zero or very small.  If it does
  cause a performance regression, that will at least make the filesystem writers
  consider moving to model 1 or model 2/A.  Still, this is a risky change,
  because we don't want to generally break working setups on new kernels, and
  this is a very obscure corner case, which probably not many care about.  I'm
  not sure about the risk/benefit ratio here...

So to conclude:

 - We need some fixes to the default behavior.

 - We need a way for filesystem to tell the kernel module if it is taking care
   of clearing privileges for write/truncate/chown (FUSE_HANDLE_KILLPRIV).

 - We need a way for filesystem to tell the kernel module if it is allowing
   caching of xattrs or lack thereof (timeout in fuse_getxattr_out).

 - Perhaps add a "local fs" mode where we can assume proper consistency between
   cache and backing.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 13af4a4..7245a40 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -115,6 +115,18 @@  Mount options
   Set the block size for the filesystem.  The default is 512.  This
   option is only valid for 'fuseblk' type mounts.
 
+'cache_security_xattr_presence'
+
+  If xattr support is enabled, in case of every write call on a file
+  fuse perform a xattr lookup call for security.capablities type as it does
+  not remember the presence of this xattr type. This is expected behavior in
+  case of network file system implementations where multiple clients can
+  modify the security related xattr state.
+  But in case of local file system implementations OR in case of network
+  file system implementations which does not support security.capablities
+  this option will prevent the security xattr lookup by caching its presence
+  in kernel.
+
 Control filesystem
 ~~~~~~~~~~~~~~~~~~
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4e05b51..bd670c8 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -66,6 +66,7 @@  struct fuse_mount_data {
 	unsigned rootmode_present:1;
 	unsigned user_id_present:1;
 	unsigned group_id_present:1;
+	unsigned cache_security_xattr_presence:1;
 	unsigned flags;
 	unsigned max_read;
 	unsigned blksize;
@@ -454,6 +455,7 @@  enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_CACHE_SECURITY_XATTR_PRESENCE,
 	OPT_ERR
 };
 
@@ -466,6 +468,7 @@  static const match_table_t tokens = {
 	{OPT_ALLOW_OTHER,		"allow_other"},
 	{OPT_MAX_READ,			"max_read=%u"},
 	{OPT_BLKSIZE,			"blksize=%u"},
+	{OPT_CACHE_SECURITY_XATTR_PRESENCE, "cache_security_xattr_presence"},
 	{OPT_ERR,			NULL}
 };
 
@@ -539,6 +542,10 @@  static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 			d->flags |= FUSE_ALLOW_OTHER;
 			break;
 
+		case OPT_CACHE_SECURITY_XATTR_PRESENCE:
+			d->cache_security_xattr_presence = 1;
+			break;
+
 		case OPT_MAX_READ:
 			if (match_int(&args[0], &value))
 				return 0;
@@ -1069,6 +1076,8 @@  static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 		sb->s_blocksize = PAGE_SIZE;
 		sb->s_blocksize_bits = PAGE_SHIFT;
 	}
+	if (d.cache_security_xattr_presence)
+		sb->s_flags |= MS_NOSEC;
 	sb->s_magic = FUSE_SUPER_MAGIC;
 	sb->s_op = &fuse_super_operations;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;