[v4,03/16] fs: allow __fput_sync to be used by non-kthreads and in modules

Message ID 1441968882-7851-4-git-send-email-jeff.layton@primarydata.com
State New

Commit Message

Jeff Layton Sept. 11, 2015, 10:54 a.m. UTC
We want nfsd to keep a cache of open files, but that would potentially
block userland callers from obtaining leases on them. To fix this,
we'll be adding a new notifier chain to the lease code that will call
back into nfsd on any attempt to set a FL_LEASE. nfsd can then close
any open files for that inode in advance of that.

The problem, however, is that since that notifier will run in normal
process context, the final __fput will be delayed à la task_work and we
are still unable to set a lease. What we need to do is to put the struct
file synchronously so that the __fput runs before returning from the
notifier call.

The comments over __fput_sync and the BUG_ON in there mandate that it
should only be used in kthread context, but I see no reason why that
should be so. As long as the caller avoids holding locks that may be
problematic, it should be OK to use it from normal process context as
well.

Remove the __ prefix and the BUG_ON from that function and update the
comments over it. Also export it so that it can be used from nfsd code,
and move the export of fput just below the function definition.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 27 ++++++++++++++-------------
 include/linux/file.h |  2 +-
 kernel/acct.c        |  2 +-
 3 files changed, 16 insertions(+), 15 deletions(-)

Comments

Al Viro Sept. 11, 2015, 2 p.m. UTC | #1
On Fri, Sep 11, 2015 at 06:54:29AM -0400, Jeff Layton wrote:
> We want nfsd to keep a cache of open files, but that would potentially
> block userland callers from obtaining leases on them. To fix this,
> we'll be adding a new notifier chain to the lease code that will call
> back into nfsd on any attempt to set a FL_LEASE. nfsd can then close
> any open files for that inode in advance of that.
> 
> The problem, however, is that since that notifier will run in normal
> process context, the final __fput will be delayed à la task_work and we
> are still unable to set a lease. What we need to do is to put the struct
> file synchronously so that the __fput runs before returning from the
> notifier call.
> 
> The comments over __fput_sync and the BUG_ON in there mandate that it
> should only be used in kthread context, but I see no reason why that
> should be so. As long as the caller avoids holding locks that may be
> problematic, it should be OK to use it from normal process context as
> well.
> 
> Remove the __ prefix and the BUG_ON from that function and update the
> comments over it. Also export it so that it can be used from nfsd code,
> and move the export of fput just below the function definition.

I really don't like that.
	a) how deep in kernel stack will that thing run?
	b) what locking environment is expected in your case?

And opening it for use by any random driver that just feels like e.g.
using it to go parse its config over there in /lib/we/are/special/wank.conf
with 5KB worth of kernel stack already eaten is a really bad idea.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton Sept. 11, 2015, 2:38 p.m. UTC | #2
On Fri, 11 Sep 2015 15:00:49 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Fri, Sep 11, 2015 at 06:54:29AM -0400, Jeff Layton wrote:
> > [...]
> 
> I really don't like that.
> 	a) how deep in kernel stack will that thing run?
> 	b) what locking environment is expected in your case?
> 
> And opening it for use by any random driver that just feels like e.g.
> using it to go parse its config over there in /lib/we/are/special/wank.conf
> with 5KB worth of kernel stack already eaten is a really bad idea.


Not too deep in our case, and with no real locking held aside from an
SRCU lock. Basically, we're going to have an SRCU notifier chain that
will run from vfs_setlease(). When nfsd is running, that will call back
into the nfsd code, which will scan the hash for open files on the
inode, then unhash and release them (synchronously). If those files are
being held open in the cache but are otherwise idle, that's enough to
allow a lease to be acquired.

That said, I'm not thrilled with it either. There are some
alternatives:

1) we could just call task_work_run after the fput, but that seems
scary if (e.g.) some random interrupt walks in and queues up some
task_work.

2) we could add a "delayed_fput(file)" that adds the file to the
delayed_fput_list, even when called from normal process context.
Then we could just call flush_delayed_fput() afterward. More context
switching, but that should be relatively safe, I'd think.

Patch
diff mbox

diff --git a/fs/file_table.c b/fs/file_table.c
index f4833af62eae..6769ed45c35f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -280,25 +280,26 @@  void fput(struct file *file)
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
+EXPORT_SYMBOL(fput);
 
 /*
- * synchronous analog of fput(); for kernel threads that might be needed
- * in some umount() (and thus can't use flush_delayed_fput() without
- * risking deadlocks), need to wait for completion of __fput() and know
- * for this specific struct file it won't involve anything that would
- * need them.  Use only if you really need it - at the very least,
- * don't blindly convert fput() by kernel thread to that.
+ * synchronous analog of fput(); this is necessary for tasks
+ * that might be needed in some umount() (and thus can't use
+ * flush_delayed_fput() without risking deadlocks), need to wait for
+ * completion of __fput() and know for this specific struct file it
+ * won't involve anything that would need them. It's also necessary
+ * for nfsd, which needs to be able to synchronously close files
+ * on which userspace programs are trying to set leases.
+ *
+ * Use only if you really need it - at the very least, don't blindly
+ * convert fput() to this.
  */
-void __fput_sync(struct file *file)
+void fput_sync(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
-		struct task_struct *task = current;
-		BUG_ON(!(task->flags & PF_KTHREAD));
+	if (atomic_long_dec_and_test(&file->f_count))
 		__fput(file);
-	}
 }
-
-EXPORT_SYMBOL(fput);
+EXPORT_SYMBOL(fput_sync);
 
 void put_filp(struct file *file)
 {
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..046a8c477b9a 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -71,6 +71,6 @@  extern void put_unused_fd(unsigned int fd);
 extern void fd_install(unsigned int fd, struct file *file);
 
 extern void flush_delayed_fput(void);
-extern void __fput_sync(struct file *);
+extern void fput_sync(struct file *);
 
 #endif /* __LINUX_FILE_H */
diff --git a/kernel/acct.c b/kernel/acct.c
index 74963d192c5d..b58300ebd819 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -183,7 +183,7 @@  static void close_work(struct work_struct *work)
 	struct file *file = acct->file;
 	if (file->f_op->flush)
 		file->f_op->flush(file, NULL);
-	__fput_sync(file);
+	fput_sync(file);
 	complete(&acct->done);
 }