fs: have flush_delayed_fput flush the workqueue job
diff mbox

Message ID 1441128953-16577-1-git-send-email-jeff.layton@primarydata.com
State New
Headers show

Commit Message

Jeff Layton Sept. 1, 2015, 5:35 p.m. UTC
I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_work.  It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should just block until the workqueue job completes if it's
already running.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Jeff Layton Sept. 1, 2015, 5:57 p.m. UTC | #1
On Tue,  1 Sep 2015 13:35:53 -0400
Jeff Layton <jlayton@poochiereds.net> wrote:

> I think there's a potential race in flush_delayed_fput. A kthread does
> an fput() and that file gets added to the list and the delayed work is
> scheduled. More than 1 jiffy passes, and the workqueue thread picks up
> the work and starts running it. Then the kthread calls
> flush_delayed_work.  It sees that the list is empty and returns
> immediately, even though the __fput for its file may not have run yet.
> 
> Close this by making flush_delayed_fput use flush_delayed_work instead,
> which should just block until the workqueue job completes if it's
> already running.
> 
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> ---
>  fs/file_table.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 7f9d407c7595..f4833af62eae 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -243,6 +243,8 @@ static void ____fput(struct callback_head *work)
>  	__fput(container_of(work, struct file, f_u.fu_rcuhead));
>  }
>  
> +static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> +
>  /*
>   * If kernel thread really needs to have the final fput() it has done
>   * to complete, call this.  The only user right now is the boot - we
> @@ -255,11 +257,9 @@ static void ____fput(struct callback_head *work)
>   */
>  void flush_delayed_fput(void)
>  {
> -	delayed_fput(NULL);
> +	flush_delayed_work(&delayed_fput_work);
>  }
>  
> -static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> -
>  void fput(struct file *file)
>  {
>  	if (atomic_long_dec_and_test(&file->f_count)) {

Hi Al,

I just noticed this by inspection. I had a suspicion that I had hit
this race in some code I'm developing, but it turned out to be a
different problem. Still, I think the race looks plausible.

I do have another (somewhat related) problem though that I could use
your advice on. In a nutshell, there doesn't seem to be an equivalent
to flush_delayed_fput for fputs that have been task_work_add'ed.

I'm working on an open file cache for knfsd. What we want to do is to
allow knfsd to hold files open indefinitely until:

- the exports cache is flushed
- there is a REMOVE or RENAME call that would unlink the dentry
- someone wants to set a lease on the file

The setlease the main problem. When we close down the cache entry, we
call fput on the struct file we hold, but since it's running in normal
process context it gets queued to task_work. We need the final __fput
to run before we try to set the lease though or the refcounts will
remain high and block the lease attempt.

Is it reasonable to call task_work_run() to flush the work, or will
that turn out to be nasty? Alternately, could we allow __fput_sync to
be run from normal process context? If so, then I could just
synchronously put the files when I need to close them to allow a
setlease to proceed.

What would be the best approach for flushing delayed fputs from normal
process context, before returning to userland?

Patch
diff mbox

diff --git a/fs/file_table.c b/fs/file_table.c
index 7f9d407c7595..f4833af62eae 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -243,6 +243,8 @@  static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -255,11 +257,9 @@  static void ____fput(struct callback_head *work)
  */
 void flush_delayed_fput(void)
 {
-	delayed_fput(NULL);
+	flush_delayed_work(&delayed_fput_work);
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {