[v3,15/32] NFSD: Leave open files out of the filecache LRU


Message ID

165730471781.28142.13547044100953437563.stgit@klimt.1015granger.net (mailing list archive)

State

New, archived

Headers

Subject: [PATCH v3 15/32] NFSD: Leave open files out of the filecache LRU
From: Chuck Lever <chuck.lever@oracle.com>
To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org
Cc: david@fromorbit.com, jlayton@redhat.com, tgraf@suug.ch
Date: Fri, 08 Jul 2022 14:25:17 -0400
Message-ID: 
 <165730471781.28142.13547044100953437563.stgit@klimt.1015granger.net>
In-Reply-To: 
 <165730437087.28142.6731645688073512500.stgit@klimt.1015granger.net>
References: 
 <165730437087.28142.6731645688073512500.stgit@klimt.1015granger.net>
User-Agent: StGit/1.5.dev3+g9561319
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Precedence: bulk

Series

Overhaul NFSD filecache

Commit Message

Chuck Lever July 8, 2022, 6:25 p.m. UTC

There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:

fs/nfsd/filecache.c
 482                 ret = list_lru_walk(&nfsd_file_lru,
 483                                 nfsd_file_lru_cb,
 484                                 &head, LONG_MAX);

causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock the entire time that prevents other nfsd threads
from accessing the filecache.

What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.

Address this by ensuring that open files are not kept on the LRU
list.

Reported-by: Frank van der Linden <fllinden@amazon.com>
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   24 +++++++++++++++++++-----
 fs/nfsd/trace.h     |    2 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

Comments

Jeff Layton July 8, 2022, 7:29 p.m. UTC | #1

On Fri, 2022-07-08 at 14:25 -0400, Chuck Lever wrote:
> There have been reports of problems when running fstests generic/531
> against Linux NFS servers with NFSv4. The NFS server that hosts the
> test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
> Analysis shows that:
> 
> fs/nfsd/filecache.c
>  482                 ret = list_lru_walk(&nfsd_file_lru,
>  483                                 nfsd_file_lru_cb,
>  484                                 &head, LONG_MAX);
> 
> causes nfsd_file_gc() to walk the entire length of the filecache LRU
> list every time it is called (which is quite frequently). The walk
> holds a spinlock the entire time that prevents other nfsd threads
> from accessing the filecache.
> 
> What's more, for NFSv4 workloads, none of the items that are visited
> during this walk may be evicted, since they are all files that are
> held OPEN by NFS clients.
> 
> Address this by ensuring that open files are not kept on the LRU
> list.
> 
> Reported-by: Frank van der Linden <fllinden@amazon.com>
> Reported-by: Wang Yugui <wangyugui@e16-tech.com>
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
> Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c |   24 +++++++++++++++++++-----
>  fs/nfsd/trace.h     |    2 ++
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 37373b012276..6e9e186334ab 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -269,6 +269,7 @@ nfsd_file_flush(struct nfsd_file *nf)
>  
>  static void nfsd_file_lru_add(struct nfsd_file *nf)
>  {
> +	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>  	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
>  		trace_nfsd_file_lru_add(nf);
>  }
> @@ -298,7 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
>  {
>  	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>  		nfsd_file_do_unhash(nf);
> -		nfsd_file_lru_remove(nf);
>  		return true;
>  	}
>  	return false;
> @@ -319,6 +319,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
>  	if (refcount_dec_not_one(&nf->nf_ref))
>  		return true;
>  
> +	nfsd_file_lru_remove(nf);
>  	list_add(&nf->nf_lru, dispose);
>  	return true;
>  }
> @@ -330,6 +331,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
>  
>  	if (refcount_dec_and_test(&nf->nf_ref)) {
>  		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
> +		nfsd_file_lru_remove(nf);
>  		nfsd_file_free(nf);
>  	}
>  }
> @@ -339,7 +341,7 @@ nfsd_file_put(struct nfsd_file *nf)
>  {
>  	might_sleep();
>  
> -	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> +	nfsd_file_lru_add(nf);

Do you really want to add this on every put? I would have thought you'd
only want to do this on a 2->1 nf_ref transition.

>  	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>  		nfsd_file_flush(nf);
>  		nfsd_file_put_noref(nf);
> @@ -439,8 +441,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
>  	}
>  }
>  
> -/*
> +/**
> + * nfsd_file_lru_cb - Examine an entry on the LRU list
> + * @item: LRU entry to examine
> + * @lru: controlling LRU
> + * @lock: LRU list lock (unused)
> + * @arg: dispose list
> + *
>   * Note this can deadlock with nfsd_file_cache_purge.
> + *
> + * Return values:
> + *   %LRU_REMOVED: @item was removed from the LRU
> + *   %LRU_SKIP: @item cannot be evicted
>   */
>  static enum lru_status
>  nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> @@ -462,8 +474,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
>  	 * That order is deliberate to ensure that we can do this locklessly.
>  	 */
>  	if (refcount_read(&nf->nf_ref) > 1) {
> +		list_lru_isolate(lru, &nf->nf_lru);
>  		trace_nfsd_file_gc_in_use(nf);
> -		return LRU_SKIP;
> +		return LRU_REMOVED;

Interesting. So you wait until the LRU scanner runs to remove these
entries? I expected to see you do this in nfsd_file_get, but this does
seem likely to be more efficient.

>  	}
>  
>  	/*
> @@ -1020,6 +1033,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		goto retry;
>  	}
>  
> +	nfsd_file_lru_remove(nf);
>  	this_cpu_inc(nfsd_file_cache_hits);
>  
>  	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
> @@ -1055,7 +1069,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	refcount_inc(&nf->nf_ref);
>  	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
>  	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> -	nfsd_file_lru_add(nf);
>  	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
>  	++nfsd_file_hashtbl[hashval].nfb_count;
>  	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
> @@ -1080,6 +1093,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	 */
>  	if (status != nfs_ok || inode->i_nlink == 0) {
>  		bool do_free;
> +		nfsd_file_lru_remove(nf);
>  		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
>  		do_free = nfsd_file_unhash(nf);
>  		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 1cc1133371eb..54082b868b72 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -929,7 +929,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name,					\
>  	TP_ARGS(nf))
>  
>  DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
> +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
>  DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
> +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
>  DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
>  DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
>  DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
> 
>

Chuck Lever July 9, 2022, 8:45 p.m. UTC | #2

> On Jul 8, 2022, at 3:29 PM, Jeff Layton <jlayton@redhat.com> wrote:
> 
> On Fri, 2022-07-08 at 14:25 -0400, Chuck Lever wrote:
>> There have been reports of problems when running fstests generic/531
>> against Linux NFS servers with NFSv4. The NFS server that hosts the
>> test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
>> Analysis shows that:
>> 
>> fs/nfsd/filecache.c
>> 482 ret = list_lru_walk(&nfsd_file_lru,
>> 483 nfsd_file_lru_cb,
>> 484 &head, LONG_MAX);
>> 
>> causes nfsd_file_gc() to walk the entire length of the filecache LRU
>> list every time it is called (which is quite frequently). The walk
>> holds a spinlock the entire time that prevents other nfsd threads
>> from accessing the filecache.
>> 
>> What's more, for NFSv4 workloads, none of the items that are visited
>> during this walk may be evicted, since they are all files that are
>> held OPEN by NFS clients.
>> 
>> Address this by ensuring that open files are not kept on the LRU
>> list.
>> 
>> Reported-by: Frank van der Linden <fllinden@amazon.com>
>> Reported-by: Wang Yugui <wangyugui@e16-tech.com>
>> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
>> Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/filecache.c | 24 +++++++++++++++++++-----
>> fs/nfsd/trace.h | 2 ++
>> 2 files changed, 21 insertions(+), 5 deletions(-)
>> 
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 37373b012276..6e9e186334ab 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -269,6 +269,7 @@ nfsd_file_flush(struct nfsd_file *nf)
>> 
>> static void nfsd_file_lru_add(struct nfsd_file *nf)
>> {
>> +	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>> 	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
>> 		trace_nfsd_file_lru_add(nf);
>> }
>> @@ -298,7 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
>> {
>> 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>> 		nfsd_file_do_unhash(nf);
>> -		nfsd_file_lru_remove(nf);
>> 		return true;
>> 	}
>> 	return false;
>> @@ -319,6 +319,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
>> 	if (refcount_dec_not_one(&nf->nf_ref))
>> 		return true;
>> 
>> +	nfsd_file_lru_remove(nf);
>> 	list_add(&nf->nf_lru, dispose);
>> 	return true;
>> }
>> @@ -330,6 +331,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
>> 
>> 	if (refcount_dec_and_test(&nf->nf_ref)) {
>> 		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
>> +		nfsd_file_lru_remove(nf);
>> 		nfsd_file_free(nf);
>> 	}
>> }
>> @@ -339,7 +341,7 @@ nfsd_file_put(struct nfsd_file *nf)
>> {
>> 	might_sleep();
>> 
>> -	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>> +	nfsd_file_lru_add(nf);
> 
> Do you really want to add this on every put? I would have thought you'd
> only want to do this on a 2->1 nf_ref transition.

My measurements indicate that 2->1 is the common case, so checking
that this is /not/ a 2->1 transition doesn't confer much if any
benefit.

Under load, I don't see any contention on the LRU locks, which is
where I'd expect to see a problem if this design were not efficient.


>> 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>> 		nfsd_file_flush(nf);
>> 		nfsd_file_put_noref(nf);
>> @@ -439,8 +441,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
>> 	}
>> }
>> 
>> -/*
>> +/**
>> + * nfsd_file_lru_cb - Examine an entry on the LRU list
>> + * @item: LRU entry to examine
>> + * @lru: controlling LRU
>> + * @lock: LRU list lock (unused)
>> + * @arg: dispose list
>> + *
>> * Note this can deadlock with nfsd_file_cache_purge.
>> + *
>> + * Return values:
>> + * %LRU_REMOVED: @item was removed from the LRU
>> + * %LRU_SKIP: @item cannot be evicted
>> */
>> static enum lru_status
>> nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
>> @@ -462,8 +474,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
>> 	 * That order is deliberate to ensure that we can do this locklessly.
>> 	 */
>> 	if (refcount_read(&nf->nf_ref) > 1) {
>> +		list_lru_isolate(lru, &nf->nf_lru);
>> 		trace_nfsd_file_gc_in_use(nf);
>> -		return LRU_SKIP;
>> +		return LRU_REMOVED;
> 
> Interesting. So you wait until the LRU scanner runs to remove these
> entries? I expected to see you do this in nfsd_file_get, but this does
> seem likely to be more efficient.
> 
>> 	}
>> 
>> 	/*
>> @@ -1020,6 +1033,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 		goto retry;
>> 	}
>> 
>> +	nfsd_file_lru_remove(nf);
>> 	this_cpu_inc(nfsd_file_cache_hits);
>> 
>> 	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
>> @@ -1055,7 +1069,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 	refcount_inc(&nf->nf_ref);
>> 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
>> 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
>> -	nfsd_file_lru_add(nf);
>> 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
>> 	++nfsd_file_hashtbl[hashval].nfb_count;
>> 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
>> @@ -1080,6 +1093,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> 	 */
>> 	if (status != nfs_ok || inode->i_nlink == 0) {
>> 		bool do_free;
>> +		nfsd_file_lru_remove(nf);
>> 		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
>> 		do_free = nfsd_file_unhash(nf);
>> 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
>> index 1cc1133371eb..54082b868b72 100644
>> --- a/fs/nfsd/trace.h
>> +++ b/fs/nfsd/trace.h
>> @@ -929,7 +929,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name,					\
>> 	TP_ARGS(nf))
>> 
>> DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
>> +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
>> DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
>> +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
>> DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
>> DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
>> DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
>> 
>> 
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

--
Chuck Lever

Jeff Layton July 11, 2022, 11:39 a.m. UTC | #3

On Sat, 2022-07-09 at 20:45 +0000, Chuck Lever III wrote:
> 
> > On Jul 8, 2022, at 3:29 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Fri, 2022-07-08 at 14:25 -0400, Chuck Lever wrote:
> > > There have been reports of problems when running fstests generic/531
> > > against Linux NFS servers with NFSv4. The NFS server that hosts the
> > > test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
> > > Analysis shows that:
> > > 
> > > fs/nfsd/filecache.c
> > > 482 ret = list_lru_walk(&nfsd_file_lru,
> > > 483 nfsd_file_lru_cb,
> > > 484 &head, LONG_MAX);
> > > 
> > > causes nfsd_file_gc() to walk the entire length of the filecache LRU
> > > list every time it is called (which is quite frequently). The walk
> > > holds a spinlock the entire time that prevents other nfsd threads
> > > from accessing the filecache.
> > > 
> > > What's more, for NFSv4 workloads, none of the items that are visited
> > > during this walk may be evicted, since they are all files that are
> > > held OPEN by NFS clients.
> > > 
> > > Address this by ensuring that open files are not kept on the LRU
> > > list.
> > > 
> > > Reported-by: Frank van der Linden <fllinden@amazon.com>
> > > Reported-by: Wang Yugui <wangyugui@e16-tech.com>
> > > Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
> > > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > fs/nfsd/filecache.c | 24 +++++++++++++++++++-----
> > > fs/nfsd/trace.h | 2 ++
> > > 2 files changed, 21 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 37373b012276..6e9e186334ab 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -269,6 +269,7 @@ nfsd_file_flush(struct nfsd_file *nf)
> > > 
> > > static void nfsd_file_lru_add(struct nfsd_file *nf)
> > > {
> > > +	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > 	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
> > > 		trace_nfsd_file_lru_add(nf);
> > > }
> > > @@ -298,7 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf)
> > > {
> > > 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > > 		nfsd_file_do_unhash(nf);
> > > -		nfsd_file_lru_remove(nf);
> > > 		return true;
> > > 	}
> > > 	return false;
> > > @@ -319,6 +319,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
> > > 	if (refcount_dec_not_one(&nf->nf_ref))
> > > 		return true;
> > > 
> > > +	nfsd_file_lru_remove(nf);
> > > 	list_add(&nf->nf_lru, dispose);
> > > 	return true;
> > > }
> > > @@ -330,6 +331,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
> > > 
> > > 	if (refcount_dec_and_test(&nf->nf_ref)) {
> > > 		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
> > > +		nfsd_file_lru_remove(nf);
> > > 		nfsd_file_free(nf);
> > > 	}
> > > }
> > > @@ -339,7 +341,7 @@ nfsd_file_put(struct nfsd_file *nf)
> > > {
> > > 	might_sleep();
> > > 
> > > -	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > +	nfsd_file_lru_add(nf);
> > 
> > Do you really want to add this on every put? I would have thought you'd
> > only want to do this on a 2->1 nf_ref transition.
> 
> My measurements indicate that 2->1 is the common case, so checking
> that this is /not/ a 2->1 transition doesn't confer much if any
> benefit.
> 
> Under load, I don't see any contention on the LRU locks, which is
> where I'd expect to see a problem if this design were not efficient.
> 
> 

Fair enough. I guess the idea is to throw it onto the LRU and the
scanner will just (eventually) take it off again without reaping it.

You can add my Reviewed-by: to this one as well.


> > > 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
> > > 		nfsd_file_flush(nf);
> > > 		nfsd_file_put_noref(nf);
> > > @@ -439,8 +441,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > > 	}
> > > }
> > > 
> > > -/*
> > > +/**
> > > + * nfsd_file_lru_cb - Examine an entry on the LRU list
> > > + * @item: LRU entry to examine
> > > + * @lru: controlling LRU
> > > + * @lock: LRU list lock (unused)
> > > + * @arg: dispose list
> > > + *
> > > * Note this can deadlock with nfsd_file_cache_purge.
> > > + *
> > > + * Return values:
> > > + * %LRU_REMOVED: @item was removed from the LRU
> > > + * %LRU_SKIP: @item cannot be evicted
> > > */
> > > static enum lru_status
> > > nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> > > @@ -462,8 +474,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
> > > 	 * That order is deliberate to ensure that we can do this locklessly.
> > > 	 */
> > > 	if (refcount_read(&nf->nf_ref) > 1) {
> > > +		list_lru_isolate(lru, &nf->nf_lru);
> > > 		trace_nfsd_file_gc_in_use(nf);
> > > -		return LRU_SKIP;
> > > +		return LRU_REMOVED;
> > 
> > Interesting. So you wait until the LRU scanner runs to remove these
> > entries? I expected to see you do this in nfsd_file_get, but this does
> > seem likely to be more efficient.
> > 
> > > 	}
> > > 
> > > 	/*
> > > @@ -1020,6 +1033,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 		goto retry;
> > > 	}
> > > 
> > > +	nfsd_file_lru_remove(nf);
> > > 	this_cpu_inc(nfsd_file_cache_hits);
> > > 
> > > 	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
> > > @@ -1055,7 +1069,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 	refcount_inc(&nf->nf_ref);
> > > 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> > > 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> > > -	nfsd_file_lru_add(nf);
> > > 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
> > > 	++nfsd_file_hashtbl[hashval].nfb_count;
> > > 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
> > > @@ -1080,6 +1093,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > 	 */
> > > 	if (status != nfs_ok || inode->i_nlink == 0) {
> > > 		bool do_free;
> > > +		nfsd_file_lru_remove(nf);
> > > 		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
> > > 		do_free = nfsd_file_unhash(nf);
> > > 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
> > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > > index 1cc1133371eb..54082b868b72 100644
> > > --- a/fs/nfsd/trace.h
> > > +++ b/fs/nfsd/trace.h
> > > @@ -929,7 +929,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name,					\
> > > 	TP_ARGS(nf))
> > > 
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
> > > +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
> > > +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
> > > DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
> > > 
> > > 
> > 
> > -- 
> > Jeff Layton <jlayton@redhat.com>
> 
> --
> Chuck Lever
> 
> 
>

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 37373b012276..6e9e186334ab 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -269,6 +269,7 @@  nfsd_file_flush(struct nfsd_file *nf)
 
 static void nfsd_file_lru_add(struct nfsd_file *nf)
 {
+	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
 	if (list_lru_add(&nfsd_file_lru, &nf->nf_lru))
 		trace_nfsd_file_lru_add(nf);
 }
@@ -298,7 +299,6 @@  nfsd_file_unhash(struct nfsd_file *nf)
 {
 	if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
 		nfsd_file_do_unhash(nf);
-		nfsd_file_lru_remove(nf);
 		return true;
 	}
 	return false;
@@ -319,6 +319,7 @@  nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp
 	if (refcount_dec_not_one(&nf->nf_ref))
 		return true;
 
+	nfsd_file_lru_remove(nf);
 	list_add(&nf->nf_lru, dispose);
 	return true;
 }
@@ -330,6 +331,7 @@  nfsd_file_put_noref(struct nfsd_file *nf)
 
 	if (refcount_dec_and_test(&nf->nf_ref)) {
 		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+		nfsd_file_lru_remove(nf);
 		nfsd_file_free(nf);
 	}
 }
@@ -339,7 +341,7 @@  nfsd_file_put(struct nfsd_file *nf)
 {
 	might_sleep();
 
-	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+	nfsd_file_lru_add(nf);
 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
 		nfsd_file_flush(nf);
 		nfsd_file_put_noref(nf);
@@ -439,8 +441,18 @@  nfsd_file_dispose_list_delayed(struct list_head *dispose)
 	}
 }
 
-/*
+/**
+ * nfsd_file_lru_cb - Examine an entry on the LRU list
+ * @item: LRU entry to examine
+ * @lru: controlling LRU
+ * @lock: LRU list lock (unused)
+ * @arg: dispose list
+ *
  * Note this can deadlock with nfsd_file_cache_purge.
+ *
+ * Return values:
+ *   %LRU_REMOVED: @item was removed from the LRU
+ *   %LRU_SKIP: @item cannot be evicted
  */
 static enum lru_status
 nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
@@ -462,8 +474,9 @@  nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 	 * That order is deliberate to ensure that we can do this locklessly.
 	 */
 	if (refcount_read(&nf->nf_ref) > 1) {
+		list_lru_isolate(lru, &nf->nf_lru);
 		trace_nfsd_file_gc_in_use(nf);
-		return LRU_SKIP;
+		return LRU_REMOVED;
 	}
 
 	/*
@@ -1020,6 +1033,7 @@  nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		goto retry;
 	}
 
+	nfsd_file_lru_remove(nf);
 	this_cpu_inc(nfsd_file_cache_hits);
 
 	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
@@ -1055,7 +1069,6 @@  nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	refcount_inc(&nf->nf_ref);
 	__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 	__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
-	nfsd_file_lru_add(nf);
 	hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
 	++nfsd_file_hashtbl[hashval].nfb_count;
 	nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
@@ -1080,6 +1093,7 @@  nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 */
 	if (status != nfs_ok || inode->i_nlink == 0) {
 		bool do_free;
+		nfsd_file_lru_remove(nf);
 		spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
 		do_free = nfsd_file_unhash(nf);
 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 1cc1133371eb..54082b868b72 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -929,7 +929,9 @@  DEFINE_EVENT(nfsd_file_gc_class, name,					\
 	TP_ARGS(nf))
 
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del);
+DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback);
 DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
