diff mbox series

[v1] nfsd: drop the lock during filecache LRU scans

Message ID 20250109142438.18689-2-cel@kernel.org (mailing list archive)
State New
Headers show
Series [v1] nfsd: drop the lock during filecache LRU scans | expand

Commit Message

Chuck Lever Jan. 9, 2025, 2:24 p.m. UTC
From: NeilBrown <neilb@suse.de>

Under a high NFSv3 load with lots of different file being accessed,
the LRU list of garbage-collectable files can become quite long.

Asking list_lru_scan() to scan the whole list can result in a long
period during which a spinlock is held, blocking the addition and
removal of LRU items.

So ask list_lru_scan() to scan only a few entries at a time, and
repeat until the scan is complete.

Fixes: edead3a55804 ("NFSD: Fix the filecache LRU shrinker")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c | 14 ++++++++++----
 fs/nfsd/filecache.h |  6 ++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

Updated version of Neil's patch to break up filecache LRU scans.
This can be backported to LTS kernels -- a Fixes tag has been
proposed.

Subsequent work can replace the list_lru mechanism. That would
be more substantial and thus more challenging to backport.

There are two concerns here:

 - The number of items in the LRU can now change while this loop is
   operating. We might need a "if (!ret) break;" or some other exit
   condition to prevent an infinite loop.

 - The list_lru_walk() always starts at the tail of the LRU, rather
   than picking up where it left off. So it might only visit the
   same handful of items on the list repeatedly, reintroducing the
   bug fixed by edead3a55804.

Comments

NeilBrown Jan. 9, 2025, 11:49 p.m. UTC | #1
On Fri, 10 Jan 2025, cel@kernel.org wrote:
> From: NeilBrown <neilb@suse.de>
> 
> Under a high NFSv3 load with lots of different file being accessed,
> the LRU list of garbage-collectable files can become quite long.
> 
> Asking list_lru_scan() to scan the whole list can result in a long
> period during which a spinlock is held, blocking the addition and
> removal of LRU items.
> 
> So ask list_lru_scan() to scan only a few entries at a time, and
> repeat until the scan is complete.
> 
> Fixes: edead3a55804 ("NFSD: Fix the filecache LRU shrinker")
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 14 ++++++++++----
>  fs/nfsd/filecache.h |  6 ++++++
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> Updated version of Neil's patch to break up filecache LRU scans.
> This can be backported to LTS kernels -- a Fixes tag has been
> proposed.
> 
> Subsequent work can replace the list_lru mechanism. That would
> be more substantial and thus more challenging to backport.
> 
> There are two concerns here:
> 
>  - The number of items in the LRU can now change while this loop is
>    operating. We might need a "if (!ret) break;" or some other exit
>    condition to prevent an infinite loop.

Not infinite - it is still bounded by the original list size.
You would expect the list walk to see all the "REFERENCED" files first.
These return LRU_ROTATE so the return value (which is count of
"isolated") will not increase, but nr_to_skip will decrease.  So you
would expect the first few results to be zero until all the REFERENCED
bits are cleared and those are rotated to the end.  Aborting then would
not be good.

> 
>  - The list_lru_walk() always starts at the tail of the LRU, rather
>    than picking up where it left off. So it might only visit the
>    same handful of items on the list repeatedly, reintroducing the
>    bug fixed by edead3a55804.

If you change the remaining LRU_SKIP to LRU_ROTATE then it will always
remove the first few entries, either freeing them or rotating them to
the head of the list.  So it won't repeat unless items have been
removed.  In that case things might be freed more quickly than we would
like.  Maybe it would help to only process 80% of the list size, as it
is better to leave a few for the next time around, rather than close too
early.

NeilBrown


> 
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a1cdba42c4fa..fcd751cb7c76 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -541,13 +541,19 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
>  static void
>  nfsd_file_gc(void)
>  {
> +	unsigned long remaining = list_lru_count(&nfsd_file_lru);
>  	LIST_HEAD(dispose);
>  	unsigned long ret;
>  
> -	ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
> -			    &dispose, list_lru_count(&nfsd_file_lru));
> -	trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
> -	nfsd_file_dispose_list_delayed(&dispose);
> +	while (remaining > 0) {
> +		unsigned long num_to_scan = min(remaining, NFSD_FILE_GC_BATCH);
> +
> +		ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
> +				    &dispose, num_to_scan);
> +		trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
> +		nfsd_file_dispose_list_delayed(&dispose);
> +		remaining -= num_to_scan;
> +	}
>  }
>  
>  static void
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index d5db6b34ba30..463bd60b98b4 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -3,6 +3,12 @@
>  
>  #include <linux/fsnotify_backend.h>
>  
> +/*
> + * Limit the time that the list_lru_one lock is held during
> + * an LRU scan.
> + */
> +#define NFSD_FILE_GC_BATCH	(32UL)
> +
>  /*
>   * This is the fsnotify_mark container that nfsd attaches to the files that it
>   * is holding open. Note that we have a separate refcount here aside from the
> -- 
> 2.47.0
> 
> 
>
diff mbox series

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a1cdba42c4fa..fcd751cb7c76 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -541,13 +541,19 @@  nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
 static void
 nfsd_file_gc(void)
 {
+	unsigned long remaining = list_lru_count(&nfsd_file_lru);
 	LIST_HEAD(dispose);
 	unsigned long ret;
 
-	ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
-			    &dispose, list_lru_count(&nfsd_file_lru));
-	trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
-	nfsd_file_dispose_list_delayed(&dispose);
+	while (remaining > 0) {
+		unsigned long num_to_scan = min(remaining, NFSD_FILE_GC_BATCH);
+
+		ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb,
+				    &dispose, num_to_scan);
+		trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru));
+		nfsd_file_dispose_list_delayed(&dispose);
+		remaining -= num_to_scan;
+	}
 }
 
 static void
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index d5db6b34ba30..463bd60b98b4 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -3,6 +3,12 @@ 
 
 #include <linux/fsnotify_backend.h>
 
+/*
+ * Limit the time that the list_lru_one lock is held during
+ * an LRU scan.
+ */
+#define NFSD_FILE_GC_BATCH	(32UL)
+
 /*
  * This is the fsnotify_mark container that nfsd attaches to the files that it
  * is holding open. Note that we have a separate refcount here aside from the