[v2] nfsd: serialize filecache garbage collector

Message ID 20220531103427.47769-1-wangyugui@e16-tech.com (mailing list archive)
State New, archived
Series [v2] nfsd: serialize filecache garbage collector

Commit Message

Wang Yugui May 31, 2022, 10:34 a.m. UTC
When many (> NFSD_FILE_LRU_THRESHOLD) files are held open, as in
xfstests generic/531, the nfsd processes run under high CPU load,
and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
of CPU time.

Running nfsd_file_gc() concurrently is almost meaningless, so serialize it.

Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
---
Changes since v1:
- add static to 'atomic_t nfsd_file_gc_running'.
  Thanks to the kernel test robot <lkp@intel.com>.

 fs/nfsd/filecache.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Chuck Lever May 31, 2022, 2:12 p.m. UTC | #1
> On May 31, 2022, at 6:34 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> 
> When many (> NFSD_FILE_LRU_THRESHOLD) files are held open, as in
> xfstests generic/531, the nfsd processes run under high CPU load,
> and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
> of CPU time.

Over the past few days, I've been able to reproduce a lot of bad
behavior with generic/531. My test client has 12 physical CPU
cores, and my lab network is 56Gb InfiniBand.

Unfortunately this patch doesn't really begin to address it. For
example, with this patch applied, CPU idle is in single digits
on the NFS server that exports the test's scratch device, and
that server can still get into a soft lock-up. IMO that is
because this change works around the underlying problem but
makes no attempt to root-cause or address that issue.

I agree that the NFS server's behavior needs attention, but I'm
not inclined to apply this particular patch as it is.


> Running nfsd_file_gc() concurrently is almost meaningless, so serialize it.
> 
> Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
> ---
> Changes since v1:
> - add static to 'atomic_t nfsd_file_gc_running'.
>  Thanks to the kernel test robot <lkp@intel.com>.
> 
> fs/nfsd/filecache.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f172412447f5..28a8f8d6d235 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -471,10 +471,15 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
> 	return ret;
> }
> 
> +/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
> +static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
> static void
> nfsd_file_gc(void)
> {
> -	nfsd_file_lru_walk_list(NULL);
> +	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
> +		nfsd_file_lru_walk_list(NULL);
> +		atomic_set(&nfsd_file_gc_running, 0);
> +	}
> }
> 
> static void
> -- 
> 2.36.1
> 

--
Chuck Lever
Wang Yugui May 31, 2022, 2:44 p.m. UTC | #2
Hi,

> > On May 31, 2022, at 6:34 AM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> > 
> > When many (> NFSD_FILE_LRU_THRESHOLD) files are held open, as in
> > xfstests generic/531, the nfsd processes run under high CPU load,
> > and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
> > of CPU time.
> 
> Over the past few days, I've been able to reproduce a lot of bad
> behavior with generic/531. My test client has 12 physical CPU
> cores, and my lab network is 56Gb InfiniBand.
> 
> Unfortunately this patch doesn't really begin to address it. For
> example, with this patch applied, CPU idle is in single digits
> on the NFS server that exports the test's scratch device, and
> that server can still get into a soft lock-up. IMO that is
> because this change works around the underlying problem but
> makes no attempt to root-cause or address that issue.
> 
> I agree that the NFS server's behavior needs attention, but I'm
> not inclined to apply this particular patch as it is.

Yes, this patch only targets xfstests generic/531.

In xfstests generic/531, when many (> 500K) files are held open, a
file delete also triggers an LRU walk (and a CPU soft lockup).

Is a large LRU still fast to add to, but very slow when removing an
arbitrary entry?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/05/31
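
To make the closing question in comment #2 concrete, here is a toy
userspace sketch. It is not the nfsd filecache or the kernel's list_lru;
struct toy_lru, toy_add() and toy_remove() are hypothetical names made up
for illustration. It only shows the general shape of the concern: appending
is constant work, while removing an entry that has to be located by
scanning costs work proportional to the list length. Whether this
resembles what nfsd does under generic/531 is not established in this
thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy singly linked list with a tail pointer; hypothetical, not list_lru. */
struct node {
	int key;
	struct node *next;
};

struct toy_lru {
	struct node *head;
	struct node *tail;
};

/* Append at the tail: constant work regardless of list length. */
static void toy_add(struct toy_lru *l, struct node *n)
{
	n->next = NULL;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

/*
 * Remove the entry with the given key by scanning from the head.
 * Returns the number of nodes visited, or -1 if the key is absent.
 */
static int toy_remove(struct toy_lru *l, int key)
{
	struct node *prev = NULL;
	int visited = 0;

	for (struct node *n = l->head; n; prev = n, n = n->next) {
		visited++;
		if (n->key != key)
			continue;
		if (prev)
			prev->next = n->next;
		else
			l->head = n->next;
		if (l->tail == n)
			l->tail = prev;
		return visited;
	}
	return -1;
}
```

With hundreds of thousands of entries, a scan like toy_remove() visits on
the order of the whole list per call in the worst case, which is the kind
of cost the question is asking about.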

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f172412447f5..28a8f8d6d235 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -471,10 +471,15 @@  nfsd_file_lru_walk_list(struct shrink_control *sc)
 	return ret;
 }
 
+/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
+static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
 static void
 nfsd_file_gc(void)
 {
-	nfsd_file_lru_walk_list(NULL);
+	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
+		nfsd_file_lru_walk_list(NULL);
+		atomic_set(&nfsd_file_gc_running, 0);
+	}
 }
 
 static void
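
For readers who want to try the guard outside the kernel, the following is
a minimal userspace sketch of the same "skip if already running" idea,
written with C11 <stdatomic.h> in place of the kernel's atomic_t helpers.
The names gc_running, walk_list(), gc_try_begin(), gc_end() and gc_run()
are assumptions made for this example and do not appear in
fs/nfsd/filecache.c. As in the patch, a caller that loses the race does not
wait; it returns without walking the list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the nfsd source. */
static atomic_int gc_running = 0;	/* 0 = idle, 1 = a walk is in progress */
static int walks_done;			/* completed walks, for demonstration only */

/* Placeholder for the expensive LRU walk. */
static void walk_list(void)
{
	walks_done++;
}

/* Try to claim the flag; true only for the caller that flipped 0 -> 1. */
static bool gc_try_begin(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&gc_running, &expected, 1);
}

/* Release the flag; only the current holder may call this. */
static void gc_end(void)
{
	assert(atomic_load(&gc_running) == 1);
	atomic_store(&gc_running, 0);
}

/* Run the walk unless another caller holds the flag; report whether it ran. */
static bool gc_run(void)
{
	if (!gc_try_begin())
		return false;
	walk_list();
	gc_end();
	return true;
}
```

Note that this only limits how many walks run at once. It does not make a
single walk any cheaper, which matches the objection in comment #1 that the
patch works around the underlying problem without addressing it.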