nfsd: remove dedicated nfsd_filecache_wq

Message ID 20221107171056.64564-1-jlayton@kernel.org
State New, archived
Series nfsd: remove dedicated nfsd_filecache_wq

Commit Message

Jeff Layton Nov. 7, 2022, 5:10 p.m. UTC
There's no clear benefit to allocating our own over just using the
system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
is leaked.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)
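
For context, a condensed sketch of the error path being fixed, abbreviated
from fs/nfsd/filecache.c (not the verbatim function):

static int
nfsd_file_cache_init(void)
{
	int ret;

	ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params);
	if (ret)
		return ret;

	ret = -ENOMEM;
	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
	if (!nfsd_filecache_wq)
		goto out;	/* bug: rhashtable_destroy() is never reached */

	/* ... slab caches, LRU, shrinker, laundrette setup ... */
out:
	return ret;
}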

Comments

Jeff Layton Nov. 7, 2022, 5:28 p.m. UTC | #1
On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> There's no clear benefit to allocating our own over just using the
> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> is leaked.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/filecache.c | 13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 

I'm playing with this and it seems to be ok, but reading further into
the workqueue doc, it says this:

* A wq serves as a domain for forward progress guarantee
  (``WQ_MEM_RECLAIM``, flush and work item attributes).  Work items
  which are not involved in memory reclaim and don't need to be
  flushed as a part of a group of work items, and don't require any
  special attribute, can use one of the system wq.  There is no
  difference in execution characteristics between using a dedicated wq
  and a system wq.

These jobs are involved in mem reclaim however, due to the shrinker.
OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.

In any case, we aren't flushing the work or anything as part of mem
reclaim, so maybe the above bullet point doesn't apply here?
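
If the forward-progress guarantee did turn out to matter, the usual answer
would be to keep a dedicated wq and allocate it with WQ_MEM_RECLAIM. A
sketch, not part of this patch:

/*
 * Sketch only: WQ_MEM_RECLAIM gives the wq a rescuer thread, so
 * queued items can make forward progress even when new worker
 * threads can't be spawned under memory pressure.
 */
nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", WQ_MEM_RECLAIM, 0);
if (!nfsd_filecache_wq)
	return -ENOMEM;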

> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 1e76b0d3b83a..59e06d68d20c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
>  	struct list_head freeme;
>  };
>  
> -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
> -
>  static struct kmem_cache		*nfsd_file_slab;
>  static struct kmem_cache		*nfsd_file_mark_slab;
>  static struct list_lru			nfsd_file_lru;
> @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
>  	spin_lock(&l->lock);
>  	list_splice_tail_init(files, &l->freeme);
>  	spin_unlock(&l->lock);
> -	queue_work(nfsd_filecache_wq, &l->work);
> +	queue_work(system_wq, &l->work);
>  }
>  
>  static void
> @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
>  	if (ret)
>  		return ret;
>  
> -	ret = -ENOMEM;
> -	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
> -	if (!nfsd_filecache_wq)
> -		goto out;
> -
>  	nfsd_file_slab = kmem_cache_create("nfsd_file",
>  				sizeof(struct nfsd_file), 0, 0, NULL);
>  	if (!nfsd_file_slab) {
> @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
>  	nfsd_file_slab = NULL;
>  	kmem_cache_destroy(nfsd_file_mark_slab);
>  	nfsd_file_mark_slab = NULL;
> -	destroy_workqueue(nfsd_filecache_wq);
> -	nfsd_filecache_wq = NULL;
>  	rhashtable_destroy(&nfsd_file_rhash_tbl);
>  	goto out;
>  }
> @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
>  	fsnotify_wait_marks_destroyed();
>  	kmem_cache_destroy(nfsd_file_mark_slab);
>  	nfsd_file_mark_slab = NULL;
> -	destroy_workqueue(nfsd_filecache_wq);
> -	nfsd_filecache_wq = NULL;
>  	rhashtable_destroy(&nfsd_file_rhash_tbl);
>  
>  	for_each_possible_cpu(i) {
Chuck Lever Nov. 7, 2022, 6:16 p.m. UTC | #2
> On Nov 7, 2022, at 12:28 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
>> There's no clear benefit to allocating our own over just using the
>> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
>> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
>> is leaked.
>> 
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>> fs/nfsd/filecache.c | 13 +------------
>> 1 file changed, 1 insertion(+), 12 deletions(-)
>> 
> 
> I'm playing with this and it seems to be ok, but reading further into
> the workqueue doc, it says this:
> 
> * A wq serves as a domain for forward progress guarantee
>  (``WQ_MEM_RECLAIM``, flush and work item attributes).  Work items
>  which are not involved in memory reclaim and don't need to be
>  flushed as a part of a group of work items, and don't require any
>  special attribute, can use one of the system wq.  There is no
>  difference in execution characteristics between using a dedicated wq
>  and a system wq.
> 
> These jobs are involved in mem reclaim however, due to the shrinker.
> OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
> 
> In any case, we aren't flushing the work or anything as part of mem
> reclaim, so maybe the above bullet point doesn't apply here?

In the steady state, deferring writeback seems like the right
thing to do, and I don't see the need for a special WQ for that
case -- hence nfsd_file_schedule_laundrette() can use the
system_wq.
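
For reference, nfsd_file_schedule_laundrette() already queues the
laundrette as delayed work on system_wq (condensed, guard conditions
elided):

queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
		   NFSD_LAUNDRETTE_DELAY);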

That might explain the dual WQ arrangement in the current code.

But I'd feel better if the shrinker skipped files that require
writeback to avoid a potential deadlock scenario for some
filesystems.


>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 1e76b0d3b83a..59e06d68d20c 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
>> 	struct list_head freeme;
>> };
>> 
>> -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
>> -
>> static struct kmem_cache		*nfsd_file_slab;
>> static struct kmem_cache		*nfsd_file_mark_slab;
>> static struct list_lru			nfsd_file_lru;
>> @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
>> 	spin_lock(&l->lock);
>> 	list_splice_tail_init(files, &l->freeme);
>> 	spin_unlock(&l->lock);
>> -	queue_work(nfsd_filecache_wq, &l->work);
>> +	queue_work(system_wq, &l->work);
>> }
>> 
>> static void
>> @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
>> 	if (ret)
>> 		return ret;
>> 
>> -	ret = -ENOMEM;
>> -	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
>> -	if (!nfsd_filecache_wq)
>> -		goto out;
>> -
>> 	nfsd_file_slab = kmem_cache_create("nfsd_file",
>> 				sizeof(struct nfsd_file), 0, 0, NULL);
>> 	if (!nfsd_file_slab) {
>> @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
>> 	nfsd_file_slab = NULL;
>> 	kmem_cache_destroy(nfsd_file_mark_slab);
>> 	nfsd_file_mark_slab = NULL;
>> -	destroy_workqueue(nfsd_filecache_wq);
>> -	nfsd_filecache_wq = NULL;
>> 	rhashtable_destroy(&nfsd_file_rhash_tbl);
>> 	goto out;
>> }
>> @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
>> 	fsnotify_wait_marks_destroyed();
>> 	kmem_cache_destroy(nfsd_file_mark_slab);
>> 	nfsd_file_mark_slab = NULL;
>> -	destroy_workqueue(nfsd_filecache_wq);
>> -	nfsd_filecache_wq = NULL;
>> 	rhashtable_destroy(&nfsd_file_rhash_tbl);
>> 
>> 	for_each_possible_cpu(i) {
> 
> -- 
> Jeff Layton <jlayton@kernel.org>

--
Chuck Lever
Jeff Layton Nov. 7, 2022, 6:45 p.m. UTC | #3
On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
> 
> > On Nov 7, 2022, at 12:28 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> > > There's no clear benefit to allocating our own over just using the
> > > system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> > > current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> > > is leaked.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > fs/nfsd/filecache.c | 13 +------------
> > > 1 file changed, 1 insertion(+), 12 deletions(-)
> > > 
> > 
> > I'm playing with this and it seems to be ok, but reading further into
> > the workqueue doc, it says this:
> > 
> > * A wq serves as a domain for forward progress guarantee
> >  (``WQ_MEM_RECLAIM``, flush and work item attributes).  Work items
> >  which are not involved in memory reclaim and don't need to be
> >  flushed as a part of a group of work items, and don't require any
> >  special attribute, can use one of the system wq.  There is no
> >  difference in execution characteristics between using a dedicated wq
> >  and a system wq.
> > 
> > These jobs are involved in mem reclaim however, due to the shrinker.
> > OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
> > 
> > In any case, we aren't flushing the work or anything as part of mem
> > reclaim, so maybe the above bullet point doesn't apply here?
> 
> In the steady state, deferring writeback seems like the right
> thing to do, and I don't see the need for a special WQ for that
> case -- hence nfsd_file_schedule_laundrette() can use the
> system_wq.
> 
> That might explain the dual WQ arrangement in the current code.
> 

True. Looking through the changelog, the dedicated workqueue was added
by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
assume he was concerned about reclaim.

Trond, if we keep the dedicated workqueue for the laundrette, does it
also need WQ_MEM_RECLAIM?


> But I'd feel better if the shrinker skipped files that require
> writeback to avoid a potential deadlock scenario for some
> filesystems.
> 

It already does this via the nfsd_file_check_writeback call in
nfsd_file_lru_cb.
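
Condensed, the check in the LRU walk callback amounts to this (refcount
and flag handling elided from the real nfsd_file_lru_cb()):

/* dirty or in-flight pages: leave it for the laundrette instead */
if (nfsd_file_check_writeback(nf))
	return LRU_SKIP;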


> 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 1e76b0d3b83a..59e06d68d20c 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
> > > 	struct list_head freeme;
> > > };
> > > 
> > > -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
> > > -
> > > static struct kmem_cache		*nfsd_file_slab;
> > > static struct kmem_cache		*nfsd_file_mark_slab;
> > > static struct list_lru			nfsd_file_lru;
> > > @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
> > > 	spin_lock(&l->lock);
> > > 	list_splice_tail_init(files, &l->freeme);
> > > 	spin_unlock(&l->lock);
> > > -	queue_work(nfsd_filecache_wq, &l->work);
> > > +	queue_work(system_wq, &l->work);
> > > }
> > > 
> > > static void
> > > @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
> > > 	if (ret)
> > > 		return ret;
> > > 
> > > -	ret = -ENOMEM;
> > > -	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
> > > -	if (!nfsd_filecache_wq)
> > > -		goto out;
> > > -
> > > 	nfsd_file_slab = kmem_cache_create("nfsd_file",
> > > 				sizeof(struct nfsd_file), 0, 0, NULL);
> > > 	if (!nfsd_file_slab) {
> > > @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
> > > 	nfsd_file_slab = NULL;
> > > 	kmem_cache_destroy(nfsd_file_mark_slab);
> > > 	nfsd_file_mark_slab = NULL;
> > > -	destroy_workqueue(nfsd_filecache_wq);
> > > -	nfsd_filecache_wq = NULL;
> > > 	rhashtable_destroy(&nfsd_file_rhash_tbl);
> > > 	goto out;
> > > }
> > > @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
> > > 	fsnotify_wait_marks_destroyed();
> > > 	kmem_cache_destroy(nfsd_file_mark_slab);
> > > 	nfsd_file_mark_slab = NULL;
> > > -	destroy_workqueue(nfsd_filecache_wq);
> > > -	nfsd_filecache_wq = NULL;
> > > 	rhashtable_destroy(&nfsd_file_rhash_tbl);
> > > 
> > > 	for_each_possible_cpu(i) {
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> 
> --
> Chuck Lever
> 
> 
>
Trond Myklebust Nov. 7, 2022, 7:06 p.m. UTC | #4
> On Nov 7, 2022, at 13:45, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
>> 
>>> On Nov 7, 2022, at 12:28 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
>>>> There's no clear benefit to allocating our own over just using the
>>>> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
>>>> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
>>>> is leaked.
>>>> 
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>> fs/nfsd/filecache.c | 13 +------------
>>>> 1 file changed, 1 insertion(+), 12 deletions(-)
>>>> 
>>> 
>>> I'm playing with this and it seems to be ok, but reading further into
>>> the workqueue doc, it says this:
>>> 
>>> * A wq serves as a domain for forward progress guarantee
>>> (``WQ_MEM_RECLAIM``, flush and work item attributes).  Work items
>>> which are not involved in memory reclaim and don't need to be
>>> flushed as a part of a group of work items, and don't require any
>>> special attribute, can use one of the system wq.  There is no
>>> difference in execution characteristics between using a dedicated wq
>>> and a system wq.
>>> 
>>> These jobs are involved in mem reclaim however, due to the shrinker.
>>> OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
>>> 
>>> In any case, we aren't flushing the work or anything as part of mem
>>> reclaim, so maybe the above bullet point doesn't apply here?
>> 
>> In the steady state, deferring writeback seems like the right
>> thing to do, and I don't see the need for a special WQ for that
>> case -- hence nfsd_file_schedule_laundrette() can use the
>> system_wq.
>> 
>> That might explain the dual WQ arrangement in the current code.
>> 
> 
> True. Looking through the changelog, the dedicated workqueue was added
> by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
> assume he was concerned about reclaim.

It is about separation of concerns. The issue is to ensure that when you have multiple containers each running their own set of knfsd servers, then you won’t end up with one server bottlenecking the cleanup of all the containers. It is of particular interest to do this if one of these containers is actually re-exporting NFS.
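
The per-container plumbing is visible in the patch context: each net
namespace owns its own nfsd_fcache_disposal, with its own work item and
freeme list, so cleanup is queued per container. Condensed from
fs/nfsd/filecache.c:

struct nfsd_fcache_disposal {
	struct work_struct	work;	/* laundrette for this net */
	spinlock_t		lock;
	struct list_head	freeme;	/* nfsd_files awaiting final put */
};

static void
nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
{
	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
	struct nfsd_fcache_disposal *l = nn->fcache_disposal;

	spin_lock(&l->lock);
	list_splice_tail_init(files, &l->freeme);
	spin_unlock(&l->lock);
	queue_work(system_wq, &l->work);	/* per-net work item */
}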

> 
> Trond, if we keep the dedicated workqueue for the laundrette, does it
> also need WQ_MEM_RECLAIM?

In theory, the files on that list are supposed to have already been flushed, so I don’t think it is needed.
Jeff Layton Nov. 7, 2022, 8:32 p.m. UTC | #5
On Mon, 2022-11-07 at 19:06 +0000, Trond Myklebust wrote:
> 
> > On Nov 7, 2022, at 13:45, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
> > > 
> > > > On Nov 7, 2022, at 12:28 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> > > > > There's no clear benefit to allocating our own over just using the
> > > > > system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> > > > > current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> > > > > is leaked.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > > fs/nfsd/filecache.c | 13 +------------
> > > > > 1 file changed, 1 insertion(+), 12 deletions(-)
> > > > > 
> > > > 
> > > > I'm playing with this and it seems to be ok, but reading further into
> > > > the workqueue doc, it says this:
> > > > 
> > > > * A wq serves as a domain for forward progress guarantee
> > > > (``WQ_MEM_RECLAIM``, flush and work item attributes).  Work items
> > > > which are not involved in memory reclaim and don't need to be
> > > > flushed as a part of a group of work items, and don't require any
> > > > special attribute, can use one of the system wq.  There is no
> > > > difference in execution characteristics between using a dedicated wq
> > > > and a system wq.
> > > > 
> > > > These jobs are involved in mem reclaim however, due to the shrinker.
> > > > OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
> > > > 
> > > > In any case, we aren't flushing the work or anything as part of mem
> > > > reclaim, so maybe the above bullet point doesn't apply here?
> > > 
> > > In the steady state, deferring writeback seems like the right
> > > thing to do, and I don't see the need for a special WQ for that
> > > case -- hence nfsd_file_schedule_laundrette() can use the
> > > system_wq.
> > > 
> > > That might explain the dual WQ arrangement in the current code.
> > > 
> > 
> > True. Looking through the changelog, the dedicated workqueue was added
> > by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
> > assume he was concerned about reclaim.
> 
> It is about separation of concerns. The issue is to ensure that when you have multiple containers each running their own set of knfsd servers, then you won’t end up with one server bottlenecking the cleanup of all the containers. It is of particular interest to do this if one of these containers is actually re-exporting NFS.
> 

I'm not sure that having a separate workqueue really prevents that these
days though, does it? They all use the same kthreads. We're not using
flush_workqueue. What scenario do we prevent by having a dedicated
workqueue?
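
Under cmwq, work queued to a plain dedicated wq and to system_wq executes
on the same shared per-CPU kworker pools; only attributes such as
WQ_MEM_RECLAIM or a max_active limit change that. An illustrative sketch
(hypothetical demo module, not nfsd code):

#include <linux/module.h>
#include <linux/workqueue.h>

static void demo_fn(struct work_struct *w)
{
	/* runs on an ordinary kworker in either case */
}

static DECLARE_WORK(work_a, demo_fn);
static DECLARE_WORK(work_b, demo_fn);
static struct workqueue_struct *demo_wq;

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo", 0, 0);	/* no special flags */
	if (!demo_wq)
		return -ENOMEM;
	queue_work(system_wq, &work_a);	/* shared per-CPU kworker pool */
	queue_work(demo_wq, &work_b);	/* lands in the same shared pool */
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");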

> > 
> > Trond, if we keep the dedicated workqueue for the laundrette, does it
> > also need WQ_MEM_RECLAIM?
> 
> In theory, the files on that list are supposed to have already been flushed, so I don’t think it is needed.
> 

Ok.

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1e76b0d3b83a..59e06d68d20c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -66,8 +66,6 @@  struct nfsd_fcache_disposal {
 	struct list_head freeme;
 };
 
-static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
-
 static struct kmem_cache		*nfsd_file_slab;
 static struct kmem_cache		*nfsd_file_mark_slab;
 static struct list_lru			nfsd_file_lru;
@@ -564,7 +562,7 @@  nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
 	spin_lock(&l->lock);
 	list_splice_tail_init(files, &l->freeme);
 	spin_unlock(&l->lock);
-	queue_work(nfsd_filecache_wq, &l->work);
+	queue_work(system_wq, &l->work);
 }
 
 static void
@@ -855,11 +853,6 @@  nfsd_file_cache_init(void)
 	if (ret)
 		return ret;
 
-	ret = -ENOMEM;
-	nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
-	if (!nfsd_filecache_wq)
-		goto out;
-
 	nfsd_file_slab = kmem_cache_create("nfsd_file",
 				sizeof(struct nfsd_file), 0, 0, NULL);
 	if (!nfsd_file_slab) {
@@ -917,8 +910,6 @@  nfsd_file_cache_init(void)
 	nfsd_file_slab = NULL;
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	destroy_workqueue(nfsd_filecache_wq);
-	nfsd_filecache_wq = NULL;
 	rhashtable_destroy(&nfsd_file_rhash_tbl);
 	goto out;
 }
@@ -1034,8 +1025,6 @@  nfsd_file_cache_shutdown(void)
 	fsnotify_wait_marks_destroyed();
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	destroy_workqueue(nfsd_filecache_wq);
-	nfsd_filecache_wq = NULL;
 	rhashtable_destroy(&nfsd_file_rhash_tbl);
 
 	for_each_possible_cpu(i) {