diff mbox

[RFC,v0,43/49] pnfsd: release state lock around iput in put_nfs4_file

Message ID 1380220968-14669-1-git-send-email-bhalevy@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Benny Halevy Sept. 26, 2013, 6:42 p.m. UTC
we don't want to hold the state_lock while the file system may block

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
---
 fs/nfsd/nfs4pnfsd.c |  4 +++-
 fs/nfsd/nfs4state.c | 34 +++++++++++++++++++++++++++++++---
 fs/nfsd/state.h     |  1 +
 3 files changed, 35 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig Sept. 29, 2013, 12:19 p.m. UTC | #1
On Thu, Sep 26, 2013 at 02:42:48PM -0400, Benny Halevy wrote:
> we don't want to hold the state_lock while the file system may block

Needs a much beter changelog:

 - why don't you want to hold it
 - why you think the new version is safe and performs fine.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 1, 2013, 1:31 p.m. UTC | #2
On 2013-09-29 15:19, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:42:48PM -0400, Benny Halevy wrote:
>> we don't want to hold the state_lock while the file system may block
> 
> Needs a much beter changelog:
> 
>  - why don't you want to hold it
>  - why you think the new version is safe and performs fine.

OK.

So the reason not to hold it is that the nfs state lock is global to the
server and blocks all state modifying operations such as:
open, close, lock, clientid, session operations, etc.

It is safe to release the state lock from the pnfs call sites
on the resource dereferencing path as:
a. The file system is not expected to recurs back into the knfsd code
while holding the state lock.
b. The high level operation is already done at this point and it is not
required to hold the state lock any further.

Note that there are more call sites of put_nfs4_file in nfs4state.c
that need further analysis and move to put_nfs4_file_locked where possible.

Benny

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 1, 2013, 1:37 p.m. UTC | #3
On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
> So the reason not to hold it is that the nfs state lock is global to the
> server and blocks all state modifying operations such as:
> open, close, lock, clientid, session operations, etc.

While not really related to this patch: what's the reason it's not
split?  The way nfsd works there should be almost no state that isn't
per-export.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 2, 2013, 2:17 p.m. UTC | #4
On 2013-10-01 16:37, Christoph Hellwig wrote:
> On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
>> So the reason not to hold it is that the nfs state lock is global to the
>> server and blocks all state modifying operations such as:
>> open, close, lock, clientid, session operations, etc.
> 
> While not really related to this patch: what's the reason it's not
> split?  The way nfsd works there should be almost no state that isn't
> per-export.

I'll defer to Bruce.  We did reduce the use of the state_lock in the past
but I think there is potential for further reducing what it covers.

Memory allocations for client, file, stateid etc. and can be optimized for
the common path by opportunistically allocating these after a quick
search (lockless of rcu_read is possible)
In the uncommon case insertion of the newly allocated stuff would fail and
they will be freed.

The candidates I can quickly see are:
- the idr calls manipulating and looking up the different stateid hash tables,
- clientid rb_trees, and
- file hash table
  - currently protected with the recall_lock spin_lock but we can factor out,
    I think, the call to nfsd4_alloc_file and combine find_file and nfsd4_init_file
    for conditional insertion under the recall_lock.

Benny


> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Oct. 2, 2013, 3:26 p.m. UTC | #5
On Tue, Oct 01, 2013 at 06:37:39AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
> > So the reason not to hold it is that the nfs state lock is global to the
> > server and blocks all state modifying operations such as:
> > open, close, lock, clientid, session operations, etc.
> 
> While not really related to this patch: what's the reason it's not
> split?  The way nfsd works there should be almost no state that isn't
> per-export.

There's the NFSv4 client, but yes, the global state lock is an
embarassment....

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 11, 2013, 7:47 p.m. UTC | #6
I've been looking at this patch and the surrounding code a bit more and
start to really dislike it.

put_nfs4_file is very much unrelated to the state lock, so having a
version that internally drops it seems wrong.  If we look at the usage
of your new locked version there's very few callers:


 put_nfs4_file_locked
  + destroy_layout_state
     + put_layout_state
        + destroy_layout
	  + destroy_layout_list
	     + nfs4_pnfs_return_layout
	     + pnfs_expire_client
	+ nfs4_pnfs_get_layout
	+ nfs4_pnfs_return_layout

Except for pnfs_expire_client all these are right near the places where
the state lock is dropped, so simply refactoring the code sounds like
a very valid option.


And btw, the state locking is even more of a mess than I though.  I
think it really needs to be split up sooner or later.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 13, 2013, 6:26 a.m. UTC | #7
On 2013-10-11 22:47, Christoph Hellwig wrote:
> I've been looking at this patch and the surrounding code a bit more and
> start to really dislike it.
> 
> put_nfs4_file is very much unrelated to the state lock, so having a
> version that internally drops it seems wrong.  If we look at the usage
> of your new locked version there's very few callers:
> 
> 
>  put_nfs4_file_locked
>   + destroy_layout_state
>      + put_layout_state
>         + destroy_layout
> 	  + destroy_layout_list
> 	     + nfs4_pnfs_return_layout
> 	     + pnfs_expire_client
> 	+ nfs4_pnfs_get_layout
> 	+ nfs4_pnfs_return_layout
> 
> Except for pnfs_expire_client all these are right near the places where
> the state lock is dropped, so simply refactoring the code sounds like
> a very valid option.

I hear you.  Makes sense.

> 
> 
> And btw, the state locking is even more of a mess than I though.  I
> think it really needs to be split up sooner or later.
> 

I'm working on patches eliminating the state lock by refactoring the code
that is currently protected by it, moving all blocking calls to either before
or after the critical section.  I'll send a RFC patchset as soon as I have
the first stab at it completed.

Benny
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 8d16b85..1807455 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -166,6 +166,7 @@  struct sbid_tracker {
 {
 	struct nfs4_layout_state *ls =
 			container_of(kref, struct nfs4_layout_state, ls_ref);
+	struct nfs4_file *fp;
 
 	nfsd4_remove_stid(&ls->ls_stid);
 	if (!list_empty(&ls->ls_perclnt)) {
@@ -173,8 +174,9 @@  struct sbid_tracker {
 		unhash_layout_state(ls);
 		spin_unlock(&layout_lock);
 	}
-	put_nfs4_file(ls->ls_file);
+	fp = ls->ls_file;
 	nfsd4_free_stid(layout_state_slab, &ls->ls_stid);
+	put_nfs4_file_locked(fp);
 }
 
 /*
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e11d96f..5d5dead 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -239,14 +239,42 @@  static void nfsd4_free_file(struct nfs4_file *f)
 	kmem_cache_free(file_slab, f);
 }
 
-void
-put_nfs4_file(struct nfs4_file *fi)
+static struct inode *put_nfs4_file_common(struct nfs4_file *fi)
 {
 	if (atomic_dec_and_lock(&fi->fi_ref, &recall_lock)) {
+		struct inode *ino;
+
 		hlist_del(&fi->fi_hash);
 		spin_unlock(&recall_lock);
-		iput(fi->fi_inode);
+		ino = fi->fi_inode;
 		nfsd4_free_file(fi);
+
+		return ino;
+	}
+	return NULL;
+}
+
+void
+put_nfs4_file(struct nfs4_file *fi)
+{
+	struct inode *ino;
+
+	ino = put_nfs4_file_common(fi);
+	if (ino)
+		iput(ino);
+}
+
+void
+put_nfs4_file_locked(struct nfs4_file *fi)
+{
+	struct inode *ino;
+
+	nfs4_assert_state_locked();
+	ino = put_nfs4_file_common(fi);
+	if (ino) {
+		nfs4_unlock_state();
+		iput(ino);
+		nfs4_lock_state();
 	}
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1ef09ae..3be7507 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -491,6 +491,7 @@  extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
 extern void nfsd4_free_slab(struct kmem_cache **);
 extern struct nfs4_file *find_alloc_file(struct inode *, struct svc_fh *);
 extern void put_nfs4_file(struct nfs4_file *);
+extern void put_nfs4_file_locked(struct nfs4_file *);
 extern void get_nfs4_file(struct nfs4_file *);
 extern struct nfs4_client *find_confirmed_client(clientid_t *, bool sessions, struct nfsd_net *);
 extern struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab);