diff mbox

NFS4: Retry destroy session when getting -NFS4ERR_DELAY

Message ID 1427119769.16955.6.camel@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust March 23, 2015, 2:09 p.m. UTC
On Mon, 2015-03-23 at 09:16 +0800, Kinglong Mee wrote:
> On 2015/3/23 3:14, Trond Myklebust wrote:
> > On Fri, Mar 20, 2015 at 4:31 AM, Kinglong Mee <kinglongmee@gmail.com> wrote:
> >> When umounting a client, it cost near ten seconds.
> >> Dump the request, got
> >>    client                     server
> >> DELEGRETURN              ---->
> >> DESTROY_SESSION          ---->
> >>           NFS4ERR_DELAY  <----  DESTROY_SESSION reply
> >>                 NFS4_OK  <----  DELEGRETURN reply
> >> DESTROY_CLIENTID          ---->
> >>   NFS4ERR_CLIENTID_BUSY  <----  DESTROY_CLIENTID reply
> >> DESTROY_CLIENTID          ---->
> >>   NFS4ERR_CLIENTID_BUSY  <----  DESTROY_CLIENTID reply
> >>          ... ....                  ... ...
> >> There are ten DESTROY_CLIENTID requests.
> >> This patch retry DESTROY_SESSION when getting NFS4ERR_DELAY,
> >> try the best to destroy the session as destroy clientid.
> >>
> >> With this patch, only cost more than 1 seconds, as,
> >>    client                     server
> >> DELEGRETURN          ---->
> >> DESTROY_SESSION      ---->
> >>      NFS4ERR_DELAY  <----  DESTROY_SESSION reply
> >>            NFS4_OK  <----  DELEGRETURN reply
> >> DESTROY_SESSION      ---->
> >>            NFS4_OK  <----  DESTROY_SESSION reply
> >> DESTROY_CLIENTID     ---->
> >>            NFS4_OK  <----  DESTROY_CLIENTID reply
> >>
> >> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> >> ---
> >>  fs/nfs/nfs4proc.c | 26 +++++++++++++++++++++-----
> >>  1 file changed, 21 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >> index 627f37c..2631dc2 100644
> >> --- a/fs/nfs/nfs4proc.c
> >> +++ b/fs/nfs/nfs4proc.c
> >> @@ -7320,7 +7320,7 @@ out:
> >>   * Issue the over-the-wire RPC DESTROY_SESSION.
> >>   * The caller must serialize access to this routine.
> >>   */
> >> -int nfs4_proc_destroy_session(struct nfs4_session *session,
> >> +static int _nfs4_proc_destroy_session(struct nfs4_session *session,
> >>                 struct rpc_cred *cred)
> >>  {
> >>         struct rpc_message msg = {
> >> @@ -7332,10 +7332,6 @@ int nfs4_proc_destroy_session(struct nfs4_session *session,
> >>
> >>         dprintk("--> nfs4_proc_destroy_session\n");
> >>
> >> -       /* session is still being setup */
> >> -       if (!test_and_clear_bit(NFS4_SESSION_ESTABLISHED, &session->session_state))
> >> -               return 0;
> >> -
> >>         status = rpc_call_sync(session->clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
> >>         trace_nfs4_destroy_session(session->clp, status);
> >>
> >> @@ -7347,6 +7343,26 @@ int nfs4_proc_destroy_session(struct nfs4_session *session,
> >>         return status;
> >>  }
> >>
> >> +int nfs4_proc_destroy_session(struct nfs4_session *session,
> >> +               struct rpc_cred *cred)
> >> +{
> >> +       unsigned int loop;
> >> +       int ret;
> >> +
> >> +       /* session is still being setup */
> >> +       if (!test_and_clear_bit(NFS4_SESSION_ESTABLISHED, &session->session_state))
> >> +               return 0;
> >> +
> >> +       for (loop = NFS4_MAX_LOOP_ON_RECOVER; loop != 0; loop--) {
> >> +               ret = _nfs4_proc_destroy_session(session, cred);
> >> +               if (ret != -NFS4ERR_DELAY)
> >> +                       break;
> >> +               ssleep(1);
> >> +       }
> >> +
> >> +       return ret;
> >> +}
> >> +
> >>  /*
> >>   * Renew the cl_session lease.
> >>   */
> >> --
> >> 2.3.3
> >>
> > 
> > I don't understand. All you've done is paper over the problem AFAICS.
> > How is that useful?
> 
> Sorry for missing more comments.
> When unmounting nfs with delegation, client will return delegation first,
> then call nfs41_shutdown_client() destory session and clientid.
> 
> DELEGRETURN using asynchronous RPC?destroy session will be send immediately.
> If sever processes DELEGRETUEN slowly, destory session maybe processed before.
> so that, destory session will be denied with NFS4ERR_DELAY.
> 
> NFS client don't care the return value of DESTROY_SESSION,
> and call DESTROY_CLIENTID directly, so that, all DESTROY_CLIENTID will fail
> with NFS4ERR_CLIENTID_BUSY, retry DESTROY_CLIENTID is usefulness. 
> 
> After that, nfs client have destroy all information about nfs server,
> but nfs server also records client's information for DESTORY_CLIENTID and 
> DESTROY_SESSION failed.
> 
> This patch make sure the DESTROY_SESSION/DESTORY_CLIENTID success,
> first, cut down the cost of umount as I said above.
> second, make sure server clean the recording of client not until expired.

Hi Kinglong,

Ultimately, what you are saying above is that we need to drain the
session before destroying it. I agree with that goal, however not the
method that you choose to achieve it.

Please consider the following patch instead.

Cheers,
 Trond

8<---------------------------------------------------------------
From 21fb62639ad69ecc5c443dba5b41ad2bd64c6e76 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Mon, 23 Mar 2015 09:51:41 -0400
Subject: [PATCH] NFSv4: Ensure that we drain the session before shutting it
 down

Kinglong Mee reports that the call to DESTROY_SESSION in NFSv4.1
are racing with the asynchronous DELEGRETURN calls that precede it.
This points to the root cause being that we're not waiting for the
session to drain before we destroy it.

This patch ensures that we do so for both NFSv4 and NFSv4.1.

Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs4_fs.h    |  3 +++
 fs/nfs/nfs4client.c |  7 ++-----
 fs/nfs/nfs4state.c  | 18 ++++++++++++++++++
 3 files changed, 23 insertions(+), 5 deletions(-)

Comments

Trond Myklebust March 23, 2015, 4:09 p.m. UTC | #1
On Mon, Mar 23, 2015 at 10:09 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> 8<---------------------------------------------------------------
> From 21fb62639ad69ecc5c443dba5b41ad2bd64c6e76 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Mon, 23 Mar 2015 09:51:41 -0400
> Subject: [PATCH] NFSv4: Ensure that we drain the session before shutting it
>  down
>
> Kinglong Mee reports that the call to DESTROY_SESSION in NFSv4.1
> are racing with the asynchronous DELEGRETURN calls that precede it.
> This points to the root cause being that we're not waiting for the
> session to drain before we destroy it.
>
> This patch ensures that we do so for both NFSv4 and NFSv4.1.
>
> Reported-by: Kinglong Mee <kinglongmee@gmail.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
>  fs/nfs/nfs4_fs.h    |  3 +++
>  fs/nfs/nfs4client.c |  7 ++-----
>  fs/nfs/nfs4state.c  | 18 ++++++++++++++++++
>  3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index fdef424b0cd3..594f53c3aee5 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -257,6 +257,8 @@ extern int nfs4_set_rw_stateid(nfs4_stateid *stateid,
>                 const struct nfs_lock_context *l_ctx,
>                 fmode_t fmode);
>
> +extern void nfs40_shutdown_session(struct nfs_client *clp);
> +
>  #if defined(CONFIG_NFS_V4_1)
>  static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
>  {
> @@ -269,6 +271,7 @@ extern int nfs41_setup_sequence(struct nfs4_session *session,
>  extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
>  extern int nfs4_proc_create_session(struct nfs_client *, struct rpc_cred *);
>  extern int nfs4_proc_destroy_session(struct nfs4_session *, struct rpc_cred *);
> +extern void nfs41_shutdown_session(struct nfs_client *clp, struct nfs4_session *session);
>  extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
>                 struct nfs_fsinfo *fsinfo);
>  extern int nfs4_proc_layoutcommit(struct nfs4_layoutcommit_data *data,
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 86d6214ea022..bdabbf9b6322 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -160,7 +160,7 @@ void nfs41_shutdown_client(struct nfs_client *clp)
>  {
>         if (nfs4_has_session(clp)) {
>                 nfs4_shutdown_ds_clients(clp);
> -               nfs4_destroy_session(clp->cl_session);
> +               nfs41_shutdown_session(clp, clp->cl_session);
>                 nfs4_destroy_clientid(clp);
>         }
>
> @@ -169,10 +169,7 @@ void nfs41_shutdown_client(struct nfs_client *clp)
>
>  void nfs40_shutdown_client(struct nfs_client *clp)
>  {
> -       if (clp->cl_slot_tbl) {
> -               nfs4_shutdown_slot_table(clp->cl_slot_tbl);
> -               kfree(clp->cl_slot_tbl);
> -       }
> +       nfs40_shutdown_session(clp);
>  }
>
>  struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index f95e3b58bbc3..54fa7e2bc3e3 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2134,6 +2134,18 @@ out_unlock:
>         return status;
>  }
>
> +void nfs40_shutdown_session(struct nfs_client *clp)
> +{
> +       struct nfs4_slot_table  *tbl = clp->cl_slot_tbl;
> +
> +       if (tbl) {
> +               nfs4_drain_slot_tbl(tbl);
> +               nfs4_shutdown_slot_table(tbl);
> +               clp->cl_slot_tbl = NULL;
> +               kfree(tbl);
> +       }
> +}
> +
>  #ifdef CONFIG_NFS_V4_1
>  void nfs4_schedule_session_recovery(struct nfs4_session *session, int err)
>  {
> @@ -2314,6 +2326,12 @@ static int nfs4_bind_conn_to_session(struct nfs_client *clp)
>         }
>         return 0;
>  }
> +
> +void nfs41_shutdown_session(struct nfs_client *clp, struct nfs4_session *session)
> +{
> +       nfs4_begin_drain_session(clp);

Argh. We can't quite do this, because we do want all outstanding RPC
calls to complete, not just the ones that have already been allocated
slots. Let me respin.

> +       nfs4_destroy_session(session);
> +}
>  #else /* CONFIG_NFS_V4_1 */
>  static int nfs4_reset_session(struct nfs_client *clp) { return 0; }
>
> --
> 2.1.0
>
>
>
>
diff mbox

Patch

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index fdef424b0cd3..594f53c3aee5 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -257,6 +257,8 @@  extern int nfs4_set_rw_stateid(nfs4_stateid *stateid,
 		const struct nfs_lock_context *l_ctx,
 		fmode_t fmode);
 
+extern void nfs40_shutdown_session(struct nfs_client *clp);
+
 #if defined(CONFIG_NFS_V4_1)
 static inline struct nfs4_session *nfs4_get_session(const struct nfs_server *server)
 {
@@ -269,6 +271,7 @@  extern int nfs41_setup_sequence(struct nfs4_session *session,
 extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
 extern int nfs4_proc_create_session(struct nfs_client *, struct rpc_cred *);
 extern int nfs4_proc_destroy_session(struct nfs4_session *, struct rpc_cred *);
+extern void nfs41_shutdown_session(struct nfs_client *clp, struct nfs4_session *session);
 extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
 		struct nfs_fsinfo *fsinfo);
 extern int nfs4_proc_layoutcommit(struct nfs4_layoutcommit_data *data,
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 86d6214ea022..bdabbf9b6322 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -160,7 +160,7 @@  void nfs41_shutdown_client(struct nfs_client *clp)
 {
 	if (nfs4_has_session(clp)) {
 		nfs4_shutdown_ds_clients(clp);
-		nfs4_destroy_session(clp->cl_session);
+		nfs41_shutdown_session(clp, clp->cl_session);
 		nfs4_destroy_clientid(clp);
 	}
 
@@ -169,10 +169,7 @@  void nfs41_shutdown_client(struct nfs_client *clp)
 
 void nfs40_shutdown_client(struct nfs_client *clp)
 {
-	if (clp->cl_slot_tbl) {
-		nfs4_shutdown_slot_table(clp->cl_slot_tbl);
-		kfree(clp->cl_slot_tbl);
-	}
+	nfs40_shutdown_session(clp);
 }
 
 struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index f95e3b58bbc3..54fa7e2bc3e3 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2134,6 +2134,18 @@  out_unlock:
 	return status;
 }
 
+void nfs40_shutdown_session(struct nfs_client *clp)
+{
+	struct nfs4_slot_table  *tbl = clp->cl_slot_tbl;
+
+	if (tbl) {
+		nfs4_drain_slot_tbl(tbl);
+		nfs4_shutdown_slot_table(tbl);
+		clp->cl_slot_tbl = NULL;
+		kfree(tbl);
+	}
+}
+
 #ifdef CONFIG_NFS_V4_1
 void nfs4_schedule_session_recovery(struct nfs4_session *session, int err)
 {
@@ -2314,6 +2326,12 @@  static int nfs4_bind_conn_to_session(struct nfs_client *clp)
 	}
 	return 0;
 }
+
+void nfs41_shutdown_session(struct nfs_client *clp, struct nfs4_session *session)
+{
+	nfs4_begin_drain_session(clp);
+	nfs4_destroy_session(session);
+}
 #else /* CONFIG_NFS_V4_1 */
 static int nfs4_reset_session(struct nfs_client *clp) { return 0; }