diff mbox

[1/1,SUNRPC] make sure to clone timeout values

Message ID CAN-5tyEEVqnQgNJq_pbaujsVBeVQ0y0AwNgP=cJxA9aj2=o_9w@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Olga Kornievskaia May 22, 2018, 7:28 p.m. UTC
On Tue, May 22, 2018 at 3:03 PM, Trond Myklebust
<trondmy@hammerspace.com> wrote:
> On Tue, 2018-05-22 at 14:40 -0400, Olga Kornievskaia wrote:
>> From: Olga Kornievskaia <olga.kornievskaia@gmail.com>
>>
>> For pNFS, the operations to DS currently timeout in 10s. According
>> to the spec, the client must not be re-trying an NFSv4.1 operation
>> unless the connection was broken.
>>
>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>> ---
>>  net/sunrpc/clnt.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 6e432ec..97517eb 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -668,6 +668,7 @@ struct rpc_clnt *
>>               .prognumber     = clnt->cl_prog,
>>               .version        = clnt->cl_vers,
>>               .authflavor     = flavor,
>> +             .timeout        = clnt->cl_timeout,
>>       };
>>       return __rpc_clone_client(&args, clnt);
>>  }
>
> What does this patch have to do with pNFS? That's the generic RPC
> client cloning API you are changing.
>
> The pNFS/files timeouts are intended to be set using the
> dataserver_retrans and dataserver_timeo module parameters described at
> the bottom of fs/nfs/filelayout/filelayoutdev.c

Ok so perhaps the code needs to re-written so that it allows for the
DS to get an rpc client with its timeouts set. Which currently doesn't
happen.

From what I could tell the DS code tries to set the timeout values in
nfs4_set_ds_client() but that has no effect.

nfs4_find_or_create_ds_client() calls rpc_clone_client_set_auth()
which creates an rpc client but the timeout that were set are ignored
and instead the rpc client is getting created with this 10s timeout.

(but I thought that in general it made sense that a clone also copies
the timeout values)

Is that more acceptable?

  * @args: pointer to the new transport arguments

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Trond Myklebust May 22, 2018, 7:56 p.m. UTC | #1
On Tue, 2018-05-22 at 15:28 -0400, Olga Kornievskaia wrote:
> On Tue, May 22, 2018 at 3:03 PM, Trond Myklebust

> <trondmy@hammerspace.com> wrote:

> > On Tue, 2018-05-22 at 14:40 -0400, Olga Kornievskaia wrote:

> > > From: Olga Kornievskaia <olga.kornievskaia@gmail.com>

> > > 

> > > For pNFS, the operations to DS currently timeout in 10s.

> > > According

> > > to the spec, the client must not be re-trying an NFSv4.1

> > > operation

> > > unless the connection was broken.

> > > 

> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

> > > ---

> > >  net/sunrpc/clnt.c | 1 +

> > >  1 file changed, 1 insertion(+)

> > > 

> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c

> > > index 6e432ec..97517eb 100644

> > > --- a/net/sunrpc/clnt.c

> > > +++ b/net/sunrpc/clnt.c

> > > @@ -668,6 +668,7 @@ struct rpc_clnt *

> > >               .prognumber     = clnt->cl_prog,

> > >               .version        = clnt->cl_vers,

> > >               .authflavor     = flavor,

> > > +             .timeout        = clnt->cl_timeout,

> > >       };

> > >       return __rpc_clone_client(&args, clnt);

> > >  }

> > 

> > What does this patch have to do with pNFS? That's the generic RPC

> > client cloning API you are changing.

> > 

> > The pNFS/files timeouts are intended to be set using the

> > dataserver_retrans and dataserver_timeo module parameters described

> > at

> > the bottom of fs/nfs/filelayout/filelayoutdev.c

> 

> Ok so perhaps the code needs to re-written so that it allows for the

> DS to get an rpc client with its timeouts set. Which currently

> doesn't

> happen.

> 

> From what I could tell the DS code tries to set the timeout values in

> nfs4_set_ds_client() but that has no effect.

> 

> nfs4_find_or_create_ds_client() calls rpc_clone_client_set_auth()

> which creates an rpc client but the timeout that were set are ignored

> and instead the rpc client is getting created with this 10s timeout.

> 

> (but I thought that in general it made sense that a clone also copies

> the timeout values)

> 


It does not make sense when you consider that the timeout is a per-
transport attribute.

FWIW, I've no idea where this 10s timeout you are seeing is coming
from. Perhaps it is worthwhile figuring that out first?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
Olga Kornievskaia May 22, 2018, 8:51 p.m. UTC | #2
On Tue, May 22, 2018 at 3:56 PM, Trond Myklebust
<trondmy@hammerspace.com> wrote:
> On Tue, 2018-05-22 at 15:28 -0400, Olga Kornievskaia wrote:
>> On Tue, May 22, 2018 at 3:03 PM, Trond Myklebust
>> <trondmy@hammerspace.com> wrote:
>> > On Tue, 2018-05-22 at 14:40 -0400, Olga Kornievskaia wrote:
>> > > From: Olga Kornievskaia <olga.kornievskaia@gmail.com>
>> > >
>> > > For pNFS, the operations to DS currently timeout in 10s.
>> > > According
>> > > to the spec, the client must not be re-trying an NFSv4.1
>> > > operation
>> > > unless the connection was broken.
>> > >
>> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>> > > ---
>> > >  net/sunrpc/clnt.c | 1 +
>> > >  1 file changed, 1 insertion(+)
>> > >
>> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> > > index 6e432ec..97517eb 100644
>> > > --- a/net/sunrpc/clnt.c
>> > > +++ b/net/sunrpc/clnt.c
>> > > @@ -668,6 +668,7 @@ struct rpc_clnt *
>> > >               .prognumber     = clnt->cl_prog,
>> > >               .version        = clnt->cl_vers,
>> > >               .authflavor     = flavor,
>> > > +             .timeout        = clnt->cl_timeout,
>> > >       };
>> > >       return __rpc_clone_client(&args, clnt);
>> > >  }
>> >
>> > What does this patch have to do with pNFS? That's the generic RPC
>> > client cloning API you are changing.
>> >
>> > The pNFS/files timeouts are intended to be set using the
>> > dataserver_retrans and dataserver_timeo module parameters described
>> > at
>> > the bottom of fs/nfs/filelayout/filelayoutdev.c
>>
>> Ok so perhaps the code needs to re-written so that it allows for the
>> DS to get an rpc client with its timeouts set. Which currently
>> doesn't
>> happen.
>>
>> From what I could tell the DS code tries to set the timeout values in
>> nfs4_set_ds_client() but that has no effect.
>>
>> nfs4_find_or_create_ds_client() calls rpc_clone_client_set_auth()
>> which creates an rpc client but the timeout that were set are ignored
>> and instead the rpc client is getting created with this 10s timeout.
>>
>> (but I thought that in general it made sense that a clone also copies
>> the timeout values)
>>
>
> It does not make sense when you consider that the timeout is a per-
> transport attribute.
>
> FWIW, I've no idea where this 10s timeout you are seeing is coming
> from. Perhaps it is worthwhile figuring that out first?

Besides the value of the 10s (which I also have been having a really
hard time figuring out) it's also the max timeout and the fact that,
after  the 10s are up it's giving up and failing the operation which
is then re-tried against the MDS. This shouldn't happen. So I felt
like even if that value was 60s, it shouldn't have timed out after 60s
and re-tried (without the fix that I'm proposing).

I'll give it a bit more to figure out where 10s is coming from.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 04612c2..2c81eef8 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -99,7 +99,7 @@  struct nfs4_ds_server {
  if (dss == NULL)
  return ERR_PTR(-ENOMEM);

- dss->rpc_clnt = rpc_clone_client_set_auth(ds_clp->cl_rpcclient, flavor);
+ dss->rpc_clnt =
rpc_clone_client_set_auth_and_timeout(ds_clp->cl_rpcclient, flavor);
  if (IS_ERR(dss->rpc_clnt)) {
  int err = PTR_ERR(dss->rpc_clnt);
  kfree (dss);
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index ed761f7..1c2b317 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -150,6 +150,8 @@  struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *,
 struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
 struct rpc_clnt *rpc_clone_client_set_auth(struct rpc_clnt *,
  rpc_authflavor_t);
+struct rpc_clnt *rpc_clone_client_set_auth_and_timeout(struct rpc_clnt *,
+ rpc_authflavor_t);
 int rpc_switch_client_transport(struct rpc_clnt *,
  struct xprt_create *,
  const struct rpc_timeout *);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6e432ec..50a4a44 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -674,6 +674,29 @@  struct rpc_clnt *
 EXPORT_SYMBOL_GPL(rpc_clone_client_set_auth);

 /**
+ * rpc_clone_client_set_auth_and_timeout - Clone an RPC client structure
+ * and set its auth and timeouts
+ *
+ * @clnt: RPC client whose parameters are copied
+ * @flavor: security flavor for new client
+ *
+ * Returns a fresh RPC client or an ERR_PTR.
+ */
+struct rpc_clnt *
+rpc_clone_client_set_auth_and_timeout(struct rpc_clnt *clnt,
rpc_authflavor_t flavor)
+{
+ struct rpc_create_args args = {
+ .program = clnt->cl_program,
+ .prognumber = clnt->cl_prog,
+ .version = clnt->cl_vers,
+ .authflavor = flavor,
+ .timeout = clnt->cl_timeout,
+ };
+ return __rpc_clone_client(&args, clnt);
+}
+EXPORT_SYMBOL_GPL(rpc_clone_client_set_auth_and_timeout);
+
+/**
  * rpc_switch_client_transport: switch the RPC transport on the fly
  * @clnt: pointer to a struct rpc_clnt