NFS/TCP timeout sequence

Message ID 19989.27202.793003.725608@regina.usersys.redhat.com (mailing list archive)
State New, archived

Commit Message

Max Matveev July 7, 2011, 8:11 a.m. UTC
I've had to look at the way NFS/TCP does its timeouts and backoff
and it does not make a lot of sense to me: according to the
following paragraph from nfs(5) on Fedora 14 (I'm using Fedora 14
because it has more text than the same page in nfs-utils):

      timeo=n    The time (in tenths of a second) the  NFS  client  waits
                 for a response before it retries an NFS request. If this
                 option is not specified, requests are retried  every  60
                 seconds  for NFS over TCP.  The NFS client does not per-
                 form any kind of timeout backoff for NFS over TCP.

but if I try the mount with timeo=20,retrans=7 then I'm getting
retransmits which are 2, 4, 6, 8, 2, 4, 6, 8 seconds apart, i.e.
there is a) linear backoff and b) the backoff is not long enough to
let the complete sequence of 7 retransmits run its course.
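
As a rough illustration of the spacing reported above, the following C
sketch models per-request timeouts that grow by a fixed increment until a
major-timeout deadline passes, then start over. It is a simplified model,
not kernel code: the struct, the function name and the use of whole
seconds are assumptions made for the example, and the 16-second cap is
what the expression initval + increment * retries yields for
timeo=20,retrans=7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model parameters, in seconds (the kernel works in jiffies). */
struct backoff_model {
	unsigned int initval;   /* first timeout, e.g. timeo=20 -> 2 s */
	unsigned int increment; /* added after each minor timeout */
	unsigned int retries;   /* the retrans= value */
	unsigned int maxval;    /* cap on per-try and major timeouts */
};

/*
 * Fill gaps[0..n-1] with the modelled interval that elapses before each
 * timeout fires. Returns how many major timeouts occurred along the way.
 */
size_t simulate_gaps(const struct backoff_model *m, unsigned int *gaps, size_t n)
{
	unsigned int now = 0;
	unsigned int timeout;
	unsigned int major;
	unsigned int deadline;
	size_t majors = 0;
	size_t i;

	assert(m != NULL && gaps != NULL);

	timeout = m->initval;
	/* Major timeout: initial value plus the linear term, capped at maxval. */
	major = m->initval + m->increment * m->retries;
	if (major > m->maxval)
		major = m->maxval;
	deadline = now + major;

	for (i = 0; i < n; i++) {
		gaps[i] = timeout;
		now += timeout;
		if (now < deadline) {
			/* Minor timeout: back off linearly, up to the cap. */
			timeout += m->increment;
			if (timeout > m->maxval)
				timeout = m->maxval;
		} else {
			/* Major timeout: restart from the initial value. */
			timeout = m->initval;
			deadline = now + major;
			majors++;
		}
	}
	return majors;
}
```

With the parameters { 2, 2, 7, 16 } and n = 8 this model produces gaps of
2, 4, 6, 8, 2, 4, 6, 8 seconds: the 16-second deadline is crossed when the
fourth timeout ends at 20 seconds, so the sequence restarts after four
tries instead of running through all seven retransmits.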

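This is happening because to_maxval for NFS_TCP is too short to
accommodate the linear backoff - we need to either increase the
to_maxval to something like the fs/nfs/client.c change in the patch
below, or not do the linear backoff at all (the net/sunrpc/xprt.c
change).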


max
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Trond Myklebust July 7, 2011, 1:47 p.m. UTC | #1
On Thu, 2011-07-07 at 18:11 +1000, Max Matveev wrote: 
> I've had to look at the way NFS/TCP does its timeouts and backoff
> and it does not make a lot of sense to me: according to the
> following paragraph from nfs(5) on Fedora 14 (I'm using Fedora 14
> because it has more text than the same page in nfs-utils):
> 
>       timeo=n    The time (in tenths of a second) the  NFS  client  waits
>                  for a response before it retries an NFS request. If this
>                  option is not specified, requests are retried  every  60
>                  seconds  for NFS over TCP.  The NFS client does not per-
>                  form any kind of timeout backoff for NFS over TCP.
> 
> but if I try the mount with timeo=20,retrans=7 then I'm getting
> retransmits which are 2, 4, 6, 8, 2, 4, 6, 8 seconds apart, i.e.
> there is a) linear backoff and b) the backoff is not long enough to
> let the complete sequence of 7 retransmits run its course.

Sigh... Firstly, 2 second timeouts are complete lunacy when using a
protocol that guarantees reliable delivery, such as TCP does. Anyone who
tries it deserves exactly what they get: poor unreliable performance.

Secondly, the _other_ fix for this problem is to fix the documentation.

Trond
Chuck Lever July 7, 2011, 2:04 p.m. UTC | #2
On Jul 7, 2011, at 9:47 AM, Trond Myklebust wrote:

> On Thu, 2011-07-07 at 18:11 +1000, Max Matveev wrote: 
>> I've had to look at the way NFS/TCP does its timeouts and backoff
>> and it does not make a lot of sense to me: according to the
>> following paragraph from nfs(5) on Fedora 14 (I'm using Fedora 14
>> because it has more text than the same page in nfs-utils):
>> 
>>      timeo=n    The time (in tenths of a second) the  NFS  client  waits
>>                 for a response before it retries an NFS request. If this
>>                 option is not specified, requests are retried  every  60
>>                 seconds  for NFS over TCP.  The NFS client does not per-
>>                 form any kind of timeout backoff for NFS over TCP.
>> 
>> but if I try the mount with timeo=20,retrans=7 then I'm getting
>> retransmits which are 2, 4, 6, 8, 2, 4, 6, 8 seconds apart, i.e.
>> there is a) linear backoff and b) the backoff is not long enough to
>> let the complete sequence of 7 retransmits run its course.
> 
> Sigh... Firstly, 2 second timeouts are complete lunacy when using a
> protocol that guarantees reliable delivery, such as TCP does. Anyone who
> tries it deserves exactly what they get: poor unreliable performance.

We shouldn't allow such low settings.

> Secondly, the _other_ fix for this problem is to fix the documentation.

How is the documentation incorrect?  We do not want any kind of back-off for stream transports.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



Max Matveev July 8, 2011, 12:20 a.m. UTC | #3
On Thu, 07 Jul 2011 09:47:19 -0400, Trond Myklebust wrote:

 TM> On Thu, 2011-07-07 at 18:11 +1000, Max Matveev wrote: 

 TM> Sigh... Firstly, 2 second timeouts are complete lunacy when using a
 TM> protocol that guarantees reliable delivery, such as TCP does. Anyone who
 TM> tries it deserves exactly what they get: poor unreliable performance.

2 seconds is beside the point - I'm not going to wait for 28 minutes
(timeo=600,retrans=7) when doing timeout testing just to prove that
it does not work.

 TM> Secondly, the _other_ fix for this problem is to fix the documentation.

I can live with that too.

max
Max Matveev July 8, 2011, 6:05 a.m. UTC | #4
On Thu, 07 Jul 2011 10:16:53 -0400, Trond Myklebust wrote:

> Anyway, why shouldn't we back off if the server is failing to respond?

Wasn't the no-backoff/drop-connection approach what Mike Eisler was
advocating back in '06 during Connectathon?

http://www.connectathon.org/talks06/eisler.pdf

I think this was the trigger to go from exponential backoff to linear.

max
Max Matveev Aug. 4, 2011, 5:54 a.m. UTC | #5
Resuming a conversation from last month - it fizzled out and died
without any resolution...

On Thu, 07 Jul 2011 10:59:12 -0400, Trond Myklebust wrote:

 trond> Looking at the code:

 trond> v2.6.0: exponential back off
 trond> v2.6.4: exponential back off
 trond> v2.6.9: exponential back off
 trond> v2.6.16: linear back off
 trond> v2.6.18: linear back off
 trond> v2.6.24: linear back off
 trond> v2.6.32: linear back off
 trond> ....

 trond> So I've no idea what you were testing.

I'm going to assume that we're keeping linear backoff and
send a patch which corrects the calculation of maxint for this case.

 >> So it seems to me the kernel has diverged (perhaps long ago) from
 >> the documentation, not the other way around.

 trond> Nope. The documentation has simply always been inaccurate afaics from
 trond> the above inspection.

And another one to update the documentation..

max

Patch

--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -606,7 +606,8 @@  static void nfs_init_timeout_values(struct rpc_timeout *to, 
                if (to->to_initval > NFS_MAX_TCP_TIMEOUT)
                        to->to_initval = NFS_MAX_TCP_TIMEOUT;
                to->to_increment = to->to_initval;
-               to->to_maxval = to->to_initval + (to->to_increment * to->to_retries);
+               to->to_maxval = to->to_increment * (to->to_retries + 1) 
+                             * (to->to_retries + 2) / 2;
                if (to->to_maxval > NFS_MAX_TCP_TIMEOUT)
                        to->to_maxval = NFS_MAX_TCP_TIMEOUT;
                if (to->to_maxval < to->to_initval)

or don't do the linear backoff

--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -546,7 +546,7 @@  static void xprt_reset_majortimeo(struct rpc_rqst *req)
        if (to->to_exponential)
                req->rq_majortimeo <<= to->to_retries;
        else
-               req->rq_majortimeo += to->to_increment * to->to_retries;
+               req->rq_majortimeo += to->to_increment;
        if (req->rq_majortimeo > to->to_maxval || req->rq_majortimeo == 0)
                req->rq_majortimeo = to->to_maxval;
        req->rq_majortimeo += jiffies;
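
To make the arithmetic in the fs/nfs/client.c hunk concrete, here is a
small C sketch that evaluates the old and the proposed to_maxval
expressions in seconds. The function names are hypothetical, the
600-second cap is an assumed stand-in for NFS_MAX_TCP_TIMEOUT, and the
final "to_maxval < to_initval" check from the hunk is left out because its
body is not shown.

```c
#include <assert.h>

/* Assumed stand-in for NFS_MAX_TCP_TIMEOUT, expressed in seconds. */
#define MODEL_MAX_TCP_TIMEOUT 600u

/* Clamp a timeout to the assumed maximum. */
static unsigned int cap_timeout(unsigned int seconds)
{
	return seconds > MODEL_MAX_TCP_TIMEOUT ? MODEL_MAX_TCP_TIMEOUT : seconds;
}

/* Old expression: initval + increment * retries, with increment == initval. */
unsigned int maxval_before(unsigned int initval, unsigned int retries)
{
	unsigned int increment;

	assert(initval > 0);
	initval = cap_timeout(initval);
	increment = initval;
	return cap_timeout(initval + increment * retries);
}

/* Proposed expression: increment * (retries + 1) * (retries + 2) / 2. */
unsigned int maxval_after(unsigned int initval, unsigned int retries)
{
	unsigned int increment;

	assert(initval > 0);
	initval = cap_timeout(initval);
	increment = initval;
	return cap_timeout(increment * (retries + 1) * (retries + 2) / 2);
}
```

For timeo=20,retrans=7 (2-second steps) these return 16 and 72 seconds;
72 is the sum 2 + 4 + ... + 16, i.e. the first timeout plus seven linearly
growing retries. For timeo=600,retrans=7 the proposed value of 2160
seconds is clamped to the assumed 600-second cap.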