diff mbox

[-,v2] mount.nfs: Fix fallback from tcp to udp

Message ID 20140313122311.1d6cb500@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown March 13, 2014, 1:23 a.m. UTC
> >> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is 
> >> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect 
> >> the kernel to never time out at all.
> > It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client. 
> > 
> 
> Which patch caused the behaviour to change?

561ec1603171cd9b38dcf6cac53e8710f437a48d is the first bad commit
commit 561ec1603171cd9b38dcf6cac53e8710f437a48d
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Thu Sep 26 15:22:45 2013 -0400

    SUNRPC: call_connect_status should recheck bind and connect status on error
    
    Currently, we go directly to call_transmit which sends us to call_status
    on error. If we know that the connect attempt failed, we should rather
    just jump straight back to call_bind and call_connect.
    
    Ditto for EAGAIN, except do not delay.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


If I revert that commit from mainline (which may be a completely bogus thing
to do) then mainline works (at least for this specific simple test).
(The revert required some wiggling - I'll include it below).

To be precise, the test is to try to mount
  mount server:/path /mnt
from a server which has run
   rpc.nfsd -T -N4

"success" is getting periodic messages:

mount.nfs: trying text-based options 'retry=1,vers=4,addr=10.0.10.2,clientaddr=10.0.10.1'
mount.nfs: mount(2): Connection refused


"failure" is not getting those messages.


There is another change though.
For the commit above I don't not get "Connection refused", but after 2
minutes I get 

mount.nfs: mount(2): Connection timed out

With mainline, it waits forever.
I did a second git bisect for this change and found

2118071d3b0d57a03fad77885f4fdc364798aa87 is the first bad commit
commit 2118071d3b0d57a03fad77885f4fdc364798aa87
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Tue Dec 31 13:22:59 2013 -0500

    SUNRPC: Report connection error values to rpc_tasks on the pending queue
    
    Currently we only report EAGAIN, which is not descriptive enough for
    softconn tasks.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


From this commit, a mount attempt which is getting connections denied will
block indefinitely.

Hope that is helpful.

NeilBrown



This is the revert that I mentioned - just for completeness.


From 9c1462ff54fcc2adc79c825b867c32c19e30a9a7 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 13 Mar 2014 11:38:54 +1100
Subject: [PATCH] Revert "SUNRPC: call_connect_status should recheck bind and
 connect status on error"

This reverts commit 561ec1603171cd9b38dcf6cac53e8710f437a48d.

Conflicts:
	net/sunrpc/clnt.c
diff mbox

Patch

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0edada973434..ba0cd114f0e1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1796,7 +1796,6 @@  call_connect_status(struct rpc_task *task)
 	dprint_status(task);
 
 	trace_rpc_connect_status(task, status);
-	task->tk_status = 0;
 	switch (status) {
 		/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
@@ -1805,16 +1804,14 @@  call_connect_status(struct rpc_task *task)
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ECONNABORTED:
-	case -ENETUNREACH:
 	case -EHOSTUNREACH:
-		/* retry with existing socket, after a delay */
-		rpc_delay(task, 3*HZ);
+	case -ENETUNREACH:
 		if (RPC_IS_SOFTCONN(task))
 			break;
-	case -EAGAIN:
-		task->tk_action = call_bind;
-		return;
+		/* retry with existing socket, after a delay */
 	case 0:
+	case -EAGAIN:
+		task->tk_status = 0;
 		clnt->cl_stats->netreconn++;
 		task->tk_action = call_transmit;
 		return;