From patchwork Thu Mar 13 01:23:11 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 3821471
Return-Path:
X-Original-To: patchwork-linux-nfs@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id C43B49F2BB
	for ; Thu, 13 Mar 2014 01:23:24 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 016D32028D
	for ; Thu, 13 Mar 2014 01:23:24 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id E1E912027D
	for ; Thu, 13 Mar 2014 01:23:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752271AbaCMBXV (ORCPT ); Wed, 12 Mar 2014 21:23:21 -0400
Received: from cantor2.suse.de ([195.135.220.15]:57214 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751573AbaCMBXV (ORCPT ); Wed, 12 Mar 2014 21:23:21 -0400
Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id D69E375017;
	Thu, 13 Mar 2014 01:23:19 +0000 (UTC)
Date: Thu, 13 Mar 2014 12:23:11 +1100
From: NeilBrown
To: Trond Myklebust
Cc: Dickson Steve, NFS, Dr Fields James Bruce, Lever Charles Edward,
	Carsten Ziepke
Subject: Re: [PATCH - v2] mount.nfs: Fix fallback from tcp to udp
Message-ID: <20140313122311.1d6cb500@notabene.brown>
In-Reply-To: <0AC6E29F-B377-4EE0-9599-26A72A8F85DA@primarydata.com>
References: <20140224142349.784345f9@notabene.brown>
	<531E2E3F.2020805@RedHat.com>
	<20140311090124.05409b1b@notabene.brown>
	<531F2334.2030203@RedHat.com>
	<20140312163803.0e911784@notabene.brown>
	<53203D97.6090005@RedHat.com>
	<0AC6E29F-B377-4EE0-9599-26A72A8F85DA@primarydata.com>
X-Mailer: Claws Mail 3.9.2 (GTK+ 2.24.22; x86_64-suse-linux-gnu)
Mime-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	T_RP_MATCHES_RCVD, T_TVD_MIME_EPI, UNPARSEABLE_RELAY autolearn=unavailable
	version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

> >> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
> >> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
> >> the kernel to never time out at all.
> > It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
> >
>
> Which patch caused the behaviour to change?

561ec1603171cd9b38dcf6cac53e8710f437a48d is the first bad commit
commit 561ec1603171cd9b38dcf6cac53e8710f437a48d
Author: Trond Myklebust
Date:   Thu Sep 26 15:22:45 2013 -0400

    SUNRPC: call_connect_status should recheck bind and connect status on error

    Currently, we go directly to call_transmit which sends us to call_status
    on error. If we know that the connect attempt failed, we should rather
    just jump straight back to call_bind and call_connect.
    Ditto for EAGAIN, except do not delay.

    Signed-off-by: Trond Myklebust

If I revert that commit from mainline (which may be a completely bogus thing
to do) then mainline works (at least for this specific simple test).
(The revert required some wiggling - I'll include it below).

To be precise, the test is to try to mount

   mount server:/path /mnt

from a server which has run

   rpc.nfsd -T -N4

"success" is getting periodic messages:

mount.nfs: trying text-based options 'retry=1,vers=4,addr=10.0.10.2,clientaddr=10.0.10.1'
mount.nfs: mount(2): Connection refused

"failure" is not getting those messages.

There is another change though.
For the commit above I do not get "Connection refused", but after 2 minutes
I get

mount.nfs: mount(2): Connection timed out

With mainline, it waits forever.

I did a second git bisect for this change and found

2118071d3b0d57a03fad77885f4fdc364798aa87 is the first bad commit
commit 2118071d3b0d57a03fad77885f4fdc364798aa87
Author: Trond Myklebust
Date:   Tue Dec 31 13:22:59 2013 -0500

    SUNRPC: Report connection error values to rpc_tasks on the pending queue

    Currently we only report EAGAIN, which is not descriptive enough for
    softconn tasks.

    Signed-off-by: Trond Myklebust

From this commit, a mount attempt which is getting connections denied will
block indefinitely.

Hope that is helpful.

NeilBrown


This is the revert that I mentioned - just for completeness.

From 9c1462ff54fcc2adc79c825b867c32c19e30a9a7 Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Thu, 13 Mar 2014 11:38:54 +1100
Subject: [PATCH] Revert "SUNRPC: call_connect_status should recheck bind and
 connect status on error"

This reverts commit 561ec1603171cd9b38dcf6cac53e8710f437a48d.

Conflicts:
	net/sunrpc/clnt.c

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0edada973434..ba0cd114f0e1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1796,7 +1796,6 @@ call_connect_status(struct rpc_task *task)
 	dprint_status(task);
 
 	trace_rpc_connect_status(task, status);
-	task->tk_status = 0;
 	switch (status) {
 		/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
@@ -1805,16 +1804,14 @@ call_connect_status(struct rpc_task *task)
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ECONNABORTED:
-	case -ENETUNREACH:
 	case -EHOSTUNREACH:
-		/* retry with existing socket, after a delay */
-		rpc_delay(task, 3*HZ);
+	case -ENETUNREACH:
 		if (RPC_IS_SOFTCONN(task))
 			break;
-	case -EAGAIN:
-		task->tk_action = call_bind;
-		return;
+		/* retry with existing socket, after a delay */
 	case 0:
+	case -EAGAIN:
+		task->tk_status = 0;
 		clnt->cl_stats->netreconn++;
 		task->tk_action = call_transmit;
 		return;
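
As an aid to reading the revert, the two variants of the switch can be modelled in user space. This is only a rough sketch under stated assumptions, not kernel code: the names next_step, with_commit, reverted and check_model are made up for illustration, the rpc_delay() call and the tk_status/netreconn updates are not modelled, and anything handled outside the two hunks shown (including -ETIMEDOUT) is lumped together as STEP_OTHER.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for a user-space model; not kernel identifiers. */
enum next_step {
	STEP_BIND,	/* models task->tk_action = call_bind */
	STEP_TRANSMIT,	/* models task->tk_action = call_transmit */
	STEP_OTHER	/* leaves the switch, or is outside the hunks shown */
};

/* Switch as on the "-" side of the revert (commit 561ec160 applied). */
static enum next_step with_commit(int status, bool softconn)
{
	switch (status) {
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ECONNABORTED:
	case -ENETUNREACH:
	case -EHOSTUNREACH:
		/* the kernel also calls rpc_delay(task, 3*HZ) here; not modelled */
		if (softconn)
			break;
		/* fall through */
	case -EAGAIN:
		return STEP_BIND;
	case 0:
		return STEP_TRANSMIT;
	default:
		break;
	}
	return STEP_OTHER;
}

/* Switch as on the "+" side of the revert (commit 561ec160 reverted). */
static enum next_step reverted(int status, bool softconn)
{
	switch (status) {
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ECONNABORTED:
	case -EHOSTUNREACH:
	case -ENETUNREACH:
		if (softconn)
			break;
		/* fall through */
	case 0:
	case -EAGAIN:
		return STEP_TRANSMIT;
	default:
		break;
	}
	return STEP_OTHER;
}

/* Usage: the difference the revert makes for a non-softconn task. */
static void check_model(void)
{
	assert(with_commit(-ECONNREFUSED, false) == STEP_BIND);
	assert(reverted(-ECONNREFUSED, false) == STEP_TRANSMIT);
}
```

In this model the only cases whose next step changes are the connection errors on a non-softconn task and -EAGAIN: with the commit they go back to call_bind, with the revert they go on to call_transmit. Softconn tasks leave the switch on a connection error in both variants.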