From patchwork Fri Mar 10 16:06:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 9617699
Subject: [PATCH v1 05/11] xprtrdma: Detect unreachable NFS/RDMA servers more reliably
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Fri, 10 Mar 2017 11:06:28 -0500
Message-ID: <20170310160628.6314.52515.stgit@manet.1015granger.net>
In-Reply-To: <20170310154131.6314.35201.stgit@manet.1015granger.net>
References: <20170310154131.6314.35201.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
Sender: linux-nfs-owner@vger.kernel.org
X-Mailing-List: linux-nfs@vger.kernel.org

Current NFS clients rely on connection loss to determine when to
retransmit.
In particular, for protocols like NFSv4, clients no longer rely
on RPC timeouts to drive retransmission: NFSv4 servers are required to
terminate a connection when they need a client to retransmit pending
RPCs.

When a server is no longer reachable, either because it has crashed
or because the network path has broken, the server cannot actively
terminate a connection. Thus NFS clients depend on transport-level
keepalive to determine when a connection must be replaced and
pending RPCs retransmitted.

However, RDMA RC connections do not have a native keepalive
mechanism. If an NFS/RDMA server crashes after a client has sent
RPCs successfully (an RC ACK has been received for all OTW RDMA
requests), there is no way for the client to know the connection is
moribund.

In addition, new RDMA requests are subject to the RPC-over-RDMA
credit limit. If the client has consumed all granted credits with
NFS traffic, it is not allowed to send another RDMA request until
the server replies. Thus it has no way to send a true keepalive when
the workload has already consumed all credits with pending RPCs.

To address this, forcibly disconnect a transport when an RPC times
out. This prevents moribund connections from stopping the
detection of failover or other configuration changes on the server.

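The credit argument above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `toy_xprt` structure and every `toy_*` function are hypothetical names invented for illustration, not kernel APIs, and the model ignores everything except credit accounting and the forced disconnect on timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_xprt {
	unsigned int credits_granted;	/* credits the server last granted */
	unsigned int rpcs_in_flight;	/* requests sent but not yet answered */
	bool connected;
};

/* Any new request, including a would-be keepalive, needs a free credit. */
static bool toy_can_send(const struct toy_xprt *x)
{
	return x->connected && x->rpcs_in_flight < x->credits_granted;
}

/* Send one request if a credit is available; report whether it was sent. */
static bool toy_send(struct toy_xprt *x)
{
	if (!toy_can_send(x))
		return false;
	x->rpcs_in_flight++;
	return true;
}

/* A server reply frees the credit held by one pending request. */
static void toy_reply(struct toy_xprt *x)
{
	assert(x->rpcs_in_flight > 0);
	x->rpcs_in_flight--;
}

/* Models the approach in this patch: an RPC timeout forces a disconnect. */
static void toy_rpc_timeout(struct toy_xprt *x)
{
	x->connected = false;
}
```

With `credits_granted` set to 2, two calls to `toy_send()` succeed and a third is refused until `toy_reply()` runs. If the server never replies, the model has no way to probe the connection, and `toy_rpc_timeout()` is the only remaining path to a reconnect.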
Note that even if the connection is still good, retransmitting any
RPC will trigger a disconnect thanks to this logic in
xprt_rdma_send_request:

	/* Must suppress retransmit to maintain credits */
	if (req->rl_connect_cookie == xprt->connect_cookie)
		goto drop_connection;
	req->rl_connect_cookie = xprt->connect_cookie;

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/transport.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 26c9a19..240f0da 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -484,6 +484,27 @@
 	dprintk("RPC: %s: %u\n", __func__, port);
 }
 
+/**
+ * xprt_rdma_timer - invoked when an RPC times out
+ * @xprt: controlling RPC transport
+ * @task: RPC task that timed out
+ *
+ * Invoked when the transport is still connected, but an RPC
+ * retransmit timeout occurs.
+ *
+ * Since RDMA connections don't have a keep-alive, forcibly
+ * disconnect and retry to connect. This drives full
+ * detection of the network path, and retransmissions of
+ * all pending RPCs.
+ */
+static void
+xprt_rdma_timer(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+	dprintk("RPC: %5u %s: xprt = %p\n", task->tk_pid, __func__, xprt);
+
+	xprt_force_disconnect(xprt);
+}
+
 static void
 xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 {
@@ -776,6 +797,7 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 	.alloc_slot		= xprt_alloc_slot,
 	.release_request	= xprt_release_rqst_cong,       /* ditto */
 	.set_retrans_timeout	= xprt_set_retrans_timeout_def, /* ditto */
+	.timer			= xprt_rdma_timer,
 	.rpcbind		= rpcb_getport_async,	/* sunrpc/rpcb_clnt.c */
 	.set_port		= xprt_rdma_set_port,
 	.connect		= xprt_rdma_connect,
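
The retransmit check quoted from xprt_rdma_send_request in the description can be modeled in user space to show why a second send on the same connection forces a disconnect. This is a sketch, not the kernel code: `demo_xprt`, `demo_req`, `demo_send_request()` and `demo_reconnect()` are hypothetical names, and the model assumes the transport's cookie changes on every new connection and that a never-sent request carries a cookie value the transport never uses (0 here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transport and request structures. */
struct demo_xprt {
	unsigned int connect_cookie;	/* assumed to change per connection */
};

struct demo_req {
	unsigned int rl_connect_cookie;	/* cookie seen at the last send */
};

enum demo_result { DEMO_SENT, DEMO_DROP_CONNECTION };

/* Mirrors the quoted check: a resend on the same connection is refused. */
static enum demo_result
demo_send_request(struct demo_xprt *xprt, struct demo_req *req)
{
	assert(xprt != NULL && req != NULL);

	/* Same cookie means this request already went out on this connection. */
	if (req->rl_connect_cookie == xprt->connect_cookie)
		return DEMO_DROP_CONNECTION;

	req->rl_connect_cookie = xprt->connect_cookie;
	return DEMO_SENT;
}

/* Assumption: establishing a new connection bumps the cookie. */
static void demo_reconnect(struct demo_xprt *xprt)
{
	xprt->connect_cookie++;
}
```

In this model, a request sent once on connection 1 returns `DEMO_DROP_CONNECTION` when resent on the same connection, and is accepted again only after `demo_reconnect()` has changed the cookie.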