From patchwork Mon May 2 18:43:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 8994721
Subject: [PATCH v3 20/20] xprtrdma: Faster server reboot recovery
From: Chuck Lever
To: anna.schumaker@netapp.com
Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Mon, 02 May 2016 14:43:03 -0400
Message-ID: <20160502184303.10798.4709.stgit@manet.1015granger.net>
In-Reply-To: <20160502183144.10798.99847.stgit@manet.1015granger.net>
References: <20160502183144.10798.99847.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

In a cluster failover scenario, it is desirable for the client to
attempt to reconnect quickly, as an alternate NFS server is already waiting
to take over for the down server. The client can't see that a server
IP address has moved to a new server until the existing connection is
gone.

For fabrics and devices where it is meaningful, set a definite upper
bound on the amount of time before it is determined that a
connection is no longer valid. This allows the RPC client to detect
connection loss in a timely manner, then perform a fresh resolution
of the server GUID in case it has changed (cluster failover).

Signed-off-by: Chuck Lever
Tested-by: Steve Wise
---
 net/sunrpc/xprtrdma/verbs.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b7a5bc1..be66f65 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -554,6 +554,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	ep->rep_attr.recv_cq = recvcq;
 
 	/* Initialize cma parameters */
+	memset(&ep->rep_remote_cma, 0, sizeof(ep->rep_remote_cma));
 
 	/* RPC/RDMA does not use private data */
 	ep->rep_remote_cma.private_data = NULL;
@@ -567,7 +568,16 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		ep->rep_remote_cma.responder_resources =
 						ia->ri_device->attrs.max_qp_rd_atom;
 
-	ep->rep_remote_cma.retry_count = 7;
+	/* Limit transport retries so client can detect server
+	 * GID changes quickly. RPC layer handles re-establishing
+	 * transport connection and retransmission.
+	 */
+	ep->rep_remote_cma.retry_count = 6;
+
+	/* RPC-over-RDMA handles its own flow control. In addition,
+	 * make all RNR NAKs visible so we know that RPC-over-RDMA
+	 * flow control is working correctly (no NAKs should be seen).
+	 */
 	ep->rep_remote_cma.flow_control = 0;
 	ep->rep_remote_cma.rnr_retry_count = 0;

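
For illustration only, here is a rough user-space model of how retry_count
bounds the time to detect a dead connection. It is not part of the patch.
It assumes the usual InfiniBand convention that the local ACK timeout is
4.096 us * 2^exponent and that the QP makes one initial attempt plus
retry_count retries before reporting an error. The function names and the
exponent value used in the usage note are hypothetical; the patch does not
set or mention an ACK timeout, and real hardware may wait longer than the
nominal timeout on each attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal local ACK timeout in nanoseconds: 4.096 us * 2^timeout_exp.
 * An exponent of 0 conventionally means "wait forever", so this
 * sketch rejects it. (Hypothetical helper, not kernel code.)
 */
static uint64_t sketch_ack_timeout_ns(unsigned int timeout_exp)
{
	assert(timeout_exp >= 1 && timeout_exp <= 31);
	return 4096ULL << timeout_exp;
}

/* Nominal time before the transport gives up on an unresponsive peer:
 * one initial attempt plus retry_count retries, each waiting one ACK
 * timeout. retry_count is a 3-bit field, hence the 0..7 range check.
 */
static uint64_t sketch_nominal_detect_ns(unsigned int timeout_exp,
					 unsigned int retry_count)
{
	assert(retry_count <= 7);
	return (uint64_t)(retry_count + 1) * sketch_ack_timeout_ns(timeout_exp);
}
```

With an assumed exponent of 14 (about 67 ms per attempt), this model gives
roughly 470 ms for retry_count = 6 and roughly 537 ms for retry_count = 7.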