From patchwork Wed Feb 8 22:00:35 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever III
X-Patchwork-Id: 9563557
Subject: [PATCH v3 07/12] xprtrdma: Handle stale connection rejection
From: Chuck Lever
To: anna.schumaker@netapp.com
Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Wed, 08 Feb 2017 17:00:35 -0500
Message-ID: <20170208220035.7152.1001.stgit@manet.1015granger.net>
In-Reply-To: <20170208214854.7152.83331.stgit@manet.1015granger.net>
References: <20170208214854.7152.83331.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org

A server rejects a connection attempt with STALE_CONNECTION when a client
attempts to connect to a working remote service, but uses a QPN and GUID
that correspond to an old connection that was abandoned. This might occur
after a client crashes and restarts.

Fix rpcrdma_conn_upcall() to distinguish between a normal rejection
and rejection of stale connection parameters.

As an additional clean-up, remove the code that retries the
connection attempt with different ORD/IRD values. Code audit of
other ULP initiators shows no similar special case handling of
initiator_depth or responder_resources.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/verbs.c | 66 ++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 45 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 61d16c3..d1ee33f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include /* try_module_get()/module_put() */
+#include
 
 #include "xprt_rdma.h"
 
@@ -279,7 +280,14 @@
 		connstate = -ENETDOWN;
 		goto connected;
 	case RDMA_CM_EVENT_REJECTED:
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+		pr_info("rpcrdma: connection to %pIS:%u on %s rejected: %s\n",
+			sap, rpc_get_port(sap), ia->ri_device->name,
+			rdma_reject_msg(id, event->status));
+#endif
 		connstate = -ECONNREFUSED;
+		if (event->status == IB_CM_REJ_STALE_CONN)
+			connstate = -EAGAIN;
 		goto connected;
 	case RDMA_CM_EVENT_DISCONNECTED:
 		connstate = -ECONNABORTED;
@@ -643,20 +651,21 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 int
 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 {
+	struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt,
+						   rx_ia);
 	struct rdma_cm_id *id, *old;
+	struct sockaddr *sap;
+	unsigned int extras;
 	int rc = 0;
-	int retry_count = 0;
 
 	if (ep->rep_connected != 0) {
-		struct rpcrdma_xprt *xprt;
 retry:
 		dprintk("RPC: %s: reconnecting...\n", __func__);
 
 		rpcrdma_ep_disconnect(ep, ia);
 
-		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
-		id = rpcrdma_create_id(xprt, ia,
-				(struct sockaddr *)&xprt->rx_data.addr);
+		sap = (struct sockaddr *)&r_xprt->rx_data.addr;
+		id = rpcrdma_create_id(r_xprt, ia, sap);
 		if (IS_ERR(id)) {
 			rc = -EHOSTUNREACH;
 			goto out;
@@ -711,51 +720,18 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 	}
 
 	wait_event_interruptible(ep->rep_connect_wait, ep->rep_connected != 0);
-
-	/*
-	 * Check state. A non-peer reject indicates no listener
-	 * (ECONNREFUSED), which may be a transient state. All
-	 * others indicate a transport condition which has already
-	 * undergone a best-effort.
-	 */
-	if (ep->rep_connected == -ECONNREFUSED &&
-	    ++retry_count <= RDMA_CONNECT_RETRY_MAX) {
-		dprintk("RPC: %s: non-peer_reject, retry\n", __func__);
-		goto retry;
-	}
 	if (ep->rep_connected <= 0) {
-		/* Sometimes, the only way to reliably connect to remote
-		 * CMs is to use same nonzero values for ORD and IRD. */
-		if (retry_count++ <= RDMA_CONNECT_RETRY_MAX + 1 &&
-		    (ep->rep_remote_cma.responder_resources == 0 ||
-		     ep->rep_remote_cma.initiator_depth !=
-				ep->rep_remote_cma.responder_resources)) {
-			if (ep->rep_remote_cma.responder_resources == 0)
-				ep->rep_remote_cma.responder_resources = 1;
-			ep->rep_remote_cma.initiator_depth =
-				ep->rep_remote_cma.responder_resources;
+		if (ep->rep_connected == -EAGAIN)
 			goto retry;
-		}
 		rc = ep->rep_connected;
-	} else {
-		struct rpcrdma_xprt *r_xprt;
-		unsigned int extras;
-
-		dprintk("RPC: %s: connected\n", __func__);
-
-		r_xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
-		extras = r_xprt->rx_buf.rb_bc_srv_max_requests;
-
-		if (extras) {
-			rc = rpcrdma_ep_post_extra_recv(r_xprt, extras);
-			if (rc) {
-				pr_warn("%s: rpcrdma_ep_post_extra_recv: %i\n",
-					__func__, rc);
-				rc = 0;
-			}
-		}
+		goto out;
 	}
 
+	dprintk("RPC: %s: connected\n", __func__);
+	extras = r_xprt->rx_buf.rb_bc_srv_max_requests;
+	if (extras)
+		rpcrdma_ep_post_extra_recv(r_xprt, extras);
+
 out:
 	if (rc)
 		ep->rep_connected = rc;
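
For readers who want to reason about the new control flow outside the kernel, the following is a minimal user-space sketch of the behaviour the patch describes: a stale-connection rejection maps to -EAGAIN and triggers another attempt, while any other rejection is returned to the caller as -ECONNREFUSED. It is an illustration only, not the kernel code. The names reject_reason, REJ_CONSUMER_DEFINED, REJ_STALE_CONN, reject_to_connstate() and connect_with_retry() are hypothetical, and the scripted outcomes array stands in for real CM events. The -ETIMEDOUT guard exists only so the simulation terminates; the patch itself has no such bound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for CM reject reasons; the real constants
 * (such as IB_CM_REJ_STALE_CONN) live in the kernel's RDMA headers. */
enum reject_reason {
	REJ_CONSUMER_DEFINED,	/* an ordinary rejection */
	REJ_STALE_CONN,		/* peer saw parameters of an abandoned connection */
};

/* Mirrors the RDMA_CM_EVENT_REJECTED case in the patch: default to
 * -ECONNREFUSED, but report -EAGAIN for a stale-connection rejection. */
static int reject_to_connstate(enum reject_reason reason)
{
	int connstate = -ECONNREFUSED;

	if (reason == REJ_STALE_CONN)
		connstate = -EAGAIN;
	return connstate;
}

/* Mirrors the tail of rpcrdma_ep_connect() after the patch: retry only
 * when the attempt ended in -EAGAIN. Each entry of outcomes[] is the
 * simulated result of one attempt (positive means connected). */
static int connect_with_retry(const int *outcomes, size_t n, size_t *attempts)
{
	size_t i = 0;
	int connected;

	assert(outcomes && attempts);
	*attempts = 0;
retry:
	if (i >= n)
		return -ETIMEDOUT;	/* simulation-only guard, not in the patch */
	connected = outcomes[i++];
	(*attempts)++;
	if (connected <= 0) {
		if (connected == -EAGAIN)
			goto retry;
		return connected;
	}
	return 0;
}
```

With a script of one stale rejection followed by a successful attempt, connect_with_retry() returns 0 after two attempts. With a single ordinary rejection it returns -ECONNREFUSED after one attempt, which matches the removal of the old ECONNREFUSED retry loop.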