From patchwork Thu Jan 26 17:55:47 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 9539899
Subject: [PATCH v1 2/7] xprtrdma: Handle stale connection rejection
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Thu, 26 Jan 2017 12:55:47 -0500
Message-ID: <20170126175547.5794.86451.stgit@manet.1015granger.net>
In-Reply-To: <20170126174806.5794.14678.stgit@manet.1015granger.net>
References: <20170126174806.5794.14678.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-nfs@vger.kernel.org

A server rejects a connection attempt with STALE_CONNECTION when a
client attempts to connect to a working remote
service, but uses a QPN and GUID that correspond to an old connection
that was abandoned. This might occur after a client crashes and
restarts.

Fix rpcrdma_conn_upcall() to distinguish between a normal rejection
and rejection of stale connection parameters.

As an additional clean-up, remove the code that retries the
connection attempt with different ORD/IRD values. Code audit of
other ULP initiators shows no similar special case handling of
initiator_depth or responder_resources.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/verbs.c | 65 +++++++++++++------------------------------
 1 file changed, 20 insertions(+), 45 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 61d16c3..45db2b4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include /* try_module_get()/module_put() */
+#include
 
 #include "xprt_rdma.h"
 
@@ -279,7 +280,14 @@
 		connstate = -ENETDOWN;
 		goto connected;
 	case RDMA_CM_EVENT_REJECTED:
+#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
+		pr_info("rpcrdma: connection to %pIS:%u on %s rejected: %s\n",
+			sap, rpc_get_port(sap), ia->ri_device->name,
+			rdma_reject_msg(id, event->status));
+#endif
 		connstate = -ECONNREFUSED;
+		if (event->status == IB_CM_REJ_STALE_CONN)
+			connstate = -EAGAIN;
 		goto connected;
 	case RDMA_CM_EVENT_DISCONNECTED:
 		connstate = -ECONNABORTED;
@@ -643,20 +651,20 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 int
 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 {
+	struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt,
+						   rx_ia);
 	struct rdma_cm_id *id, *old;
+	unsigned int extras;
 	int rc = 0;
-	int retry_count = 0;
 
 	if (ep->rep_connected != 0) {
-		struct rpcrdma_xprt *xprt;
 retry:
 		dprintk("RPC: %s: reconnecting...\n", __func__);
 
 		rpcrdma_ep_disconnect(ep, ia);
 
-		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
-		id = rpcrdma_create_id(xprt, ia,
-				(struct sockaddr *)&xprt->rx_data.addr);
+		id = rpcrdma_create_id(r_xprt, ia,
+				(struct sockaddr *)&r_xprt->rx_data.addr);
 		if (IS_ERR(id)) {
 			rc = -EHOSTUNREACH;
 			goto out;
@@ -711,51 +719,18 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 	}
 
 	wait_event_interruptible(ep->rep_connect_wait, ep->rep_connected != 0);
-
-	/*
-	 * Check state. A non-peer reject indicates no listener
-	 * (ECONNREFUSED), which may be a transient state. All
-	 * others indicate a transport condition which has already
-	 * undergone a best-effort.
-	 */
-	if (ep->rep_connected == -ECONNREFUSED &&
-	    ++retry_count <= RDMA_CONNECT_RETRY_MAX) {
-		dprintk("RPC: %s: non-peer_reject, retry\n", __func__);
-		goto retry;
-	}
 	if (ep->rep_connected <= 0) {
-		/* Sometimes, the only way to reliably connect to remote
-		 * CMs is to use same nonzero values for ORD and IRD. */
-		if (retry_count++ <= RDMA_CONNECT_RETRY_MAX + 1 &&
-		    (ep->rep_remote_cma.responder_resources == 0 ||
-		     ep->rep_remote_cma.initiator_depth !=
-				ep->rep_remote_cma.responder_resources)) {
-			if (ep->rep_remote_cma.responder_resources == 0)
-				ep->rep_remote_cma.responder_resources = 1;
-			ep->rep_remote_cma.initiator_depth =
-				ep->rep_remote_cma.responder_resources;
+		if (ep->rep_connected == -EAGAIN)
 			goto retry;
-		}
 		rc = ep->rep_connected;
-	} else {
-		struct rpcrdma_xprt *r_xprt;
-		unsigned int extras;
-
-		dprintk("RPC: %s: connected\n", __func__);
-
-		r_xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
-		extras = r_xprt->rx_buf.rb_bc_srv_max_requests;
-
-		if (extras) {
-			rc = rpcrdma_ep_post_extra_recv(r_xprt, extras);
-			if (rc) {
-				pr_warn("%s: rpcrdma_ep_post_extra_recv: %i\n",
-					__func__, rc);
-				rc = 0;
-			}
-		}
+		goto out;
 	}
 
+	dprintk("RPC: %s: connected\n", __func__);
+	extras = r_xprt->rx_buf.rb_bc_srv_max_requests;
+	if (extras)
+		rpcrdma_ep_post_extra_recv(r_xprt, extras);
+
 out:
 	if (rc)
 		ep->rep_connected = rc;
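
For readers who want to follow the new control flow outside the kernel, here is a minimal user-space sketch of it. It is not part of the patch. Every demo_* name is hypothetical and stands in for kernel state: the enum replaces the CM reject reason carried in event->status, and struct demo_peer replaces a remote service that keeps rejecting stale parameters for a while. The sketch assumes, as the diff suggests, that a stale-connection reject maps to -EAGAIN, any other reject maps to -ECONNREFUSED, and the connect path retries only on -EAGAIN.

```c
#include <errno.h>

/* Hypothetical stand-ins for CM reject reasons; not the kernel's enum. */
enum demo_reject_reason {
	DEMO_REJ_OTHER,
	DEMO_REJ_STALE_CONN,
};

/* Mirrors the RDMA_CM_EVENT_REJECTED case: default to -ECONNREFUSED,
 * but report -EAGAIN when the peer rejected stale connection parameters. */
static int demo_reject_to_connstate(enum demo_reject_reason reason)
{
	int connstate = -ECONNREFUSED;

	if (reason == DEMO_REJ_STALE_CONN)
		connstate = -EAGAIN;
	return connstate;
}

/* Hypothetical peer: rejects the first stale_left attempts as stale,
 * then accepts the connection. */
struct demo_peer {
	unsigned int stale_left;
};

/* One connection attempt: returns 1 when connected, or a negative errno. */
static int demo_peer_attempt(struct demo_peer *peer)
{
	if (peer->stale_left > 0) {
		peer->stale_left--;
		return demo_reject_to_connstate(DEMO_REJ_STALE_CONN);
	}
	return 1;
}

/* Mirrors the shape of the patched connect path: retry while the result
 * is -EAGAIN, return any other error as-is, return 0 once connected.
 * Like the diff, this sketch places no bound on the number of retries. */
static int demo_connect(struct demo_peer *peer, unsigned int *attempts)
{
	int connected;

	*attempts = 0;
	do {
		connected = demo_peer_attempt(peer);
		(*attempts)++;
	} while (connected == -EAGAIN);

	return connected > 0 ? 0 : connected;
}
```

Calling demo_connect() on a peer initialized with stale_left = 2 makes three attempts and returns 0. A reject for any other reason ends the loop on the first attempt with -ECONNREFUSED, which matches the removal of the old retry_count path.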