From patchwork Wed Apr 30 19:31:47 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 4095951
Return-Path:
X-Original-To: patchwork-linux-nfs@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
    by patchwork1.web.kernel.org (Postfix) with ESMTP id 53C099F169
    for ; Wed, 30 Apr 2014 19:31:53 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
    by mail.kernel.org (Postfix) with ESMTP id 79DD220131
    for ; Wed, 30 Apr 2014 19:31:52 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
    by mail.kernel.org (Postfix) with ESMTP id 8919F20303
    for ; Wed, 30 Apr 2014 19:31:51 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1945961AbaD3Tbt (ORCPT ); Wed, 30 Apr 2014 15:31:49 -0400
Received: from mail-ie0-f176.google.com ([209.85.223.176]:40117 "EHLO
    mail-ie0-f176.google.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1945938AbaD3Tbt (ORCPT );
    Wed, 30 Apr 2014 15:31:49 -0400
Received: by mail-ie0-f176.google.com with SMTP id rd18so2538109iec.35
    for ; Wed, 30 Apr 2014 12:31:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=20120113;
    h=sender:from:subject:to:cc:date:message-id:in-reply-to:references
    :user-agent:mime-version:content-type:content-transfer-encoding;
    bh=/enKLLyrHgDcRfWwLomATt/vlBGdImHW8Gx8gMslwfc=;
    b=xzbkwG9dF81oq1cLkyHjRmakvYD4PnPgS8ZrD+zDeNhHudrxcpWkrtN7N0RhhYcuuO
    fvI6IjctrP1kwf+aVa6gUWdmsxIUX96XFV8gevpmWy8MqyXnpWx8KKXvD5581DlhNej7
    ccmhgnxTc3Wa+TGli4t6jswywstQOmadurOcIB8Pj8KdpK8o8SiIe60vvjKWfQTcrkLQ
    j2PjO4yq/oFcidNfBTTiS4v4hBPS1pevCLjeP1e5e6KTq0KaFYyS9nCyJxKgamHvD6/T
    2yKcVZg0pn+mVvmWv8jUM1sNRxa6/2SQ4NqntwuGwrC8ygast8GM5PBeG0WdZWedvkHH
    XGDg==
X-Received: by 10.42.224.194 with SMTP id ip2mr3654736icb.91.1398886308692;
    Wed, 30 Apr 2014 12:31:48 -0700 (PDT)
Received: from manet.1015granger.net ([2604:8800:100:81fc:82ee:73ff:fe43:d64f])
    by mx.google.com with ESMTPSA id n5sm309238igr.0.2014.04.30.12.31.48
    for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
    Wed, 30 Apr 2014 12:31:48 -0700 (PDT)
From: Chuck Lever
Subject: [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: Anna.Schumaker@netapp.com
Date: Wed, 30 Apr 2014 15:31:47 -0400
Message-ID: <20140430193147.5663.13351.stgit@manet.1015granger.net>
In-Reply-To: <20140430191433.5663.16217.stgit@manet.1015granger.net>
References: <20140430191433.5663.16217.stgit@manet.1015granger.net>
User-Agent: StGIT/0.14.3
MIME-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org
X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,
    RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY
    autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Devesh Sharma reports that after a disconnect, his HCA is failing to
create a fresh QP, leaving ia->ri_id->qp set to NULL. But xprtrdma
still allows RPCs to wake up and post LOCAL_INV as they exit, causing
an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL. If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp.
But be sure not to try to destroy a NULL QP when
rpcrdma_ep_connect() is retried.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/verbs.c | 29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c80995a..54edf2a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (ep->rep_connected != 0) {
 		struct rpcrdma_xprt *xprt;
 retry:
+		dprintk("RPC: %s: reconnecting...\n", __func__);
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC: %s: rpcrdma_ep_disconnect"
@@ -879,7 +880,7 @@ retry:
 		id = rpcrdma_create_id(xprt, ia,
 				(struct sockaddr *)&xprt->rx_data.addr);
 		if (IS_ERR(id)) {
-			rc = PTR_ERR(id);
+			rc = -EHOSTUNREACH;
 			goto out;
 		}
 		/* TEMP TEMP TEMP - fail if new device:
@@ -893,20 +894,30 @@ retry:
 			printk("RPC: %s: can't reconnect on "
 				"different device!\n", __func__);
 			rdma_destroy_id(id);
-			rc = -ENETDOWN;
+			rc = -ENETUNREACH;
 			goto out;
 		}
 		/* END TEMP */
+		rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			rdma_destroy_id(id);
+			rc = -ENETUNREACH;
+			goto out;
+		}
 		rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = id;
-	}
-
-	rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-	if (rc) {
-		dprintk("RPC: %s: rdma_create_qp failed %i\n",
-			__func__, rc);
-		goto out;
+	} else {
+		dprintk("RPC: %s: connecting...\n", __func__);
+		rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			/* do not update ep->rep_connected */
+			return -ENETUNREACH;
+		}
 	}
 
 /* XXX Tavor device performs badly with 2K MTU! */