From patchwork Thu May 22 00:56:33 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever III X-Patchwork-Id: 4219591 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 83CDCBEEAB for ; Thu, 22 May 2014 00:56:39 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 945322037A for ; Thu, 22 May 2014 00:56:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A29902026C for ; Thu, 22 May 2014 00:56:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753071AbaEVA4g (ORCPT ); Wed, 21 May 2014 20:56:36 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:47902 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753005AbaEVA4f (ORCPT ); Wed, 21 May 2014 20:56:35 -0400 Received: by mail-ie0-f174.google.com with SMTP id lx4so2841203iec.33 for ; Wed, 21 May 2014 17:56:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:subject:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-type:content-transfer-encoding; bh=/enKLLyrHgDcRfWwLomATt/vlBGdImHW8Gx8gMslwfc=; b=bDUA3UnK+UlS63fJUmzIlc9UBcVhWrEkvKxjH5CloQFlUGf8l3Pkx8JpZ6HYN+bnCz Oh3z11/Fpk+vOY7kXjFR1+W+5QT+FNJFuQhqoXlxhmW5n1O8tp9ZtgJjsjA+Ci0t2Euy znjXZpjHVhd4LZKjqo63DffKkAeCAZMewkTod9agGz6IziGhGRsIUF9bJvxx9QZE8DyU YyWg67JzGLNqNaSB8+jg0NpJy+rKPfarDCtNFOc7Mxmapagg4UzIkdZBFTChT69mqYaT yMwwdcYTmgObEayqlTxSGsOz/VTQif3INiQQ36Zj1LY68lJjJR4b/qb/CS2VY4SZNPUX FQXw== X-Received: by 10.50.114.4 with SMTP id jc4mr17705454igb.33.1400720195063; Wed, 21 May 2014 17:56:35 -0700 (PDT) Received: from manet.1015granger.net ([2604:8800:100:81fc:82ee:73ff:fe43:d64f]) by mx.google.com with ESMTPSA id ri2sm9269312igc.1.2014.05.21.17.56.34 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 21 May 2014 17:56:34 -0700 (PDT) From: Chuck Lever Subject: [PATCH v4 16/24] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Wed, 21 May 2014 20:56:33 -0400 Message-ID: <20140522005633.27190.84563.stgit@manet.1015granger.net> In-Reply-To: <20140522004505.27190.58897.stgit@manet.1015granger.net> References: <20140522004505.27190.58897.stgit@manet.1015granger.net> User-Agent: StGIT/0.14.3 MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Devesh Sharma reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/verbs.c | 29 ++++++++++++++++++++--------- 1 files changed, 20 insertions(+), 9 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index c80995a..54edf2a 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia) if (ep->rep_connected != 0) { struct rpcrdma_xprt *xprt; retry: + dprintk("RPC: %s: reconnecting...\n", __func__); rc = rpcrdma_ep_disconnect(ep, ia); if (rc && rc != -ENOTCONN) dprintk("RPC: %s: rpcrdma_ep_disconnect" @@ -879,7 +880,7 @@ retry: id = rpcrdma_create_id(xprt, ia, (struct sockaddr *)&xprt->rx_data.addr); if (IS_ERR(id)) { - rc = PTR_ERR(id); + rc = -EHOSTUNREACH; goto out; } /* TEMP TEMP TEMP - fail if new device: @@ -893,20 +894,30 @@ retry: printk("RPC: %s: can't reconnect on " "different device!\n", __func__); rdma_destroy_id(id); - rc = -ENETDOWN; + rc = -ENETUNREACH; goto out; } /* END TEMP */ + rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr); + if (rc) { + dprintk("RPC: %s: rdma_create_qp failed %i\n", + __func__, rc); + rdma_destroy_id(id); + rc = -ENETUNREACH; + goto out; + } rdma_destroy_qp(ia->ri_id); rdma_destroy_id(ia->ri_id); ia->ri_id = id; - } - - rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr); - if (rc) { - dprintk("RPC: %s: rdma_create_qp failed %i\n", - __func__, rc); - goto out; + } else { + dprintk("RPC: %s: connecting...\n", __func__); + rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr); + if (rc) { + dprintk("RPC: %s: rdma_create_qp failed %i\n", + __func__, rc); + /* do not update ep->rep_connected */ + return -ENETUNREACH; + } } /* XXX Tavor device performs badly with 2K MTU! */