From patchwork Wed May 28 14:34:07 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 4256001
From: Chuck Lever
Subject: [PATCH v5 16/24] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: Anna.Schumaker@netapp.com
Date: Wed, 28 May 2014 10:34:07 -0400
Message-ID: <20140528143407.23214.2601.stgit@manet.1015granger.net>
In-Reply-To: <20140528142521.23214.39655.stgit@manet.1015granger.net>
References: <20140528142521.23214.39655.stgit@manet.1015granger.net>
User-Agent: StGIT/0.14.3
X-Mailing-List: linux-rdma@vger.kernel.org

Devesh Sharma reports that after a disconnect, his HCA is failing to
create a fresh QP, leaving ia->ri_id->qp set to NULL. But xprtrdma
still allows RPCs to wake up and post LOCAL_INV as they exit, causing
an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's generic
layer. xprt_connect_status() does not recognize -EPERM, so it kills
pending RPC tasks immediately rather than retrying the connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL. If pending RPC tasks
wake and exit, LOCAL_INV work requests will flush rather than oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp.
But be sure not to try to destroy a NULL QP when
rpcrdma_ep_connect() is retried.

Signed-off-by: Chuck Lever
---

 net/sunrpc/xprtrdma/verbs.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c80995a..54edf2a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (ep->rep_connected != 0) {
 		struct rpcrdma_xprt *xprt;
 retry:
+		dprintk("RPC: %s: reconnecting...\n", __func__);
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC: %s: rpcrdma_ep_disconnect"
@@ -879,7 +880,7 @@ retry:
 		id = rpcrdma_create_id(xprt, ia,
 				(struct sockaddr *)&xprt->rx_data.addr);
 		if (IS_ERR(id)) {
-			rc = PTR_ERR(id);
+			rc = -EHOSTUNREACH;
 			goto out;
 		}
 		/* TEMP TEMP TEMP - fail if new device:
@@ -893,20 +894,30 @@ retry:
 			printk("RPC: %s: can't reconnect on "
 				"different device!\n", __func__);
 			rdma_destroy_id(id);
-			rc = -ENETDOWN;
+			rc = -ENETUNREACH;
 			goto out;
 		}
 		/* END TEMP */
+		rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			rdma_destroy_id(id);
+			rc = -ENETUNREACH;
+			goto out;
+		}
 		rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = id;
-	}
-
-	rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-	if (rc) {
-		dprintk("RPC: %s: rdma_create_qp failed %i\n",
-			__func__, rc);
-		goto out;
+	} else {
+		dprintk("RPC: %s: connecting...\n", __func__);
+		rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			/* do not update ep->rep_connected */
+			return -ENETUNREACH;
+		}
 	}
 
 /* XXX Tavor device performs badly with 2K MTU! */
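
For readers who want to see the ordering change in isolation, here is a minimal user-space sketch of the idea in the patch description: create the replacement first, and only drop the old object once the new one exists. Everything in it is hypothetical (struct demo_qp, struct demo_ep, demo_create_qp() and demo_connect() are stand-ins, not xprtrdma or RDMA verbs symbols), and it assumes the caller treats -ENETUNREACH as a reason to retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not xprtrdma or RDMA verbs types. */
struct demo_qp {
	int serial;
};

struct demo_ep {
	struct demo_qp *qp;	/* NULL only until the first successful connect */
	int connected;		/* set once a connect has succeeded */
};

/* Simulated QP creation: returns -EPERM when 'fail' is nonzero. */
static int demo_create_qp(struct demo_qp **out, int fail)
{
	static int next_serial = 1;
	struct demo_qp *qp;

	if (fail)
		return -EPERM;
	qp = malloc(sizeof(*qp));
	if (!qp)
		return -ENOMEM;
	qp->serial = next_serial++;
	*out = qp;
	return 0;
}

/*
 * Create the new QP before touching the old one. On failure the
 * endpoint keeps whatever QP it already had, and the raw error is
 * replaced by an errno the caller is assumed to retry on.
 */
static int demo_connect(struct demo_ep *ep, int fail)
{
	struct demo_qp *fresh = NULL;
	int rc;

	assert(ep != NULL);

	rc = demo_create_qp(&fresh, fail);
	if (rc)
		return -ENETUNREACH;	/* ep->qp and ep->connected untouched */

	free(ep->qp);		/* free(NULL) is a no-op on the first connect */
	ep->qp = fresh;
	ep->connected = 1;
	return 0;
}
```

Calling demo_connect() with fail set after an earlier success leaves ep->qp pointing at the old object, which is the property the reconnect path above relies on; a failed first connect leaves it NULL with connected still zero.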