From patchwork Wed Nov 9 19:04:56 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 9420037 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1A90C6048E for ; Wed, 9 Nov 2016 19:05:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0BBFB293B7 for ; Wed, 9 Nov 2016 19:05:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2FDE293C6; Wed, 9 Nov 2016 19:05:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4DCF5293B7 for ; Wed, 9 Nov 2016 19:05:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752990AbcKITFA (ORCPT ); Wed, 9 Nov 2016 14:05:00 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:34096 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752941AbcKITE7 (ORCPT ); Wed, 9 Nov 2016 14:04:59 -0500 Received: by mail-it0-f67.google.com with SMTP id q124so23809348itd.1; Wed, 09 Nov 2016 11:04:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=lEfBW/KtStszuul/nAcrW6/N0Itq83737hhLsrLKMN4=; b=oS7YYujxRnL1Xc0w5om4xXyXHbf+c8C9pvURiB7jxLCDu4CpZxhqiOYpT6pGnLnxK/ /VmHSN72k6EmfzccVjs1qI0zERdGWDK5XZI0gcC0D4s32lD2MOrnBZeV5Ff9zscmDL2X JM55RvDX0vsrzT9H7qPFDKm6zu8lPYUnBmvSWbG1d1tbbYVptpv+SB1D8f4ZSoZq8Iob bYRNwuiuktm9yPnpNR9Va5TNMNHOPXoYnG9ZKYUzHXSQ8/gMrMQWIY4duQnD/9LsutQ7 TOzOoDVePTmvmsrsGdG9DHfeQy/fH/e6nbd7PZsoiyj1mvEsDbw430qxizqpGycpEfGZ LMHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:from:to:date:message-id :in-reply-to:references:user-agent:mime-version :content-transfer-encoding; bh=lEfBW/KtStszuul/nAcrW6/N0Itq83737hhLsrLKMN4=; b=UKlyFbhBRLQav2BpZnVWZpZ2uv74Tq6oMhfmUFxNU04oQs/0rIVd5YcvUNgSjqQV1K S2mKHS1I21azl5gcFpvAuTU2MkQO07wR8jzVoAGzPEJFxzA1XQZEZXPnBFRfxeAOcb0k Muz0FHqWOlG7OT8/o6kwtubZ+hhlzMN1U6VLMw7AIgZ+ADO9/2L+rg49VdujMhO7uiiR yj0AD8nZ9cXo+YY3OmoALrKcW8Ro5nM1xzEiFor32yRBwuWz8TvAHOAGQplajriOyT16 iel9EaeA0sRrxfTrg5us6Ki1/k5i4gaTqsnBbKo3mHifTmfMo6RZeQGW6nrDUyylAX+Y 3XlA== X-Gm-Message-State: ABUngvd12Svp/afzsmomBmEX8zq0PvpbQcLwTH4HMyFbN7Zlwf0Msk9PBvnxxE5EhboUlQ== X-Received: by 10.36.57.20 with SMTP id l20mr1423820ita.54.1478718298097; Wed, 09 Nov 2016 11:04:58 -0800 (PST) Received: from manet.1015granger.net ([2604:8800:100:81fc:ec4:7aff:fe6c:1dce]) by smtp.gmail.com with ESMTPSA id q77sm547120itb.22.2016.11.09.11.04.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 09 Nov 2016 11:04:57 -0800 (PST) Subject: [PATCH v1 01/14] xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect From: Chuck Lever To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Wed, 09 Nov 2016 14:04:56 -0500 Message-ID: <20161109190456.15007.87544.stgit@manet.1015granger.net> In-Reply-To: <20161109184735.15007.96507.stgit@manet.1015granger.net> References: <20161109184735.15007.96507.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When a LOCALINV WR is flushed, the frmr is marked STALE, then frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs are then recovered when frwr_op_map hunts for an INVALID frmr to use. All other cases that need frmr recovery leave that SGL DMA-mapped. The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL. To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs, alter the recovery logic (rather than the hot frwr_op_unmap_sync path) to distinguish among these cases. This solution also takes care of the case where multiple LOCAL_INV WRs are issued for the same rpcrdma_req, some complete successfully, but some are flushed. Signed-off-by: Chuck Lever Tested-by: Vasco Steinmetz --- net/sunrpc/xprtrdma/frwr_ops.c | 37 ++++++++++++++++++++++--------------- net/sunrpc/xprtrdma/xprt_rdma.h | 3 ++- 2 files changed, 24 insertions(+), 16 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 2109495..26b26be 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -44,18 +44,20 @@ * being done. * * When the underlying transport disconnects, MRs are left in one of - * three states: + * four states: * * INVALID: The MR was not in use before the QP entered ERROR state. - * (Or, the LOCAL_INV WR has not completed or flushed yet). - * - * STALE: The MR was being registered or unregistered when the QP - * entered ERROR state, and the pending WR was flushed. * * VALID: The MR was registered before the QP entered ERROR state. * - * When frwr_op_map encounters STALE and VALID MRs, they are recovered - * with ib_dereg_mr and then are re-initialized. Beause MR recovery + * FLUSHED_FR: The MR was being registered when the QP entered ERROR + * state, and the pending WR was flushed. + * + * FLUSHED_LI: The MR was being invalidated when the QP entered ERROR + * state, and the pending WR was flushed. + * + * When frwr_op_map encounters FLUSHED and VALID MRs, they are recovered + * with ib_dereg_mr and then are re-initialized. Because MR recovery * allocates fresh resources, it is deferred to a workqueue, and the * recovered MRs are placed back on the rb_mws list when recovery is * complete. frwr_op_map allocates another MR for the current RPC while @@ -177,12 +179,15 @@ static void frwr_op_recover_mr(struct rpcrdma_mw *mw) { + enum rpcrdma_frmr_state state = mw->frmr.fr_state; struct rpcrdma_xprt *r_xprt = mw->mw_xprt; struct rpcrdma_ia *ia = &r_xprt->rx_ia; int rc; rc = __frwr_reset_mr(ia, mw); - ib_dma_unmap_sg(ia->ri_device, mw->mw_sg, mw->mw_nents, mw->mw_dir); + if (state != FRMR_FLUSHED_LI) + ib_dma_unmap_sg(ia->ri_device, + mw->mw_sg, mw->mw_nents, mw->mw_dir); if (rc) goto out_release; @@ -262,10 +267,8 @@ } static void -__frwr_sendcompletion_flush(struct ib_wc *wc, struct rpcrdma_frmr *frmr, - const char *wr) +__frwr_sendcompletion_flush(struct ib_wc *wc, const char *wr) { - frmr->fr_state = FRMR_IS_STALE; if (wc->status != IB_WC_WR_FLUSH_ERR) pr_err("rpcrdma: %s: %s (%u/0x%x)\n", wr, ib_wc_status_msg(wc->status), @@ -288,7 +291,8 @@ if (wc->status != IB_WC_SUCCESS) { cqe = wc->wr_cqe; frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe); - __frwr_sendcompletion_flush(wc, frmr, "fastreg"); + frmr->fr_state = FRMR_FLUSHED_FR; + __frwr_sendcompletion_flush(wc, "fastreg"); } } @@ -308,7 +312,8 @@ if (wc->status != IB_WC_SUCCESS) { cqe = wc->wr_cqe; frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe); - __frwr_sendcompletion_flush(wc, frmr, "localinv"); + frmr->fr_state = FRMR_FLUSHED_LI; + __frwr_sendcompletion_flush(wc, "localinv"); } } @@ -328,8 +333,10 @@ /* WARNING: Only wr_cqe and status are reliable at this point */ cqe = wc->wr_cqe; frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe); - if (wc->status != IB_WC_SUCCESS) - __frwr_sendcompletion_flush(wc, frmr, "localinv"); + if (wc->status != IB_WC_SUCCESS) { + frmr->fr_state = FRMR_FLUSHED_LI; + __frwr_sendcompletion_flush(wc, "localinv"); + } complete(&frmr->fr_linv_done); } diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 0d35b76..6e1bba3 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -216,7 +216,8 @@ struct rpcrdma_rep { enum rpcrdma_frmr_state { FRMR_IS_INVALID, /* ready to be used */ FRMR_IS_VALID, /* in use */ - FRMR_IS_STALE, /* failed completion */ + FRMR_FLUSHED_FR, /* flushed FASTREG WR */ + FRMR_FLUSHED_LI, /* flushed LOCALINV WR */ }; struct rpcrdma_frmr {