From patchwork Wed Feb 8 22:00:27 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 9563553 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 9864E60236 for ; Wed, 8 Feb 2017 22:03:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 81CF4284F6 for ; Wed, 8 Feb 2017 22:03:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 76C422851E; Wed, 8 Feb 2017 22:03:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E2B0E284FF for ; Wed, 8 Feb 2017 22:03:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751050AbdBHWC7 (ORCPT ); Wed, 8 Feb 2017 17:02:59 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:36231 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751103AbdBHWC4 (ORCPT ); Wed, 8 Feb 2017 17:02:56 -0500 Received: by mail-it0-f67.google.com with SMTP id f200so437593itf.3; Wed, 08 Feb 2017 14:00:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=cR9uAPh6D3oI/ozO1leysz8o08LMMGnjlSXtNSawS1Q=; b=BxwyJKB3Z19mOYfClW8umpGfqnDZXvPH80lTrpRL0DRbToCHcx+GEuDPeFd9kK16al d6yVB+3Xgb5xwO4RGCFO8EJKLo0yZmAAFj8OOcywNrI/wRsRzkzM+fHFJB0KGbs6T0Vz XCQ+/AVAw6pVBjDs8qSmJeVlscIHwJsv9Tb8PYJD7WgPm7UvBXS7CVxnm/ZSFq0ok6ye EXrj4QvxnTCJpPmG0PBaq+ECHe61wxl1yMC8+hBikV/nWJzJGjxt1i7iuHUKnOq9/sEM ELTFtRT6c+HdFtxdXqDDAZiKnd3egv+XyO/yBvC1u7wNpABmieMEUbQEAyU0qhgvwbhU XyQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:subject:from:to:cc:date:message-id :in-reply-to:references:user-agent:mime-version :content-transfer-encoding; bh=cR9uAPh6D3oI/ozO1leysz8o08LMMGnjlSXtNSawS1Q=; b=ker2mZu4bVGWcZJk3QL4Za861th4i6Ih434QUrdhB/uoUMBvEYTSGwC4CPPEynKig1 fqJ3QfnldkSxI0BI5bP2caCQK/EkhDEtF0AnNLU/ndhRoxjPcFaUr4dirOhtQwn0Urun 4w0VoZO/dsViiqAxKaB8AvGNHJwcyN+BU1R5aRSEM0EgIEBPjSQL+oKPgmPucoIyLOos slL8e/3n3+Di/ovejMTb5RUQf0hBS6CEfLJd64oZPDGzOWSvcYvoL2XCicBSIFq2+XPp LU9eCJnrtbmLgVmH+iildj8s6lCOHolMI7n+5Qmc6TImXZtoCwGE3lBKFxECDfZzFeKK 8IfQ== X-Gm-Message-State: AMke39nZsZtZKNh6cTp3tcBYk70/R4faNchm8USS4EvhmQblwkVdLVwjX24+CHcZtW03gA== X-Received: by 10.36.65.161 with SMTP id b33mr369972itd.114.1486591228184; Wed, 08 Feb 2017 14:00:28 -0800 (PST) Received: from manet.1015granger.net ([2604:8800:100:81fc:ec4:7aff:fe6c:1dce]) by smtp.gmail.com with ESMTPSA id q185sm1740757itc.22.2017.02.08.14.00.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Feb 2017 14:00:27 -0800 (PST) Subject: [PATCH v3 06/12] xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs From: Chuck Lever To: anna.schumaker@netapp.com Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Wed, 08 Feb 2017 17:00:27 -0500 Message-ID: <20170208220027.7152.36270.stgit@manet.1015granger.net> In-Reply-To: <20170208214854.7152.83331.stgit@manet.1015granger.net> References: <20170208214854.7152.83331.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sriharsha (sriharsha.basavapatna@broadcom.com) reports an occasional double DMA unmap of an FRWR MR when a connection is lost. I see one way this can happen. When a request requires more than one segment or chunk, rpcrdma_marshal_req loops, invoking ->frwr_op_map for each segment (MR) in each chunk. Each call posts a FASTREG Work Request to register one MR. Now suppose that the transport connection is lost part-way through marshaling this request. As part of recovering and resetting that req, rpcrdma_marshal_req invokes ->frwr_op_unmap_safe, which hands all the req's registered FRWRs to the MR recovery thread. But note: FRWR registration is asynchronous. So it's possible that some of these "already registered" FRWRs are fully registered, and some are still waiting for their FASTREG WR to complete. When the connection is lost, the "already registered" frmrs are marked FRMR_IS_VALID, and the "still waiting" WRs flush. Then frwr_wc_fastreg marks these frmrs FRMR_FLUSHED_FR. But thanks to ->frwr_op_unmap_safe, the MR recovery thread is doing an unreg / alloc_mr, a DMA unmap, and marking each of these frwrs FRMR_IS_INVALID, at the same time frwr_wc_fastreg might be running. - If the recovery thread runs last, then the frmr is marked FRMR_IS_INVALID, and life continues. - If frwr_wc_fastreg runs last, the frmr is marked FRMR_FLUSHED_FR, but the recovery thread has already DMA unmapped that MR. When ->frwr_op_map later re-uses this frmr, it sees it is not marked FRMR_IS_INVALID, and tries to recover it before using it, resulting in a second DMA unmap of the same MR. The fix is to guarantee in-flight FASTREG WRs have flushed before MR recovery runs on those FRWRs. Thus we depend on ro_unmap_safe (called from xprt_rdma_send_request on retransmit, or from xprt_rdma_free) to clean up old registrations as needed. Reported-by: Sriharsha Basavapatna Signed-off-by: Chuck Lever Tested-by: Sriharsha Basavapatna --- net/sunrpc/xprtrdma/rpc_rdma.c | 14 ++++++++------ net/sunrpc/xprtrdma/transport.c | 4 ---- 2 files changed, 8 insertions(+), 10 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index d889883..72b3ca0 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -759,13 +759,13 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, iptr = headerp->rm_body.rm_chunks; iptr = rpcrdma_encode_read_list(r_xprt, req, rqst, iptr, rtype); if (IS_ERR(iptr)) - goto out_unmap; + goto out_err; iptr = rpcrdma_encode_write_list(r_xprt, req, rqst, iptr, wtype); if (IS_ERR(iptr)) - goto out_unmap; + goto out_err; iptr = rpcrdma_encode_reply_chunk(r_xprt, req, rqst, iptr, wtype); if (IS_ERR(iptr)) - goto out_unmap; + goto out_err; hdrlen = (unsigned char *)iptr - (unsigned char *)headerp; dprintk("RPC: %5u %s: %s/%s: hdrlen %zd rpclen %zd\n", @@ -776,12 +776,14 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, hdrlen, &rqst->rq_snd_buf, rtype)) { iptr = ERR_PTR(-EIO); - goto out_unmap; + goto out_err; } return 0; -out_unmap: - r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false); +out_err: + pr_err("rpcrdma: rpcrdma_marshal_req failed, status %ld\n", + PTR_ERR(iptr)); + r_xprt->rx_stats.failed_marshal_count++; return PTR_ERR(iptr); } diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 6990581..c717f54 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -709,10 +709,6 @@ return 0; failed_marshal: - dprintk("RPC: %s: rpcrdma_marshal_req failed, status %i\n", - __func__, rc); - if (rc == -EIO) - r_xprt->rx_stats.failed_marshal_count++; if (rc != -ENOTCONN) return rc; drop_connection: