From patchwork Thu Jan 26 17:55:39 2017
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 9539897
Subject: [PATCH v1 1/7] xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Thu, 26 Jan 2017 12:55:39 -0500
Message-ID: <20170126175539.5794.5426.stgit@manet.1015granger.net>
In-Reply-To: <20170126174806.5794.14678.stgit@manet.1015granger.net>
References: <20170126174806.5794.14678.stgit@manet.1015granger.net>

Sriharsha (sriharsha.basavapatna@broadcom.com) reports an occasional double DMA unmap of an
FRWR MR when a connection is lost. I see one way this can happen.

When a request requires more than one segment or chunk,
rpcrdma_marshal_req loops, invoking ->frwr_op_map for each segment
(MR) in each chunk. Each call posts a FASTREG Work Request to
register one MR.

Now suppose that the transport connection is lost part-way through
marshaling this request. As part of recovering and resetting that
req, rpcrdma_marshal_req invokes ->frwr_op_unmap_safe, which hands
all the req's registered FRWRs to the MR recovery thread.

But note: FRWR registration is asynchronous. So it's possible that
some of these "already registered" FRWRs are fully registered, and
some are still waiting for their FASTREG WR to complete.

When the connection is lost, the "already registered" frmrs are
marked FRMR_IS_VALID, and the "still waiting" WRs flush. Then
frwr_wc_fastreg marks these frmrs FRMR_FLUSHED_FR.

But thanks to ->frwr_op_unmap_safe, the MR recovery thread is doing
an unreg / alloc_mr, a DMA unmap, and marking each of these frwrs
FRMR_IS_INVALID, at the same time frwr_wc_fastreg might be running.

- If the recovery thread runs last, then the frmr is marked
  FRMR_IS_INVALID, and life continues.

- If frwr_wc_fastreg runs last, the frmr is marked FRMR_FLUSHED_FR,
  but the recovery thread has already DMA unmapped that MR. When
  ->frwr_op_map later re-uses this frmr, it sees it is not marked
  FRMR_IS_INVALID, and tries to recover it before using it, resulting
  in a second DMA unmap of the same MR.

The fix is to guarantee in-flight FASTREG WRs have flushed before MR
recovery runs on those FRWRs. It's safe to wait until the RPC is
retransmitted.

Reported-by: Sriharsha Basavapatna
Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/rpc_rdma.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d889883..0ce7a7b 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -781,7 +781,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return 0;
 
 out_unmap:
-	r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
 	return PTR_ERR(iptr);
 }
 