From patchwork Thu Feb 27 21:08:09 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409681
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D1B9138D
	for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 25DAB246A1
	for ; Thu, 27 Feb 2020 21:19:10 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 25DAB246A1
Authentication-Results: mail.kernel.org;
	dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org;
	spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36CBA21FB3B;
	Thu, 27 Feb 2020 13:18:58 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 52B5F21FA7D
	for ; Thu, 27 Feb 2020 13:18:21 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 6A0389E1;
	Thu, 27 Feb 2020 16:18:13 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id 68DFD46C; Thu, 27 Feb 2020 16:18:13 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:08:09 -0500
Message-Id: <1582838290-17243-22-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 021/622] lustre: ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Andriy Skulysh , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Andriy Skulysh

Assertion fails on !desc->bd_registered during retry
after ENOMEM. Drop bd_registered flag and exit via
cleanup_bulk to ensure that bulk is fully unregistered.

Cray-bug-id: MRP-4733
WC-bug-id: https://jira.whamcloud.com/browse/LU-10643
Lustre-commit: 4a81be263079 ("LU-10643 ptlrpc: ptlrpc_register_bulk() LBUG on ENOMEM")
Signed-off-by: Andriy Skulysh
Reviewed-on: https://review.whamcloud.com/31228
Reviewed-by: Alexandr Boyko
Reviewed-by: Andrew Perepechko
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/ptlrpc/niobuf.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 653a456..67500b5 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -349,6 +349,7 @@
 #define OBD_FAIL_PTLRPC_DROP_BULK	0x51a
 #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK	0x51b
 #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK	0x51c
+#define OBD_FAIL_PTLRPC_BULK_ATTACH	0x521
 
 #define OBD_FAIL_OBD_PING_NET	0x600
 #define OBD_FAIL_OBD_LOG_CANCEL_NET	0x601
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 02ed373..2e866fe 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -179,8 +179,13 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 			      LNET_MD_OP_GET : LNET_MD_OP_PUT);
 		ptlrpc_fill_bulk_md(&md, desc, posted_md);
 
-		rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0,
-				  LNET_UNLINK, LNET_INS_AFTER, &me_h);
+		if (posted_md > 0 && posted_md + 1 == total_md &&
+		    OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) {
+			rc = -ENOMEM;
+		} else {
+			rc = LNetMEAttach(desc->bd_portal, peer, mbits, 0,
+					  LNET_UNLINK, LNET_INS_AFTER, &me_h);
+		}
 		if (rc != 0) {
 			CERROR("%s: LNetMEAttach failed x%llu/%d: rc = %d\n",
 			       desc->bd_import->imp_obd->obd_name, mbits,
@@ -209,6 +214,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 		LASSERT(desc->bd_md_count >= 0);
 		mdunlink_iterate_helper(desc->bd_mds, desc->bd_md_max_brw);
 		req->rq_status = -ENOMEM;
+		desc->bd_registered = 0;
 		return -ENOMEM;
 	}
 
@@ -585,7 +591,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	if (request->rq_bulk) {
 		rc = ptlrpc_register_bulk(request);
 		if (rc != 0)
-			goto out;
+			goto cleanup_bulk;
 		/*
 		 * All the mds in the request will have the same cpt
 		 * encoded in the cookie. So we can just get the first