From patchwork Thu Jul 25 02:44:01 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11057845
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ABA7513B1
 for ; Thu, 25 Jul 2019 02:44:28 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 995A0287C2
 for ; Thu, 25 Jul 2019 02:44:28 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 8DB67288AA; Thu, 25 Jul 2019 02:44:28 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
 RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3A03C287C2
 for ; Thu, 25 Jul 2019 02:44:28 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1FF694C43F9;
 Wed, 24 Jul 2019 19:44:24 -0700 (PDT)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F7DC21FB77
 for ; Wed, 24 Jul 2019 19:44:13 -0700 (PDT)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
 by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 710091005267;
 Wed, 24 Jul 2019 22:44:11 -0400 (EDT)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
 id 66E4C2DC; Wed, 24 Jul 2019 22:44:11 -0400 (EDT)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown, Shaun Tancheff
Date: Wed, 24 Jul 2019 22:44:01 -0400
Message-Id: <1564022647-17351-3-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1564022647-17351-1-git-send-email-jsimmons@infradead.org>
References: <1564022647-17351-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 2/8] lustre: ptlrpc: Fix an rq_no_reply
 assertion failure
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Li Wei, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"
X-Virus-Scanned: ClamAV using ClamSMTP

From: Li Wei

An OSS had an assertion failure:

  LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout on bulk GET after 0+0s req@ffff88083a61b400 x1476486691018500/t0(4300509964) o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0 lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc 0/0
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION( req->rq_no_reply == 0 ) failed:
  Lustre: soaked-OST0000: Bulk IO write error with 8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1), client will retry: rc -110
  LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
  Pid: 5432, comm: ll_ost_io03_003

  Call Trace:
   [] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
   [] lbug_with_loc+0x47/0xb0 [libcfs]
   [] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
   [] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
   [] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
   [] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
   [] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
   [] ? pick_next_task_fair+0xd0/0x130
   [] ? schedule+0x176/0x3b0
   [] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
   [] kthread+0x96/0xa0
   [] child_rip+0xa/0x20
   [] ? kthread+0x0/0xa0
   [] ? child_rip+0x0/0x20

The thread in tgt_brw_write() had decided not to reply by setting
rq_no_reply, right before another thread tried to send an early reply
for the request.

WC-bug-id: https://jira.whamcloud.com/browse/LU-5537
Lustre-commit: a8d448e4cd5978c546911f98067232bcdd30b651
Signed-off-by: Li Wei
Reviewed-on: http://review.whamcloud.com/11740
Reviewed-by: Andreas Dilger
Reviewed-by: Johann Lombardi
---
 fs/lustre/ptlrpc/service.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index a40e964..c9ab9c3 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -1098,6 +1098,16 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 	reqcopy->rq_reqmsg = reqmsg;
 	memcpy(reqmsg, req->rq_reqmsg, req->rq_reqlen);
 
+	/*
+	 * tgt_brw_read() and tgt_brw_write() may have decided not to reply.
+	 * Without this check, we would fail the rq_no_reply assertion in
+	 * ptlrpc_send_reply().
+	 */
+	if (reqcopy->rq_no_reply) {
+		rc = -ETIMEDOUT;
+		goto out;
+	}
+
 	LASSERT(atomic_read(&req->rq_refcount));
 	/** if it is last refcount then early reply isn't needed */
 	if (atomic_read(&req->rq_refcount) == 1) {
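
For readers outside the ptlrpc code, the guard above can be illustrated with a small user-space sketch. It is not Lustre code: struct demo_request, demo_send_reply() and demo_send_early_reply() are hypothetical stand-ins, and assert() stands in for LASSERT(). The sketch assumes, as the hunk suggests, that the early-reply path works on a private copy of the request and tests rq_no_reply on that copy before handing it to the reply function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ptlrpc_request; only the flag matters. */
struct demo_request {
	bool rq_no_reply;	/* set once the handler decides not to reply */
};

/* Stand-in for ptlrpc_send_reply(): asserts that a reply is still allowed. */
static int demo_send_reply(const struct demo_request *req)
{
	assert(!req->rq_no_reply);	/* mirrors the rq_no_reply assertion */
	return 0;
}

/* Early-reply path with a guard modelled on the one added by this patch. */
static int demo_send_early_reply(const struct demo_request *req)
{
	struct demo_request reqcopy = *req;	/* private snapshot of the request */

	/* The handler may already have decided not to reply; bail out early. */
	if (reqcopy.rq_no_reply)
		return -ETIMEDOUT;

	return demo_send_reply(&reqcopy);
}
```

With rq_no_reply clear, demo_send_early_reply() returns 0. With the flag set it returns -ETIMEDOUT and never reaches the assertion. Because the flag is tested on the same snapshot that is passed on, a later change to the original request cannot trip the assertion in this sketch.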