From patchwork Tue Oct 23 22:34:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10653845
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD17013BF
	for ; Tue, 23 Oct 2018 22:34:31 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 878D129D09
	for ; Tue, 23 Oct 2018 22:34:31 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 756C029D1E; Tue, 23 Oct 2018 22:34:31 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2389C29D09
	for ; Tue, 23 Oct 2018 22:34:31 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7195F21F630;
	Tue, 23 Oct 2018 15:34:29 -0700 (PDT)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from mx1.suse.de (mx2.suse.de [195.135.220.15])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E25A421F5C6
	for ; Tue, 23 Oct 2018 15:34:26 -0700 (PDT)
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay1.suse.de (unknown [195.135.220.254])
	by mx1.suse.de (Postfix) with ESMTP id A9410AD91;
	Tue, 23 Oct 2018 22:34:25 +0000 (UTC)
From: NeilBrown
To: James Simmons, Andreas Dilger, Oleg Drokin
Date: Wed, 24 Oct 2018 09:34:17 +1100
In-Reply-To: <878t2q8unf.fsf@notabene.neil.brown.name>
References: <1539543498-29105-1-git-send-email-jsimmons@infradead.org>
	<878t2q8unf.fsf@notabene.neil.brown.name>
Message-ID: <87d0s070nq.fsf@notabene.neil.brown.name>
MIME-Version: 1.0
Subject: [lustre-devel] [PATCH] lustre: lu_object: fix possible hang waiting
	for LCS_LEAVING
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lustre Development List
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"
X-Virus-Scanned: ClamAV using ClamSMTP

As lu_context_key_quiesce() spins waiting for LCS_LEAVING to change,
it is important that we set and then clear it within a
non-preemptible region. If the thread that spins preempts the thread
that sets-and-clears the state while the state is LCS_LEAVING, then
it can spin indefinitely, particularly on a single-CPU machine.

Also update the comment to explain this dependency.

Fixes: ac3f8fd6e61b ("staging: lustre: remove locking from lu_context_exit()")
Reviewed-by: James Simmons
---
This is the cause of the "something" that went wrong in my recent
testing that I mentioned. I wonder if preempt_enable() has recently
been enhanced to encourage a preempt, to make this sort of bug easier
to see.

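To make the window concrete, here is a minimal userspace sketch of the two orderings of lu_context_exit(). It is a single-threaded model under loud assumptions, not kernel code: a plain counter stands in for the preempt count, plain stores stand in for smp_store_mb() and smp_store_release(), and every name ending in _model, together with preemption_point(), exit_before_patch() and exit_after_patch(), is hypothetical. It only counts the points at which the state is LCS_LEAVING while preemption would be allowed, which is where a spinning lu_context_key_quiesce() could take over the CPU and never give it back.

```c
/*
 * Userspace model only. Nothing here is the kernel API: the preempt
 * count is a plain int and the state stores carry no memory ordering.
 */
#include <assert.h>

/* Only the three states used below; values are not the kernel's. */
enum lc_state_model { LCS_ENTERED, LCS_LEAVING, LCS_LEFT };

struct ctx_model {
	enum lc_state_model state;
};

/* Stand-in for the kernel preempt count: 0 means "preemptible". */
int preempt_depth_model;

void preempt_disable_model(void) { preempt_depth_model++; }
void preempt_enable_model(void) { preempt_depth_model--; }

/*
 * Called after every step. Counts a point as unsafe when the context
 * is in LCS_LEAVING and nothing prevents a spinner from preempting us.
 */
void preemption_point(const struct ctx_model *ctx, int *unsafe)
{
	if (preempt_depth_model == 0 && ctx->state == LCS_LEAVING)
		(*unsafe)++;
}

/* Ordering before the patch: the state changes outside the region. */
int exit_before_patch(struct ctx_model *ctx)
{
	int unsafe = 0;

	assert(ctx->state == LCS_ENTERED);
	ctx->state = LCS_LEAVING;
	preemption_point(ctx, &unsafe);
	preempt_disable_model();
	preemption_point(ctx, &unsafe);
	/* the lct_exit callbacks would run here */
	preempt_enable_model();
	preemption_point(ctx, &unsafe);
	ctx->state = LCS_LEFT;
	preemption_point(ctx, &unsafe);
	return unsafe;
}

/* Ordering after the patch: set and clear inside the region. */
int exit_after_patch(struct ctx_model *ctx)
{
	int unsafe = 0;

	assert(ctx->state == LCS_ENTERED);
	preempt_disable_model();
	preemption_point(ctx, &unsafe);
	ctx->state = LCS_LEAVING;
	preemption_point(ctx, &unsafe);
	/* the lct_exit callbacks would run here */
	ctx->state = LCS_LEFT;
	preemption_point(ctx, &unsafe);
	preempt_enable_model();
	preemption_point(ctx, &unsafe);
	return unsafe;
}
```

On a context that starts in LCS_ENTERED, exit_before_patch() reports two unsafe points (just after the store of LCS_LEAVING and just after preempt_enable) while exit_after_patch() reports none. The model says nothing about the barrier pairing with lu_context_key_quiesce(); it only illustrates the ordering relative to the non-preemptible region.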
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index cb57abf03644..51497c144dd6 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -1654,17 +1654,20 @@ void lu_context_exit(struct lu_context *ctx)
 	unsigned int i;
 
 	LINVRNT(ctx->lc_state == LCS_ENTERED);
-	/*
-	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
-	 * or we see LCT_QUIESCENT
-	 */
-	smp_store_mb(ctx->lc_state, LCS_LEAVING);
 	/*
 	 * Disable preempt to ensure we get a warning if
 	 * any lct_exit ever tries to sleep. That would hurt
 	 * lu_context_key_quiesce() which spins waiting for us.
+	 * This also ensures we aren't preempted while the state
+	 * is LCS_LEAVING, as that too would cause problems for
+	 * lu_context_key_quiesce().
 	 */
 	preempt_disable();
+	/*
+	 * Ensure lu_context_key_quiesce() sees LCS_LEAVING
+	 * or we see LCT_QUIESCENT
+	 */
+	smp_store_mb(ctx->lc_state, LCS_LEAVING);
 	if (ctx->lc_tags & LCT_HAS_EXIT && ctx->lc_value) {
 		for (i = 0; i < ARRAY_SIZE(lu_keys); ++i) {
 			struct lu_context_key *key;
@@ -1677,8 +1680,8 @@ void lu_context_exit(struct lu_context *ctx)
 		}
 	}
 
-	preempt_enable();
 	smp_store_release(&ctx->lc_state, LCS_LEFT);
+	preempt_enable();
 }
 EXPORT_SYMBOL(lu_context_exit);
 