From patchwork Thu Jul 17 21:00:40 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tariq Saeed
X-Patchwork-Id: 4578961
From: Tariq Saeed
To:
 ocfs2-devel@oss.oracle.com
Date: Thu, 17 Jul 2014 14:00:40 -0700
Message-Id: <1405630840-7071-1-git-send-email-tariq.x.saeed@oracle.com>
X-Mailer: git-send-email 1.7.1
Subject: [Ocfs2-devel] [PATCH 1/1] ocfs2: race between umount and unfinished remastering during recovery

Orabug: 19074140

When umount is issued during recovery on a new master that has not
finished remastering locks, it triggers BUG() in
dlm_send_mig_lockres_msg(). The sequence is:

1) Node A holds a lock on resource X, which is mastered by node B.
2) Node B dies, and node A sets the recovering flag on res X.
3) Node C becomes the new master for the resources owned by the dead
   node and starts remastering the dead node's locks, but has not
   finished yet.
4) umount is issued on node C.
5) While processing the umount, node C ignores the unfinished recovery
   and attempts to migrate resource X to node A.
6) Node A finds res X in DLM_LOCK_RES_RECOVERING state, considers it a
   logic error and sends back -EFAULT.
7) Node C asserts BUG() upon seeing the -EFAULT response from node A.

The fix is to delay migrating res X until remastering is finished, at
which point the recovering flag will be cleared on both A and C.

Signed-off-by: Tariq Saeed
---
 fs/ocfs2/dlm/dlmmaster.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 82abf0c..3ec906e 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2405,6 +2405,10 @@ static int dlm_is_lockres_migrateable(struct dlm_ctxt *dlm,
 	if (res->state & DLM_LOCK_RES_MIGRATING)
 		return 0;
 
+	/* delay migration when the lockres is in RECOVERING state */
+	if (res->state & DLM_LOCK_RES_RECOVERING)
+		return 0;
+
 	if (res->owner != dlm->node_num)
 		return 0;
 
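
For readers who want to reason about the ordering of the checks outside the kernel tree, the following is a minimal userspace sketch of the check order after this patch. It is not the kernel code: the struct, the function, and the MODEL_* flag values are hypothetical stand-ins chosen for illustration, and the other conditions that dlm_is_lockres_migrateable() evaluates are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; the real DLM_LOCK_RES_*
 * definitions live in the ocfs2 DLM headers and may have other values. */
#define MODEL_RES_RECOVERING 0x1u
#define MODEL_RES_MIGRATING  0x2u

/* Hypothetical, stripped-down stand-in for a lock resource. */
struct model_lockres {
	unsigned int state;	/* bitmask of MODEL_RES_* flags */
	int owner;		/* node number of the current master */
};

/*
 * Mirrors the order of the three checks visible in the hunk above:
 * refuse while already migrating, refuse while recovery is still in
 * progress (the check this patch adds), refuse if this node is not
 * the master. Returns 1 if migration may proceed, 0 otherwise.
 */
static int model_is_migrateable(const struct model_lockres *res, int node_num)
{
	assert(res != NULL);

	if (res->state & MODEL_RES_MIGRATING)
		return 0;

	/* New behaviour: report "not migrateable" until remastering is done */
	if (res->state & MODEL_RES_RECOVERING)
		return 0;

	if (res->owner != node_num)
		return 0;

	return 1;
}
```

In this model, node C (say, node number 2) calling model_is_migrateable() on a resource it owns gets 0 while MODEL_RES_RECOVERING is set and 1 once the flag is cleared, which corresponds to the "delay migration until remastering is finished" behaviour described in the changelog.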