From patchwork Tue Oct 28 22:24:15 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Eeda X-Patchwork-Id: 5182271 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 5CD9EC11AC for ; Tue, 28 Oct 2014 22:28:32 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 4B6692017A for ; Tue, 28 Oct 2014 22:28:31 +0000 (UTC) Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2EE2F20179 for ; Tue, 28 Oct 2014 22:28:30 +0000 (UTC) Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s9SMRmgc016619 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 28 Oct 2014 22:27:49 GMT Received: from oss.oracle.com (oss-external.oracle.com [137.254.96.51]) by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s9SMRicO028188 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 Oct 2014 22:27:44 GMT Received: from localhost ([127.0.0.1] helo=oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1XjFBr-0008AE-65; Tue, 28 Oct 2014 15:24:35 -0700 Received: from acsinet21.oracle.com ([141.146.126.237]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1XjFBb-00089s-Hl for ocfs2-devel@oss.oracle.com; Tue, 28 Oct 2014 15:24:19 -0700 Received: from ca-server1.us.oracle.com (ca-server1.us.oracle.com [139.185.48.5]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s9SMOJtb006975 for ; Tue, 28 Oct 2014 22:24:19 GMT Received: from seeda by ca-server1.us.oracle.com with local (Exim 4.69) (envelope-from ) id 1XjFBa-0007tS-D4 for ocfs2-devel@oss.oracle.com; Tue, 28 Oct 2014 15:24:18 -0700 From: Srinivas Eeda To: ocfs2-devel@oss.oracle.com Date: Tue, 28 Oct 2014 15:24:15 -0700 Message-Id: <1414535056-30193-1-git-send-email-srinivas.eeda@oracle.com> X-Mailer: git-send-email 1.7.5.1 Subject: [Ocfs2-devel] [PATCH 1/1] o2dlm: fix a race between purge and master query X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: acsinet22.oracle.com [141.146.126.238] X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Node A sends master query request to node B which is the master. At this time lockres happens to be on purgelist. dlm_master_request_handler gets the dlm spinlock, finds the resource and releases the dlm spin lock. Right at this dlm_thread on this node could purge the lockres. dlm_master_request_handler can then acquire lockres spinlock and reply to Node A that node B is the master even though lockres on node B is purged. The above scenario will now make node A falsely think node B is the master which is inconsistent. Further if another node C tries to master the same resource, every node will respond they are not the master. Node C then masters the resource and sends assert master to all nodes. This will now make node A crash with the following message. dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current owner is 10! Signed-off-by: Srinivas Eeda Reviewed-by: Wengang Wang Tested-by: Joseph Qi --- fs/ocfs2/dlm/dlmmaster.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index 215e41a..3689b35 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -1460,6 +1460,18 @@ way_up_top: /* take care of the easy cases up front */ spin_lock(&res->spinlock); + + /* + * Right after dlm spinlock was released, dlm_thread could have + * purged the lockres. Check if lockres got unhashed. If so + * start over. + */ + if (hlist_unhashed(&res->hash_node)) { + spin_unlock(&res->spinlock); + dlm_lockres_put(res); + goto way_up_top; + } + if (res->state & (DLM_LOCK_RES_RECOVERING| DLM_LOCK_RES_MIGRATING)) { spin_unlock(&res->spinlock);