From patchwork Mon Dec 7 03:34:24 2015
X-Patchwork-Submitter: Xue jiufei
X-Patchwork-Id: 7779921
Message-ID: <5664FE40.2010105@huawei.com>
Date: Mon, 7 Dec 2015 11:34:24 +0800
From: Xue jiufei
To: "ocfs2-devel@oss.oracle.com", Andrew Morton, Mark Fasheh
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix a race between purge and migration
Reply-To: xuejiufei@huawei.com

We found a race between purge and migration during code review. Node A
puts a lockres on the purge list before receiving the migrate message
from node B, which is the master. dlm_mig_lockres_handler on node A
finds the lockres and releases the dlm spinlock; dlm_thread on node A
can then purge the lockres. dlm_mig_lockres_handler next takes the
lockres spinlock and sends an assert master message telling the other
nodes that node A is now the master, even though the lockres has been
purged. If node C sets its master to node A, while another node D
masters the lockres because no node responds that it is the master,
node C will crash when node D sends its assert master message to node C.
Fix this race by checking in dlm_mig_lockres_handler whether the
lockres has been unhashed, and starting over if it has.

Signed-off-by: Jiufei Xue
Reviewed-by: Joseph Qi
Reviewed-by: Yiwen Jiang
---
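Note for reviewers: the program below is a minimal user-space sketch of
the pattern the patch applies, not the kernel code. A lookup hands back
a referenced object, but a concurrent purge may unhash it before we take
its lock, so the handler must recheck under the lock and start over.
lookup_res(), res_put() and purge() are hypothetical stand-ins for
dlm_lookup_lockres(), dlm_lockres_put() and the purge done by
dlm_thread; the single purge() call in main() marks the window where
dlm_thread can run concurrently in the kernel.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct lockres {
	pthread_mutex_t lock;
	bool hashed;	/* cleared when the purge path unhashes the lockres */
	int refs;
};

static struct lockres table_entry = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.hashed = true,
	.refs = 1,	/* reference held by the hash table */
};

/* stand-in for dlm_lookup_lockres(): take a ref if still hashed */
static struct lockres *lookup_res(void)
{
	pthread_mutex_lock(&table_entry.lock);
	if (!table_entry.hashed) {
		pthread_mutex_unlock(&table_entry.lock);
		return NULL;	/* already purged */
	}
	table_entry.refs++;
	pthread_mutex_unlock(&table_entry.lock);
	return &table_entry;
}

/* stand-in for dlm_lockres_put() */
static void res_put(struct lockres *res)
{
	pthread_mutex_lock(&res->lock);
	res->refs--;
	pthread_mutex_unlock(&res->lock);
}

/* stand-in for the purge done by dlm_thread */
static void purge(struct lockres *res)
{
	pthread_mutex_lock(&res->lock);
	res->hashed = false;	/* unhash: later lookups fail */
	res->refs--;		/* drop the hash table's reference */
	pthread_mutex_unlock(&res->lock);
}

int main(void)
{
	struct lockres *res;

retry:					/* mirrors way_up_top: */
	res = lookup_res();
	if (!res) {
		puts("lookup failed: lockres purged, would create it anew");
		return 0;
	}

	purge(&table_entry);	/* in the kernel, dlm_thread may run here */

	pthread_mutex_lock(&res->lock);
	if (!res->hashed) {
		/* purged between lookup and lock: drop the ref, retry */
		puts("lockres unhashed after lookup, starting over");
		pthread_mutex_unlock(&res->lock);
		res_put(res);
		goto retry;
	}
	/* safe here: mark the lockres RECOVERING/MIGRATING, etc. */
	pthread_mutex_unlock(&res->lock);
	res_put(res);
	return 0;
}

Built with "cc sketch.c -pthread", it prints the retry path once and
then the failed lookup; the kernel handler likewise falls back to
creating a fresh lockres when the post-retry lookup finds nothing.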
 fs/ocfs2/dlm/dlmrecovery.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 58eaa5c..84908a0 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1400,11 +1400,33 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
 	/* lookup the lock to see if we have a secondary queue for this
 	 * already... just add the locks in and this will have its owner
 	 * and RECOVERY flag changed when it completes. */
+way_up_top:
 	res = dlm_lookup_lockres(dlm, mres->lockname, mres->lockname_len);
 	if (res) {
		/* this will get a ref on res */
 		/* mark it as recovering/migrating and hash it */
 		spin_lock(&res->spinlock);
+
+		/*
+		 * Right after dlm spinlock was released, dlm_thread could have
+		 * purged the lockres. Check if lockres got unhashed. If so
+		 * start over.
+		 */
+		if (hlist_unhashed(&res->hash_node)) {
+			spin_unlock(&res->spinlock);
+			dlm_lockres_put(res);
+			goto way_up_top;
+		}
+
+		/* Wait on the resource purge to complete before continuing */
+		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+			__dlm_wait_on_lockres_flags(res,
+				DLM_LOCK_RES_DROPPING_REF);
+			spin_unlock(&res->spinlock);
+			dlm_lockres_put(res);
+			goto way_up_top;
+		}
+
 		if (mres->flags & DLM_MRES_RECOVERY) {
 			res->state |= DLM_LOCK_RES_RECOVERING;
 		} else {