From patchwork Wed Nov 1 07:13:19 2017
X-Patchwork-Submitter: Changwei Ge
X-Patchwork-Id: 10035935
From: Changwei Ge
To: piaojun, "akpm@linux-foundation.org", "mfasheh@versity.com", Joel Becker, "srinivas.eeda@oracle.com"
Cc: "ocfs2-devel@oss.oracle.com"
Date: Wed, 1 Nov 2017 07:13:19 +0000
Message-ID: <63ADC13FD55D6546B7DECE290D39E373CED73541@H3CMLB14-EX.srv.huawei-3com.com>
References: <59F91FEF.5020609@huawei.com> <63ADC13FD55D6546B7DECE290D39E373CED73303@H3CMLB14-EX.srv.huawei-3com.com> <59F96136.3070307@huawei.com>
Subject: Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lockres

Hi Jun,

I think I get your point. You mean that dlm finds no lock resource left to migrate, and its hash table no longer manages any lock resource. After that, a node dies suddenly and the dead node is put into dlm's recovery map, right? Furthermore, a lock resource is then migrated from other nodes, and the local node has already asserted master to them.

If so, I want to suggest an easier way to solve it. We don't have to add a new flag to the dlm structure; we can just leverage the existing dlm state and bitmap. It brings a bonus: no lock resource will be marked RECOVERING, which I suppose is safer.

Please review.

Thanks,
Changwei

Subject: [PATCH] ocfs2/dlm: a node can't be involved in recovery if it is being shutdown

Signed-off-by: Changwei Ge
---
 fs/ocfs2/dlm/dlmdomain.c   | 4 ++++
 fs/ocfs2/dlm/dlmrecovery.c | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index a2b19fbdcf46..5e9283e509a4 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -707,11 +707,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
 		 * want new domain joins to communicate with us at
 		 * least until we've completed migration of our
 		 * resources. */
+		spin_lock(&dlm->spinlock);
 		dlm->dlm_state = DLM_CTXT_IN_SHUTDOWN;
+		spin_unlock(&dlm->spinlock);
 		leave = 1;
 	}
 	spin_unlock(&dlm_domain_lock);
 
+	dlm_wait_for_recovery(dlm);
+
 	if (leave) {
 		mlog(0, "shutting down domain %s\n", dlm->name);
 		dlm_begin_exit_domain(dlm);
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6dd592..764c95b2b35c 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2441,6 +2441,9 @@ static void __dlm_hb_node_down(struct dlm_ctxt *dlm, int idx)
 {
 	assert_spin_locked(&dlm->spinlock);
 
+	if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN)
+		return;
+
 	if (dlm->reco.new_master == idx) {
 		mlog(0, "%s: recovery master %d just died\n",
 		     dlm->name, idx);