From patchwork Thu Nov 30 22:24:33 2017
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 10085727
Date: Thu, 30 Nov 2017 14:24:33 -0800
From: akpm@linux-foundation.org
To: ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org, piaojun@huawei.com,
 alex.chen@huawei.com, jiangqi903@gmail.com, jiangyiwen@huawei.com,
 jlbec@evilplan.org, junxiao.bi@oracle.com, mfasheh@versity.com
Message-ID: <5a208521.LO9QJOkwf55plmkw%akpm@linux-foundation.org>
Subject: [Ocfs2-devel] [patch 08/11] ocfs2/dlm: wait for dlm recovery done when migrating all lockres
Sender: ocfs2-devel-bounces@oss.oracle.com

From: piaojun <piaojun@huawei.com>
Subject: ocfs2/dlm: wait for dlm recovery done when migrating all lockres

Wait for dlm recovery to be done when migrating all lockres, in case a
new lockres is left behind after this node has already left the dlm
domain:

NodeA                               NodeB                    NodeC

umount and migrate
all lockres
                                    node down

do recovery for NodeB
and collect a new lockres
from other live nodes

leave domain, but the
new lockres remains
                                                             mount and join domain

                                                             request for the owner of
                                                             the new lockres, but all
                                                             the other nodes say 'NO',
                                                             so NodeC decides to be the
                                                             owner, and sends an assert
                                                             msg to the other nodes

                                    other nodes receive the msg
                                    and find that two masters
                                    exist, which at last causes
                                    BUG in
                                    dlm_assert_master_handler()
                                    -->BUG();
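To make the ordering the patch enforces concrete, below is a minimal
user-space C model of the handshake. This is illustrative only: the
migrate_done flag, the retry-on-recovery-active behaviour (-EAGAIN in
the real patch), and the early return in the recovery path mirror the
patch, while struct dlm_model and the helper names are invented for
this sketch.

#include <stdio.h>

/* Toy model of the patch's state machine: migration must not be
 * declared finished while recovery is still active, and once
 * migrate_done is set, the recovery path becomes a no-op. */
struct dlm_model {
	int reco_active;   /* models DLM_RECO_STATE_ACTIVE */
	int num_lockres;   /* lockres still hashed on this node */
	int migrate_done;  /* models dlm->migrate_done */
};

/* Mirrors the dlm_migrate_all_locks() change: if nothing is left but
 * recovery is still active, tell the caller to retry, because recovery
 * may hash a fresh lockres that must also be migrated. */
static int migrate_all_locks(struct dlm_model *dlm)
{
	if (dlm->num_lockres > 0) {
		dlm->num_lockres--;	/* migrate one lockres away */
		return -1;		/* caller must loop again */
	}
	if (dlm->reco_active)
		return -1;		/* -EAGAIN in the real patch */
	dlm->migrate_done = 1;
	return 0;
}

/* Mirrors the dlm_do_recovery() change: skip recovery entirely once
 * all lockres have been migrated. */
static int do_recovery(struct dlm_model *dlm)
{
	if (dlm->migrate_done) {
		printf("no need do recovery after migrating all lockres\n");
		return 0;
	}
	dlm->num_lockres++;	/* recovery collects a new lockres */
	dlm->reco_active = 0;
	return 1;
}

int main(void)
{
	struct dlm_model dlm = { .reco_active = 1, .num_lockres = 2 };

	/* umount path: keep migrating until migrate_done is set */
	while (migrate_all_locks(&dlm)) {
		if (dlm.reco_active)
			do_recovery(&dlm);	/* may add a lockres back */
	}
	printf("migrate_done=%d, lockres left=%d\n",
	       dlm.migrate_done, dlm.num_lockres);
	return do_recovery(&dlm) ? 1 : 0;	/* now a no-op */
}

In the kernel the retry loop already exists on the unmount side:
dlm_unregister_domain() keeps calling dlm_migrate_all_locks() until it
returns 0, which is why returning -EAGAIN while recovery is active is
enough to close the window.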
Link: http://lkml.kernel.org/r/59FFB7AD.90108@huawei.com
Fixes: bc9838c4d44a ("dlm: allow dlm do recovery during shutdown")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmcommon.h   |    1 +
 fs/ocfs2/dlm/dlmdomain.c   |   14 ++++++++++++++
 fs/ocfs2/dlm/dlmrecovery.c |   13 ++++++++++---
 3 files changed, 25 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/dlm/dlmcommon.h~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres fs/ocfs2/dlm/dlmcommon.h
--- a/fs/ocfs2/dlm/dlmcommon.h~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres
+++ a/fs/ocfs2/dlm/dlmcommon.h
@@ -140,6 +140,7 @@ struct dlm_ctxt
 	u8 node_num;
 	u32 key;
 	u8  joining_node;
+	u8 migrate_done; /* set to 1 means node has migrated all lockres */
 	wait_queue_head_t dlm_join_events;
 	unsigned long live_nodes_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
 	unsigned long domain_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
diff -puN fs/ocfs2/dlm/dlmdomain.c~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres fs/ocfs2/dlm/dlmdomain.c
--- a/fs/ocfs2/dlm/dlmdomain.c~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres
+++ a/fs/ocfs2/dlm/dlmdomain.c
@@ -461,6 +461,18 @@ redo_bucket:
 			cond_resched_lock(&dlm->spinlock);
 			num += n;
 		}
+
+	if (!num) {
+		if (dlm->reco.state & DLM_RECO_STATE_ACTIVE) {
+			mlog(0, "%s: perhaps there are more lock resources need to "
+				"be migrated after dlm recovery\n", dlm->name);
+			ret = -EAGAIN;
+		} else {
+			mlog(0, "%s: we won't do dlm recovery after migrating all lockres",
+				dlm->name);
+			dlm->migrate_done = 1;
+		}
+	}
 	spin_unlock(&dlm->spinlock);
 	wake_up(&dlm->dlm_thread_wq);
@@ -2052,6 +2064,8 @@ static struct dlm_ctxt *dlm_alloc_ctxt(c
 	dlm->joining_node = DLM_LOCK_RES_OWNER_UNKNOWN;
 	init_waitqueue_head(&dlm->dlm_join_events);
 
+	dlm->migrate_done = 0;
+
 	dlm->reco.new_master = O2NM_INVALID_NODE_NUM;
 	dlm->reco.dead_node = O2NM_INVALID_NODE_NUM;
diff -puN fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres fs/ocfs2/dlm/dlmrecovery.c
--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-wait-for-dlm-recovery-done-when-migrating-all-lockres
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -423,12 +423,11 @@ void dlm_wait_for_recovery(struct dlm_ct
 
 static void dlm_begin_recovery(struct dlm_ctxt *dlm)
 {
-	spin_lock(&dlm->spinlock);
+	assert_spin_locked(&dlm->spinlock);
 	BUG_ON(dlm->reco.state & DLM_RECO_STATE_ACTIVE);
 	printk(KERN_NOTICE "o2dlm: Begin recovery on domain %s for node %u\n",
 	       dlm->name, dlm->reco.dead_node);
 	dlm->reco.state |= DLM_RECO_STATE_ACTIVE;
-	spin_unlock(&dlm->spinlock);
 }
 
 static void dlm_end_recovery(struct dlm_ctxt *dlm)
@@ -456,6 +455,13 @@ static int dlm_do_recovery(struct dlm_ct
 
 	spin_lock(&dlm->spinlock);
 
+	if (dlm->migrate_done) {
+		mlog(0, "%s: no need do recovery after migrating all lockres\n",
+			dlm->name);
+		spin_unlock(&dlm->spinlock);
+		return 0;
+	}
+
 	/* check to see if the new master has died */
 	if (dlm->reco.new_master != O2NM_INVALID_NODE_NUM &&
 	    test_bit(dlm->reco.new_master, dlm->recovery_map)) {
@@ -490,12 +496,13 @@ static int dlm_do_recovery(struct dlm_ct
 	mlog(0, "%s(%d):recovery thread found node %u in the recovery map!\n",
 	     dlm->name, task_pid_nr(dlm->dlm_reco_thread_task),
 	     dlm->reco.dead_node);
-	spin_unlock(&dlm->spinlock);
 
 	/* take write barrier */
 	/* (stops the list reshuffling thread, proxy ast handling) */
 	dlm_begin_recovery(dlm);
 
+	spin_unlock(&dlm->spinlock);
+
 	if (dlm->reco.new_master == dlm->node_num)
 		goto master_here;
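
A note on the locking change in dlmrecovery.c above: dlm_begin_recovery()
no longer takes dlm->spinlock itself; the caller must already hold it,
and assert_spin_locked() makes that contract explicit (and checks it at
runtime on SMP builds). The sketch below shows the pattern in isolation;
it is a kernel-style fragment, not a complete module, and demo_lock,
demo_reco_active, begin_recovery_locked() and demo_caller() are invented
names, not part of the patch.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);
static int demo_reco_active;

/* Callee-side contract: the caller must already hold demo_lock.
 * assert_spin_locked() turns a violated contract into a loud failure
 * instead of a silent race. */
static void begin_recovery_locked(void)
{
	assert_spin_locked(&demo_lock);
	demo_reco_active = 1;	/* like setting DLM_RECO_STATE_ACTIVE */
}

/* Caller-side: the state check and the state transition now happen in
 * one critical section, so no other CPU can observe "no recovery
 * active" between checking migrate_done and marking recovery active. */
static void demo_caller(void)
{
	spin_lock(&demo_lock);
	begin_recovery_locked();
	spin_unlock(&demo_lock);
}

This is why the patch moves the spin_unlock() in dlm_do_recovery() to
after the dlm_begin_recovery() call: the migrate_done check and the
DLM_RECO_STATE_ACTIVE transition are serialized by the same lock.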