From patchwork Thu Jul 28 21:06:08 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 9251695 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1DD7260757 for ; Thu, 28 Jul 2016 21:07:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0E4E427E5A for ; Thu, 28 Jul 2016 21:07:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 02E6027F0B; Thu, 28 Jul 2016 21:07:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B1FD827EED for ; Thu, 28 Jul 2016 21:07:11 +0000 (UTC) Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u6SL71Fe021197 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 28 Jul 2016 21:07:02 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id u6SL71fe006807 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 28 Jul 2016 21:07:01 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1bSsWD-0003Rr-Cs; Thu, 28 Jul 2016 14:07:01 -0700 Received: from aserv0022.oracle.com ([141.146.126.234]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1bSsVP-0003Os-IV for ocfs2-devel@oss.oracle.com; Thu, 28 Jul 2016 14:06:11 -0700 Received: from userp1030.oracle.com (userp1030.oracle.com [156.151.31.80]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id u6SL6Bch004680 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Thu, 28 Jul 2016 21:06:11 GMT Received: from userp2040.oracle.com (userp2040.oracle.com [156.151.31.90]) by userp1030.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u6SL6AMp012949 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Thu, 28 Jul 2016 21:06:10 GMT Received: from pps.filterd (userp2040.oracle.com [127.0.0.1]) by userp2040.oracle.com (8.16.0.11/8.16.0.11) with SMTP id u6SL5a5V020820 for ; Thu, 28 Jul 2016 21:06:10 GMT Authentication-Results: oracle.com; spf=pass smtp.mail=akpm@linux-foundation.org Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) by userp2040.oracle.com with ESMTP id 24fm3hbeyf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 28 Jul 2016 21:06:10 +0000 Received: from akpm3.mtv.corp.google.com (unknown [104.132.1.73]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id D4AE5954; Thu, 28 Jul 2016 21:06:08 +0000 (UTC) Date: Thu, 28 Jul 2016 14:06:08 -0700 From: akpm@linux-foundation.org To: ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org, piaojun@huawei.com, jlbec@evilplan.org, joseph.qi@huawei.com, junxiao.bi@oracle.com, mfasheh@suse.de, xuejiufei@huawei.com Message-ID: <579a73c0.oJg/IWMTrvYSga3x%akpm@linux-foundation.org> User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 X-ServerName: mail.linuxfoundation.org X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:140.211.169.12/30 include:_spf.google.com ~all X-Proofpoint-Virus-Version: vendor=nai engine=5800 definitions=8240 signatures=670720 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1607280210 Subject: [Ocfs2-devel] [patch 5/5] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: aserv0022.oracle.com [141.146.126.234] X-Virus-Scanned: ClamAV using ClamSMTP From: piaojun Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down We found a dlm-blocked situation caused by continuous breakdown of recovery masters described below. To solve this problem, we should purge recovery lock once detecting recovery master goes down. N3 N2 N1(reco master) go down pick up recovery lock and begin recoverying for N2 go down pick up recovery lock failed, then purge it: dlm_purge_lockres ->DROPPING_REF is set send deref to N1 failed, recovery lock is not purged find N1 go down, begin recoverying for N1, but blocked in dlm_do_recovery as DROPPING_REF is set: dlm_do_recovery ->dlm_pick_recovery_master ->dlmlock ->dlm_get_lock_resource ->__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_DROPPING_REF); Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down") Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com Signed-off-by: Jun Piao Reviewed-by: Joseph Qi Reviewed-by: Jiufei Xue Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Reviewed-by: Mark Fasheh --- fs/ocfs2/dlm/dlmcommon.h | 2 + fs/ocfs2/dlm/dlmmaster.c | 37 ++---------------------- fs/ocfs2/dlm/dlmrecovery.c | 29 ++++++++++++++----- fs/ocfs2/dlm/dlmthread.c | 52 ++++++++++++++++++++++++++++++++--- 4 files changed, 74 insertions(+), 46 deletions(-) diff -puN fs/ocfs2/dlm/dlmcommon.h~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down fs/ocfs2/dlm/dlmcommon.h --- a/fs/ocfs2/dlm/dlmcommon.h~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down +++ a/fs/ocfs2/dlm/dlmcommon.h @@ -1004,6 +1004,8 @@ int dlm_finalize_reco_handler(struct o2n int dlm_do_master_requery(struct dlm_ctxt *dlm, struct dlm_lock_resource *res, u8 nodenum, u8 *real_master); +void __dlm_do_purge_lockres(struct dlm_ctxt *dlm, + struct dlm_lock_resource *res); int dlm_dispatch_assert_master(struct dlm_ctxt *dlm, struct dlm_lock_resource *res, diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down fs/ocfs2/dlm/dlmmaster.c --- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down +++ a/fs/ocfs2/dlm/dlmmaster.c @@ -2425,51 +2425,20 @@ int dlm_deref_lockres_done_handler(struc mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done " "but it is already derefed!\n", dlm->name, res->lockname.len, res->lockname.name, node); - dlm_lockres_put(res); ret = 0; goto done; } - if (!list_empty(&res->purge)) { - mlog(0, "%s: Removing res %.*s from purgelist\n", - dlm->name, res->lockname.len, res->lockname.name); - list_del_init(&res->purge); - dlm_lockres_put(res); - dlm->purge_count--; - } - - if (!__dlm_lockres_unused(res)) { - mlog(ML_ERROR, "%s: res %.*s in use after deref\n", - dlm->name, res->lockname.len, res->lockname.name); - __dlm_print_one_lock_resource(res); - BUG(); - } - - __dlm_unhash_lockres(dlm, res); - - spin_lock(&dlm->track_lock); - if (!list_empty(&res->tracking)) - list_del_init(&res->tracking); - else { - mlog(ML_ERROR, "%s: Resource %.*s not on the Tracking list\n", - dlm->name, res->lockname.len, res->lockname.name); - __dlm_print_one_lock_resource(res); - } - spin_unlock(&dlm->track_lock); - - /* lockres is not in the hash now. drop the flag and wake up - * any processes waiting in dlm_get_lock_resource. - */ - res->state &= ~DLM_LOCK_RES_DROPPING_REF; + __dlm_do_purge_lockres(dlm, res); spin_unlock(&res->spinlock); wake_up(&res->wq); - dlm_lockres_put(res); - spin_unlock(&dlm->spinlock); ret = 0; done: + if (res) + dlm_lockres_put(res); dlm_put(dlm); return ret; } diff -puN fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down fs/ocfs2/dlm/dlmrecovery.c --- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down +++ a/fs/ocfs2/dlm/dlmrecovery.c @@ -2343,6 +2343,7 @@ static void dlm_do_local_recovery_cleanu struct dlm_lock_resource *res; int i; struct hlist_head *bucket; + struct hlist_node *tmp; struct dlm_lock *lock; @@ -2365,7 +2366,7 @@ static void dlm_do_local_recovery_cleanu */ for (i = 0; i < DLM_HASH_BUCKETS; i++) { bucket = dlm_lockres_hash(dlm, i); - hlist_for_each_entry(res, bucket, hash_node) { + hlist_for_each_entry_safe(res, tmp, bucket, hash_node) { /* always prune any $RECOVERY entries for dead nodes, * otherwise hangs can occur during later recovery */ if (dlm_is_recovery_lock(res->lockname.name, @@ -2386,8 +2387,17 @@ static void dlm_do_local_recovery_cleanu break; } } - dlm_lockres_clear_refmap_bit(dlm, res, - dead_node); + + if ((res->owner == dead_node) && + (res->state & DLM_LOCK_RES_DROPPING_REF)) { + dlm_lockres_get(res); + __dlm_do_purge_lockres(dlm, res); + spin_unlock(&res->spinlock); + wake_up(&res->wq); + dlm_lockres_put(res); + continue; + } else if (res->owner == dlm->node_num) + dlm_lockres_clear_refmap_bit(dlm, res, dead_node); spin_unlock(&res->spinlock); continue; } @@ -2398,14 +2408,17 @@ static void dlm_do_local_recovery_cleanu if (res->state & DLM_LOCK_RES_DROPPING_REF) { mlog(0, "%s:%.*s: owned by " "dead node %u, this node was " - "dropping its ref when it died. " - "continue, dropping the flag.\n", + "dropping its ref when master died. " + "continue, purging the lockres.\n", dlm->name, res->lockname.len, res->lockname.name, dead_node); + dlm_lockres_get(res); + __dlm_do_purge_lockres(dlm, res); + spin_unlock(&res->spinlock); + wake_up(&res->wq); + dlm_lockres_put(res); + continue; } - res->state &= ~DLM_LOCK_RES_DROPPING_REF; - dlm_move_lockres_to_recovery_list(dlm, - res); } else if (res->owner == dlm->node_num) { dlm_free_dead_locks(dlm, res, dead_node); __dlm_lockres_calc_usage(dlm, res); diff -puN fs/ocfs2/dlm/dlmthread.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down fs/ocfs2/dlm/dlmthread.c --- a/fs/ocfs2/dlm/dlmthread.c~ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down +++ a/fs/ocfs2/dlm/dlmthread.c @@ -160,6 +160,52 @@ void dlm_lockres_calc_usage(struct dlm_c spin_unlock(&dlm->spinlock); } +/* + * Do the real purge work: + * unhash the lockres, and + * clear flag DLM_LOCK_RES_DROPPING_REF. + * It requires dlm and lockres spinlock to be taken. + */ +void __dlm_do_purge_lockres(struct dlm_ctxt *dlm, + struct dlm_lock_resource *res) +{ + assert_spin_locked(&dlm->spinlock); + assert_spin_locked(&res->spinlock); + + if (!list_empty(&res->purge)) { + mlog(0, "%s: Removing res %.*s from purgelist\n", + dlm->name, res->lockname.len, res->lockname.name); + list_del_init(&res->purge); + dlm_lockres_put(res); + dlm->purge_count--; + } + + if (!__dlm_lockres_unused(res)) { + mlog(ML_ERROR, "%s: res %.*s in use after deref\n", + dlm->name, res->lockname.len, res->lockname.name); + __dlm_print_one_lock_resource(res); + BUG(); + } + + __dlm_unhash_lockres(dlm, res); + + spin_lock(&dlm->track_lock); + if (!list_empty(&res->tracking)) + list_del_init(&res->tracking); + else { + mlog(ML_ERROR, "%s: Resource %.*s not on the Tracking list\n", + dlm->name, res->lockname.len, res->lockname.name); + __dlm_print_one_lock_resource(res); + } + spin_unlock(&dlm->track_lock); + + /* + * lockres is not in the hash now. drop the flag and wake up + * any processes waiting in dlm_get_lock_resource. + */ + res->state &= ~DLM_LOCK_RES_DROPPING_REF; +} + static void dlm_purge_lockres(struct dlm_ctxt *dlm, struct dlm_lock_resource *res) { @@ -176,10 +222,8 @@ static void dlm_purge_lockres(struct dlm if (!master) { if (res->state & DLM_LOCK_RES_DROPPING_REF) { - mlog(ML_NOTICE, "%s: res %.*s already in " - "DLM_LOCK_RES_DROPPING_REF state\n", - dlm->name, res->lockname.len, - res->lockname.name); + mlog(ML_NOTICE, "%s: res %.*s already in DLM_LOCK_RES_DROPPING_REF state\n", + dlm->name, res->lockname.len, res->lockname.name); spin_unlock(&res->spinlock); return; }