From patchwork Mon Aug 7 07:13:31 2017
X-Patchwork-Submitter: Changwei Ge
X-Patchwork-Id: 9884491
From: Changwei Ge <ge.changwei@h3c.com>
To: Mark Fasheh, Junxiao Bi, Joseph Qi, Joel Becker
Date: Mon, 7 Aug 2017 07:13:31 +0000
Message-ID: <63ADC13FD55D6546B7DECE290D39E373AC2CB721@H3CMLB14-EX.srv.huawei-3com.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending fails, to improve reliability

Hi,

In the current code, the AST flush path does not handle the case where
sending an AST or BAST fails, yet an AST or BAST can indeed be lost to a
network fault. When that happens, the requesting node never receives the
AST back, so it can neither acquire the lock nor abort the current
locking attempt.

This patch fixes the issue by re-queuing the AST or BAST when sending
fails because of a network fault. A re-queued AST or BAST is dropped if
the requesting node has died, so stale entries cannot linger. This
should improve reliability considerably.
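To make the intended control flow easier to follow, here is a minimal
userspace C sketch of the re-queue idea. It is an illustration only,
not kernel code: pending_ast, send_ast, dequeue, and requeue are
hypothetical stand-ins for the DLM's pending lists, the o2net send
path, and the recovery handling that drops entries for dead nodes.

/* Userspace model of the patched flush loop.  Every name below is an
 * illustrative stand-in, not a kernel symbol. */
#include <stdio.h>
#include <stdbool.h>

struct pending_ast {
	int node;			/* node waiting for this AST */
	bool node_dead;			/* set by (simulated) recovery */
	struct pending_ast *next;
};

/* Stand-in for the network send; pretend node 3 is unreachable. */
static int send_ast(const struct pending_ast *ast)
{
	return ast->node == 3 ? -1 : 0;
}

static struct pending_ast *dequeue(struct pending_ast **head)
{
	struct pending_ast *ast = *head;

	if (ast)
		*head = ast->next;
	return ast;
}

static void requeue(struct pending_ast **head, struct pending_ast *ast)
{
	ast->next = *head;		/* keep it pending for a later flush */
	*head = ast;
}

/* Mirrors the patched control flow: stop flushing on the first failed
 * send, re-queue that entry, and drop entries whose requester died. */
static void flush_asts(struct pending_ast **head)
{
	struct pending_ast *ast;
	int ret = 0;

	while (!ret && (ast = dequeue(head)) != NULL) {
		if (ast->node_dead) {
			printf("drop AST for dead node %d\n", ast->node);
			continue;
		}
		ret = send_ast(ast);
		if (ret < 0) {
			printf("send to node %d failed, re-queue\n", ast->node);
			requeue(head, ast);
		} else {
			printf("AST delivered to node %d\n", ast->node);
		}
	}
}

int main(void)
{
	struct pending_ast a3 = { .node = 3 };
	struct pending_ast a2 = { .node = 2, .next = &a3 };
	struct pending_ast *head = &a2;

	flush_asts(&head);	/* node 2 succeeds, node 3 is re-queued */
	a3.node_dead = true;	/* recovery declares node 3 dead */
	flush_asts(&head);	/* the re-queued AST is dropped */
	return 0;
}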
Thanks,
Changwei

Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Mark Fasheh
---
 fs/ocfs2/dlm/dlmrecovery.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ocfs2/dlm/dlmthread.c   | 39 +++++++++++++++++++++++++++------
 2 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ddfaf74 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2263,11 +2263,45 @@ static void dlm_revalidate_lvb(struct dlm_ctxt *dlm,
 	}
 }
 
+static int dlm_drop_pending_ast_bast(struct dlm_ctxt *dlm,
+				     struct dlm_lock *lock)
+{
+	int reserved = 0;
+
+	spin_lock(&dlm->ast_lock);
+	if (!list_empty(&lock->ast_list)) {
+		mlog(0, "%s: drop pending AST for lock(cookie=%u:%llu).\n",
+		     dlm->name,
+		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
+		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)));
+		list_del_init(&lock->ast_list);
+		lock->ast_pending = 0;
+		dlm_lock_put(lock);
+		reserved++;
+	}
+
+	if (!list_empty(&lock->bast_list)) {
+		mlog(0, "%s: drop pending BAST for lock(cookie=%u:%llu).\n",
+		     dlm->name,
+		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
+		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)));
+		list_del_init(&lock->bast_list);
+		lock->bast_pending = 0;
+		dlm_lock_put(lock);
+		reserved++;
+	}
+	spin_unlock(&dlm->ast_lock);
+
+	return reserved;
+}
+
 static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
-				struct dlm_lock_resource *res, u8 dead_node)
+				struct dlm_lock_resource *res, u8 dead_node,
+				int *reserved)
 {
 	struct dlm_lock *lock, *next;
 	unsigned int freed = 0;
+	int reserved_tmp = 0;
 
 	/* this node is the lockres master:
 	 * 1) remove any stale locks for the dead node
@@ -2284,6 +2318,9 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
 		if (lock->ml.node == dead_node) {
 			list_del_init(&lock->list);
 			dlm_lock_put(lock);
+
+			reserved_tmp += dlm_drop_pending_ast_bast(dlm, lock);
+
 			/* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually */
 			dlm_lock_put(lock);
 			freed++;
@@ -2293,6 +2330,9 @@
 		if (lock->ml.node == dead_node) {
 			list_del_init(&lock->list);
 			dlm_lock_put(lock);
+
+			reserved_tmp += dlm_drop_pending_ast_bast(dlm, lock);
+
 			/* Can't schedule DLM_UNLOCK_FREE_LOCK - do manually */
 			dlm_lock_put(lock);
 			freed++;
@@ -2308,6 +2348,8 @@ static void dlm_free_dead_locks(struct dlm_ctxt *dlm,
 		}
 	}
 
+	*reserved = reserved_tmp;
+
 	if (freed) {
 		mlog(0, "%s:%.*s: freed %u locks for dead node %u, "
 		     "dropping ref from lockres\n", dlm->name,
@@ -2367,6 +2409,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 	for (i = 0; i < DLM_HASH_BUCKETS; i++) {
 		bucket = dlm_lockres_hash(dlm, i);
 		hlist_for_each_entry_safe(res, tmp, bucket, hash_node) {
+			int reserved = 0;
 			/* always prune any $RECOVERY entries for dead nodes,
 			 * otherwise hangs can occur during later recovery */
 			if (dlm_is_recovery_lock(res->lockname.name,
@@ -2420,7 +2463,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 					continue;
 				}
 			} else if (res->owner == dlm->node_num) {
-				dlm_free_dead_locks(dlm, res, dead_node);
+				dlm_free_dead_locks(dlm, res, dead_node, &reserved);
 				__dlm_lockres_calc_usage(dlm, res);
 			} else if (res->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
 				if (test_bit(dead_node, res->refmap)) {
@@ -2432,6 +2475,10 @@
 				}
 			}
 			spin_unlock(&res->spinlock);
+			while (reserved) {
+				dlm_lockres_release_ast(dlm, res);
+				reserved--;
+			}
 		}
 	}
 }

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 838a06d..c34a619 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -587,13 +587,13 @@ static int dlm_dirty_list_empty(struct dlm_ctxt *dlm)
 
 static void dlm_flush_asts(struct dlm_ctxt *dlm)
 {
-	int ret;
+	int ret = 0;
 	struct dlm_lock *lock;
 	struct dlm_lock_resource *res;
 	u8 hi;
 
 	spin_lock(&dlm->ast_lock);
-	while (!list_empty(&dlm->pending_asts)) {
+	while (!list_empty(&dlm->pending_asts) && !ret) {
 		lock = list_entry(dlm->pending_asts.next,
 				  struct dlm_lock, ast_list);
 		/* get an extra ref on lock */
@@ -628,8 +628,20 @@ static void dlm_flush_asts(struct dlm_ctxt *dlm)
 			mlog(0, "%s: res %.*s, AST queued while flushing last "
 			     "one\n", dlm->name, res->lockname.len,
 			     res->lockname.name);
-		} else
-			lock->ast_pending = 0;
+		} else {
+			if (unlikely(ret < 0)) {
+				/* If this AST is not sent back successfully,
+				 * there is no chance that the second lock
+				 * request comes.
+				 */
+				spin_lock(&res->spinlock);
+				__dlm_lockres_reserve_ast(res);
+				spin_unlock(&res->spinlock);
+				__dlm_queue_ast(dlm, lock);
+			} else {
+				lock->ast_pending = 0;
+			}
+		}
 
 		/* drop the extra ref.
 		 * this may drop it completely. */
@@ -637,7 +649,9 @@ static void dlm_flush_asts(struct dlm_ctxt *dlm)
 		dlm_lockres_release_ast(dlm, res);
 	}
 
-	while (!list_empty(&dlm->pending_basts)) {
+	ret = 0;
+
+	while (!list_empty(&dlm->pending_basts) && !ret) {
 		lock = list_entry(dlm->pending_basts.next,
 				  struct dlm_lock, bast_list);
 		/* get an extra ref on lock */
@@ -650,7 +664,6 @@ static void dlm_flush_asts(struct dlm_ctxt *dlm)
 		spin_lock(&lock->spinlock);
 		BUG_ON(lock->ml.highest_blocked <= LKM_IVMODE);
 		hi = lock->ml.highest_blocked;
-		lock->ml.highest_blocked = LKM_IVMODE;
 		spin_unlock(&lock->spinlock);
 
 		/* remove from list (including ref) */
@@ -681,7 +694,19 @@ static void dlm_flush_asts(struct dlm_ctxt *dlm)
 			     "one\n", dlm->name, res->lockname.len,
 			     res->lockname.name);
 		} else
-			lock->bast_pending = 0;
+			if (unlikely(ret)) {
+				spin_lock(&res->spinlock);
+				__dlm_lockres_reserve_ast(res);
+				spin_unlock(&res->spinlock);
+				__dlm_queue_bast(dlm, lock);
+			} else {
+				lock->bast_pending = 0;
+				/* Set ::highest_blocked to invalid after
+				 * sending BAST successfully so that
+				 * no more BAST would be queued.
+				 */
+				lock->ml.highest_blocked = LKM_IVMODE;
+			}
 
 		/* drop the extra ref.
 		 * this may drop it completely. */
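
One subtlety in the dlmthread.c hunks is worth spelling out: the reset
of ml.highest_blocked to LKM_IVMODE moves from BAST dequeue time to the
success path, because clearing it before a failed send would prevent
the BAST from ever being queued again. The short userspace C sketch
below models that ordering; lock_model, send_bast, and net_ok are
illustrative stand-ins, not kernel symbols.

/* Userspace model of the BAST ordering change.  All names are
 * illustrative stand-ins, not kernel symbols. */
#include <stdio.h>
#include <stdbool.h>

#define IVMODE (-1)			/* "no blocked level recorded" */

struct lock_model {
	int highest_blocked;		/* level a blocked request waits at */
	bool bast_pending;
};

static bool net_ok;			/* flip to model a network fault */

/* Stand-in for sending a BAST over the wire. */
static int send_bast(int level)
{
	(void)level;
	return net_ok ? 0 : -1;
}

static void flush_one_bast(struct lock_model *lk)
{
	int hi = lk->highest_blocked;

	if (send_bast(hi) < 0) {
		/* Failed send: leave highest_blocked valid so the BAST
		 * can be queued and flushed again later. */
		printf("BAST send failed, still pending at level %d\n", hi);
		return;
	}
	lk->bast_pending = false;
	/* Only after a successful send is it safe to forget the level. */
	lk->highest_blocked = IVMODE;
	printf("BAST delivered for level %d\n", hi);
}

int main(void)
{
	struct lock_model lk = { .highest_blocked = 5, .bast_pending = true };

	flush_one_bast(&lk);		/* fails: state kept for re-queue */
	net_ok = true;
	flush_one_bast(&lk);		/* succeeds: level reset to IVMODE */
	return 0;
}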