From patchwork Mon Dec 3 12:20:58 2018
X-Patchwork-Submitter: wangjian
X-Patchwork-Id: 10709437
From: wangjian
Date: Mon, 3 Dec 2018 20:20:58 +0800
Message-ID: <98f0e80c-9c13-dbb6-047c-b40e100082b1@huawei.com>
Cc: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled

In dlm_move_lockres_to_recovery_list(), if the lock is on the granted
queue and cancel_pending is set, it hits a BUG_ON. I think this BUG_ON
is meaningless, so this patch removes it. A scenario that triggers it
is given below.

At the beginning, Node 1 is the master and holds an NL lock; Node 2 and
Node 3 each hold a PR lock.

     Node 1                    Node 2                    Node 3

                          want to get EX lock.      want to get EX lock.

Node 3's PR lock hinders
Node 2 from getting the
EX lock, so send Node 3
a BAST.

                                                    receive BAST from Node 1.
                                                    downconvert thread begins
                                                    to cancel the PR to EX
                                                    conversion. In
                                                    dlmunlock_common(), it has
                                                    set lock->cancel_pending,
                                                    but has not yet entered
                                                    dlm_send_remote_unlock_request().

                          Node 2 dies because the
                          host is powered down.

In the recovery process,
clean up the locks related
to Node 2, then finish
Node 3's PR to EX request
and send Node 3 an AST.

                                                    receive AST from Node 1.
                                                    change the lock level to EX
                                                    and move the lock to the
                                                    granted list.

Node 1 dies because the
host is powered down.

                                                    In dlm_move_lockres_to_recovery_list(),
                                                    the lock is on the granted queue and
                                                    cancel_pending is set: BUG_ON.

However, after removing this BUG_ON, the process will encounter a second
BUG, in the ocfs2_unlock_ast function.
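To make that failure mode concrete, here is a small user-space toy model
of the second BUG. It is an illustration written for this mail, not ocfs2
code; the names are simplified stand-ins for the OCFS2_UNLOCK_* constants
and ocfs2_unlock_ast(), and the scenario it models is walked through in
detail right after this sketch.

/*
 * Toy model of the second BUG: once the locking AST has reset the
 * unlock action to "invalid", a late-arriving "successful" cancel
 * makes the unlock AST fall into the default branch, which is a
 * BUG() in the kernel (modelled here with assert(0)).
 */
#include <assert.h>
#include <stdio.h>

enum unlock_action {
	UNLOCK_INVALID,		/* stand-in for OCFS2_UNLOCK_INVALID */
	UNLOCK_CANCEL_CONVERT,	/* stand-in for OCFS2_UNLOCK_CANCEL_CONVERT */
	UNLOCK_DROP_LOCK	/* stand-in for OCFS2_UNLOCK_DROP_LOCK */
};

static enum unlock_action l_unlock_action = UNLOCK_INVALID;

/* stand-in for ocfs2_unlock_ast() */
static void unlock_ast(void)
{
	switch (l_unlock_action) {
	case UNLOCK_CANCEL_CONVERT:
		printf("cancel of the pending convert completed\n");
		break;
	case UNLOCK_DROP_LOCK:
		printf("lock dropped\n");
		break;
	default:
		assert(0 && "unlock AST called with invalid unlock action");
	}
	l_unlock_action = UNLOCK_INVALID;
}

int main(void)
{
	/* downconvert thread on Node 3 starts cancelling the PR->EX convert */
	l_unlock_action = UNLOCK_CANCEL_CONVERT;

	/* before the cancel request is sent, the AST from Node 1 grants the
	 * convert and resets the unlock action, as ocfs2_locking_ast() does */
	l_unlock_action = UNLOCK_INVALID;

	/* the cancel later "succeeds" with DLM_NORMAL, *call_ast is set,
	 * and the unlock AST runs -> default branch, i.e. the BUG */
	unlock_ast();
	return 0;
}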
Here is a scenario that will cause the second BUG in ocfs2_unlock_ast:

At the beginning, Node 1 is the master and holds an NL lock; Node 2 and
Node 3 each hold a PR lock.

     Node 1                    Node 2                    Node 3

                          want to get EX lock.      want to get EX lock.

Node 3's PR lock hinders
Node 2 from getting the
EX lock, so send Node 3
a BAST.

                                                    receive BAST from Node 1.
                                                    downconvert thread begins
                                                    to cancel the PR to EX
                                                    conversion. In
                                                    dlmunlock_common(), it has
                                                    released lock->spinlock and
                                                    res->spinlock, but has not
                                                    yet entered
                                                    dlm_send_remote_unlock_request().

                          Node 2 dies because the
                          host is powered down.

In the recovery process,
clean up the locks related
to Node 2, then finish
Node 3's PR to EX request
and send Node 3 an AST.

                                                    receive AST from Node 1.
                                                    change the lock level to EX,
                                                    move the lock to the granted
                                                    list, and set
                                                    lockres->l_unlock_action to
                                                    OCFS2_UNLOCK_INVALID in
                                                    ocfs2_locking_ast().

Node 1 dies because the
host is powered down.

                                                    Node 3 realizes that Node 1
                                                    is dead and removes Node 1
                                                    from the domain_map. The
                                                    downconvert thread gets
                                                    DLM_NORMAL from
                                                    dlm_send_remote_unlock_request()
                                                    and sets *call_ast to 1.
                                                    Then the downconvert thread
                                                    hits the BUG in
                                                    ocfs2_unlock_ast().

To avoid hitting this second BUG, dlmunlock_common() should return
DLM_CANCELGRANT if the lock is on the granted list and the operation is
canceled.

Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
 fs/ocfs2/dlm/dlmrecovery.c | 1 -
 fs/ocfs2/dlm/dlmunlock.c   | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 802636d..7489652 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 				 * if this had completed successfully
 				 * before sending this lock state to the
 				 * new master */
-				BUG_ON(i != DLM_CONVERTING_LIST);
 				mlog(0, "node died with cancel pending "
 				     "on %.*s. move back to granted list.\n",
 				     res->lockname.len, res->lockname.name);
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index 63d701c..505bb6c 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
 					flags, owner);
 		spin_lock(&res->spinlock);
 		spin_lock(&lock->spinlock);
+
+		if ((flags & LKM_CANCEL) &&
+		    dlm_lock_on_list(&res->granted, lock))
+			status = DLM_CANCELGRANT;
+
 		/* if the master told us the lock was already granted,
 		 * let the ast handle all of these actions */
 		if (status == DLM_CANCELGRANT) {