From patchwork Thu Sep 17 13:17:25 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Qi X-Patchwork-Id: 7207271 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id E7E2F9F336 for ; Thu, 17 Sep 2015 13:18:41 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 0AB3420852 for ; Thu, 17 Sep 2015 13:18:41 +0000 (UTC) Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5E6022084B for ; Thu, 17 Sep 2015 13:18:39 +0000 (UTC) Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t8HDIF5P029570 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 17 Sep 2015 13:18:17 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by userv0022.oracle.com (8.13.8/8.13.8) with ESMTP id t8HDIDSl014939 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 17 Sep 2015 13:18:14 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ZcZ4n-0006s7-O9; Thu, 17 Sep 2015 06:18:13 -0700 Received: from aserv0022.oracle.com ([141.146.126.234]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ZcZ4O-0006qc-RZ for ocfs2-devel@oss.oracle.com; Thu, 17 Sep 2015 06:17:49 -0700 Received: from aserp1020.oracle.com (aserp1020.oracle.com [141.146.126.67]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id t8HDHlE3013110 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Thu, 17 Sep 2015 13:17:47 GMT Received: from userp2040.oracle.com (userp2040.oracle.com [156.151.31.90]) by aserp1020.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t8HDHlju011146 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Thu, 17 Sep 2015 13:17:47 GMT Received: from pps.filterd (userp2040.oracle.com [127.0.0.1]) by userp2040.oracle.com (8.15.0.59/8.15.0.59) with SMTP id t8HDETKN030866 for ; Thu, 17 Sep 2015 13:17:47 GMT Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [58.251.152.64]) by userp2040.oracle.com with ESMTP id 1wvap8njk6-1 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Thu, 17 Sep 2015 13:17:46 +0000 Received: from 172.24.1.49 (EHLO szxeml426-hub.china.huawei.com) ([172.24.1.49]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id CVH44630; Thu, 17 Sep 2015 21:17:32 +0800 (CST) Received: from [127.0.0.1] (10.177.22.101) by szxeml426-hub.china.huawei.com (10.82.67.181) with Microsoft SMTP Server id 14.3.235.1; Thu, 17 Sep 2015 21:17:30 +0800 Message-ID: <55FABD65.4010107@huawei.com> Date: Thu, 17 Sep 2015 21:17:25 +0800 From: Joseph Qi User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 To: Andrew Morton X-Originating-IP: [10.177.22.101] X-CFilter-Loop: Reflected X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:119.145.14.64/30 ip4:58.251.152.64/30 ip4:119.145.14.93 ip4:58.251.152.93 ip4:206.16.17.74 ip4:194.213.3.16 ip4:194.213.3.17 ip4:206.16.17.72 ~all X-ServerName: szxga01-in.huawei.com X-Proofpoint-Virus-Version: vendor=nai engine=5700 definitions=7926 signatures=670636 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1507310000 definitions=main-1509170206 Cc: Mark Fasheh , "ocfs2-devel@oss.oracle.com" Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix race between convert and recovery X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: userv0022.oracle.com [156.151.31.74] X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There is a race window between dlmconvert_remote and dlm_move_lockres_to_recovery_list, which will cause a lock with OCFS2_LOCK_BUSY in grant list, thus system hangs. dlmconvert_remote { spin_lock(&res->spinlock); list_move_tail(&lock->list, &res->converting); lock->convert_pending = 1; spin_unlock(&res->spinlock); status = dlm_send_remote_convert_request(); >>>>>> race window, master has queued ast and return DLM_NORMAL, and then down before sending ast. this node detects master down and call dlm_move_lockres_to_recovery_list, which will revert the lock to grant list. Then OCFS2_LOCK_BUSY won't be cleared as new master won't send ast any more because it thinks already be authorized. spin_lock(&res->spinlock); lock->convert_pending = 0; if (status != DLM_NORMAL) dlm_revert_pending_convert(res, lock); spin_unlock(&res->spinlock); } In this case, just leave it in convert list and new master will take care of it after recovery. And if convert request returns other than DLM_NORMAL, convert thread will do the revert itself. So remove the revert logic in dlm_move_lockres_to_recovery_list. Reported-by: Yiwen Jiang Signed-off-by: Joseph Qi Cc: --- fs/ocfs2/dlm/dlmrecovery.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index ce12e0b..d797d49 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -2058,15 +2058,7 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm, queue = dlm_list_idx_to_ptr(res, i); list_for_each_entry_safe(lock, next, queue, list) { dlm_lock_get(lock); - if (lock->convert_pending) { - /* move converting lock back to granted */ - BUG_ON(i != DLM_CONVERTING_LIST); - mlog(0, "node died with convert pending " - "on %.*s. move back to granted list.\n", - res->lockname.len, res->lockname.name); - dlm_revert_pending_convert(res, lock); - lock->convert_pending = 0; - } else if (lock->lock_pending) { + if (lock->lock_pending) { /* remove pending lock requests completely */ BUG_ON(i != DLM_BLOCKED_LIST); mlog(0, "node died with lock pending "