From patchwork Tue Sep 6 01:01:29 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Qi X-Patchwork-Id: 9315383 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id EE11C60760 for ; Tue, 6 Sep 2016 01:03:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D7B792893B for ; Tue, 6 Sep 2016 01:03:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CAC5528945; Tue, 6 Sep 2016 01:03:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A2F972893B for ; Tue, 6 Sep 2016 01:03:15 +0000 (UTC) Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u8612w4I027701 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 6 Sep 2016 01:03:00 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by userv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u8612jCe011207 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 6 Sep 2016 01:02:45 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1bh4mj-0007E4-20; Mon, 05 Sep 2016 18:02:45 -0700 Received: from userv0022.oracle.com ([156.151.31.74]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1bh4lz-0007BS-BP for ocfs2-devel@oss.oracle.com; Mon, 05 Sep 2016 18:01:59 -0700 Received: from aserp1020.oracle.com (aserp1020.oracle.com [141.146.126.67]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u8611wAb020336 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Tue, 6 Sep 2016 01:01:58 GMT Received: from userp2040.oracle.com (userp2040.oracle.com [156.151.31.90]) by aserp1020.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u8611vG8014235 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Tue, 6 Sep 2016 01:01:57 GMT Received: from pps.filterd (userp2040.oracle.com [127.0.0.1]) by userp2040.oracle.com (8.16.0.17/8.16.0.17) with SMTP id u860w2Xf018407 for ; Tue, 6 Sep 2016 01:01:57 GMT Authentication-Results: oracle.com; spf=pass smtp.mailfrom=joseph.qi@huawei.com Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65]) by userp2040.oracle.com with ESMTP id 2588dr066f-1 (version=TLSv1 cipher=RC4-SHA bits=128 verify=NOT) for ; Tue, 06 Sep 2016 01:01:56 +0000 Received: from 172.24.1.47 (EHLO SZXEML429-HUB.china.huawei.com) ([172.24.1.47]) by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DMR53816; Tue, 06 Sep 2016 09:01:47 +0800 (CST) Received: from [127.0.0.1] (10.177.22.101) by SZXEML429-HUB.china.huawei.com (10.82.67.184) with Microsoft SMTP Server id 14.3.235.1; Tue, 6 Sep 2016 09:01:36 +0800 Message-ID: <57CE1569.8010704@huawei.com> Date: Tue, 6 Sep 2016 09:01:29 +0800 From: Joseph Qi User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 To: Andrew Morton X-Originating-IP: [10.177.22.101] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.57CE157C.0143, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 4c9bd2485cb3a9324e4281f7e1292e9e X-ServerName: szxga02-in.huawei.com X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:119.145.14.64/30 ip4:58.251.152.64/30 ip4:119.145.14.93 ip4:58.251.152.93 ip4:206.16.17.74 ip4:194.213.3.16 ip4:194.213.3.17 ip4:206.16.17.72 ip4:119.145.14.199 ip4:58.251.152.179 ip4:119.145.14.52 ip4:58.251.152.52 ~all X-Proofpoint-Virus-Version: vendor=nai engine=5800 definitions=8279 signatures=670665 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1609060013 Cc: Mark Fasheh , "ocfs2-devel@oss.oracle.com" Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix race between convert and migration X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: userv0021.oracle.com [156.151.31.71] X-Virus-Scanned: ClamAV using ClamSMTP Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") checks if lockres master has changed to identify whether new master has finished recovery or not. This will introduce a race that right after old master does umount ( means master will change), a new convert request comes. In this case, it will reset lockres state to DLM_RECOVERING and then retry convert, and then fail with lockres->l_action being set to OCFS2_AST_INVALID, which will cause inconsistent lock level between ocfs2 and dlm, and then finally BUG. Since dlm recovery will clear lock->convert_pending in dlm_move_lockres_to_recovery_list, we can use it to correctly identify the race case between convert and recovery. So fix it. Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Signed-off-by: Joseph Qi Signed-off-by: Jun Piao Cc: --- fs/ocfs2/dlm/dlmconvert.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c index cdeafb4..0bb1286 100644 --- a/fs/ocfs2/dlm/dlmconvert.c +++ b/fs/ocfs2/dlm/dlmconvert.c @@ -268,7 +268,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, struct dlm_lock *lock, int flags, int type) { enum dlm_status status; - u8 old_owner = res->owner; mlog(0, "type=%d, convert_type=%d, busy=%d\n", lock->ml.type, lock->ml.convert_type, res->state & DLM_LOCK_RES_IN_PROGRESS); @@ -335,7 +334,6 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, spin_lock(&res->spinlock); res->state &= ~DLM_LOCK_RES_IN_PROGRESS; - lock->convert_pending = 0; /* if it failed, move it back to granted queue. * if master returns DLM_NORMAL and then down before sending ast, * it may have already been moved to granted queue, reset to @@ -344,12 +342,14 @@ enum dlm_status dlmconvert_remote(struct dlm_ctxt *dlm, if (status != DLM_NOTQUEUED) dlm_error(status); dlm_revert_pending_convert(res, lock); - } else if ((res->state & DLM_LOCK_RES_RECOVERING) || - (old_owner != res->owner)) { - mlog(0, "res %.*s is in recovering or has been recovered.\n", - res->lockname.len, res->lockname.name); + } else if (!lock->convert_pending) { + mlog(0, "%s: res %.*s, owner died and lock has been moved back " + "to granted list, retry convert.\n", + dlm->name, res->lockname.len, res->lockname.name); status = DLM_RECOVERING; } + + lock->convert_pending = 0; bail: spin_unlock(&res->spinlock);