From: piaojun
Date: Tue, 12 Jul 2016 10:08:03 +0800
Message-ID: <57845103.3070406@huawei.com>
X-Patchwork-Id: 9224509
Cc: mfasheh@suse.com, ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH v2 2/3] ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref

We found a BUG() that is hit when a lockres is migrated while a deref is
in progress, as described below. To solve it, purge the lockres directly
when the other node says we do not hold a reference. Additionally, purge
the lockres if the master goes down, since no node will then respond
that the deref is done.

Node 1                      Node 2 (old master)          Node 3 (new master)
dlm_purge_lockres
send deref to N2
                            leave domain
                            migrate lockres to N3
                                                         finish migration
                                                         send do assert
                                                         master to N1

receive do assert msg
from N3, but cannot
find lockres because
DROPPING_REF is set,
so the owner is still
N2.

                            receive deref from N1
                            and respond with -EINVAL
                            because lockres is migrated

BUG when receiving -EINVAL
in dlm_drop_lockres_ref

Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")

Signed-off-by: Jun Piao
Reviewed-by: Joseph Qi
Reviewed-by: Jiufei Xue
---
 fs/ocfs2/dlm/dlmmaster.c |  9 ++++++---
 fs/ocfs2/dlm/dlmthread.c | 13 +++++++++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9456217..311404f 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2276,9 +2276,12 @@ int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, struct dlm_lock_resource *res)
 		mlog(ML_ERROR, "%s: res %.*s, DEREF to node %u got %d\n",
 		     dlm->name, namelen, lockname, res->owner, r);
 		dlm_print_one_lock_resource(res);
-		BUG();
-	}
-	return ret ? ret : r;
+		if (r == -ENOMEM)
+			BUG();
+	} else
+		ret = r;
+
+	return ret;
 }
 
 int dlm_deref_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 68d239b..ce39722 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -175,6 +175,15 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 	     res->lockname.len, res->lockname.name, master);
 
 	if (!master) {
+		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+			mlog(ML_NOTICE, "%s: res %.*s already in "
+				"DLM_LOCK_RES_DROPPING_REF state\n",
+				dlm->name, res->lockname.len,
+				res->lockname.name);
+			spin_unlock(&res->spinlock);
+			return;
+		}
+
 		res->state |= DLM_LOCK_RES_DROPPING_REF;
 		/* drop spinlock...  retake below */
 		spin_unlock(&res->spinlock);
@@ -203,8 +212,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 		dlm->purge_count--;
 	}
 
-	if (!master && ret != 0) {
-		mlog(0, "%s: deref %.*s in progress or master goes down\n",
+	if (!master && ret == DLM_DEREF_RESPONSE_INPROG) {
+		mlog(0, "%s: deref %.*s in progress\n",
 			dlm->name, res->lockname.len, res->lockname.name);
 		spin_unlock(&res->spinlock);
 		return;