From patchwork Mon Mar 5 03:44:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: piaojun
X-Patchwork-Id: 10258073
To: "akpm@linux-foundation.org", Mark Fasheh, Joel Becker, Junxiao Bi, Joseph Qi
From: piaojun
Message-ID: <5A9CBD19.5020107@huawei.com>
Date: Mon, 5 Mar 2018 11:44:25 +0800
Cc: "ocfs2-devel@oss.oracle.com"
Subject: [Ocfs2-devel] [PATCH v3] ocfs2/dlm: don't handle migrate lockres if already in shutdown

We should not handle a migrate lockres message if we are already in
'DLM_CTXT_IN_SHUTDOWN', because the lockres would then remain after we
leave the dlm domain. Other nodes would eventually get stuck in an
infinite loop when requesting the lock from us.

The problem is caused by concurrent umount on two nodes. Before
receiving N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has already picked N1 as
the migration target, so N2 continues sending lockres to N1 even though
N1 has left the domain.

        N1                             N2 (owner)
                                       touch file

access the file,
and get PR lock

                                       begin leave domain and
                                       pick up N1 as new owner

begin leave domain and
migrate all lockres done

                                       begin migrate lockres to N1

end leave domain, but
the lockres is left behind
unexpectedly, because the
migrate task has already passed

Signed-off-by: Jun Piao
Reviewed-by: Yiwen Jiang
Reviewed-by: Joseph Qi
Reviewed-by: Changwei Ge
---
 fs/ocfs2/dlm/dlmdomain.c   | 14 --------------
 fs/ocfs2/dlm/dlmdomain.h   | 25 ++++++++++++++++++++++++-
 fs/ocfs2/dlm/dlmrecovery.c |  9 +++++++++
 3 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index e1fea14..25b76f0 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -675,20 +675,6 @@ static void dlm_leave_domain(struct dlm_ctxt *dlm)
 	spin_unlock(&dlm->spinlock);
 }
 
-int dlm_shutting_down(struct dlm_ctxt *dlm)
-{
-	int ret = 0;
-
-	spin_lock(&dlm_domain_lock);
-
-	if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN)
-		ret = 1;
-
-	spin_unlock(&dlm_domain_lock);
-
-	return ret;
-}
-
 void dlm_unregister_domain(struct dlm_ctxt *dlm)
 {
 	int leave = 0;
diff --git a/fs/ocfs2/dlm/dlmdomain.h b/fs/ocfs2/dlm/dlmdomain.h
index fd6122a..8a92814 100644
--- a/fs/ocfs2/dlm/dlmdomain.h
+++ b/fs/ocfs2/dlm/dlmdomain.h
@@ -28,7 +28,30 @@
 extern spinlock_t dlm_domain_lock;
 extern struct list_head dlm_domains;
 
-int dlm_shutting_down(struct dlm_ctxt *dlm);
+static inline int dlm_joined(struct dlm_ctxt *dlm)
+{
+	int ret = 0;
+
+	spin_lock(&dlm_domain_lock);
+	if (dlm->dlm_state == DLM_CTXT_JOINED)
+		ret = 1;
+	spin_unlock(&dlm_domain_lock);
+
+	return ret;
+}
+
+static inline int dlm_shutting_down(struct dlm_ctxt *dlm)
+{
+	int ret = 0;
+
+	spin_lock(&dlm_domain_lock);
+	if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN)
+		ret = 1;
+	spin_unlock(&dlm_domain_lock);
+
+	return ret;
+}
+
 void dlm_fire_domain_eviction_callbacks(struct dlm_ctxt *dlm,
 					int node_num);
 
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index ec8f758..505ab42 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1378,6 +1378,15 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
 	if (!dlm_grab(dlm))
 		return -EINVAL;
 
+	if (!dlm_joined(dlm)) {
+		mlog(ML_ERROR, "Domain %s not joined! "
+			  "lockres %.*s, master %u\n",
+			  dlm->name, mres->lockname_len,
+			  mres->lockname, mres->master);
+		dlm_put(dlm);
+		return -EINVAL;
+	}
+
 	BUG_ON(!(mres->flags & (DLM_MRES_RECOVERY|DLM_MRES_MIGRATION)));
 
 	real_master = mres->master;
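
Illustration only, not part of the patch: a minimal userspace sketch of the
guard that the dlmrecovery.c hunk adds. It assumes a simplified state
machine; the demo_* names, the state list, and the lockres counter are
hypothetical stand-ins and do not exist in ocfs2. The locking that the real
dlm_joined() does around the state check is left out here to keep the
sketch small.

```c
/* Userspace sketch only: models the state check, not the kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DLM context states. */
enum demo_state { DEMO_NEW, DEMO_JOINED, DEMO_IN_SHUTDOWN, DEMO_LEAVING };

struct demo_ctxt {
	enum demo_state state;
	int lockres_count;	/* lock resources this node has accepted */
};

/* Return 1 only while the node is a full member of the domain. */
static int demo_joined(const struct demo_ctxt *ctx)
{
	assert(ctx != NULL);
	return ctx->state == DEMO_JOINED;
}

/* Accept a migrated lock resource only if the node is still joined. */
static int demo_handle_migration(struct demo_ctxt *ctx)
{
	if (!demo_joined(ctx))
		return -EINVAL;	/* refuse: node is leaving or has left */

	ctx->lockres_count++;
	return 0;
}
```

With the guard in place, a node in the shutdown state refuses the migration
and its lockres count stays unchanged, so nothing is left behind once it
leaves the domain.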