From patchwork Thu Nov 17 06:03:28 2016
X-Patchwork-Submitter: Changwei Ge
X-Patchwork-Id: 9433549
From: Gechangwei
To: "akpm@linux-foundation.org"
Date: Thu, 17 Nov 2016 06:03:28 +0000
Message-ID: <63ADC13FD55D6546B7DECE290D39E37342E59EFA@H3CMLB12-EX.srv.huawei-3com.com>
Cc: "mfasheh@suse.com", "ocfs2-devel@oss.oracle.com"
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix umount hang

Hi,

During recent testing of OCFS2, I found an umount hang. The clues below can help analyze the issue.

The debug information shows an abnormal state: only node 1 is in the DLM domain map, yet nodes 3-9 are still in the MLE's node map and vote map.

I think the root cause of the unchanging vote map is that the MLE is detached from heartbeat (HB) events too early. As a result, the BLOCK MLE never gets a chance to be transformed into a MASTER MLE, so node 1 cannot master the lock resource even though all the other nodes are dead.

To fix this, I propose the patch below.
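To make the suspected root cause concrete, here is a deliberately simplified user-space model. Everything in it (`struct toy_mle`, `toy_range()`, `toy_node_down()`, `toy_can_master()`, 64-bit node maps) is hypothetical: it is not the kernel's `struct dlm_master_list_entry`, it omits all locking, and it assumes a node-down event removes the dead node from both maps at once. It only shows why an MLE that no longer receives node-down events can never see its vote map drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-in for an o2dlm master list
 * entry (MLE). Bit n set in a map means node n is in that map.
 * This is NOT struct dlm_master_list_entry; locking is omitted.
 */
struct toy_mle {
	uint64_t vote_map;	/* nodes whose votes are still outstanding */
	uint64_t node_map;	/* nodes believed to be alive */
	bool hb_attached;	/* still subscribed to heartbeat events? */
};

/* Build a map containing nodes first..last inclusive (last < 64). */
static uint64_t toy_range(unsigned int first, unsigned int last)
{
	uint64_t map = 0;

	assert(last < 64);
	for (unsigned int n = first; n <= last; n++)
		map |= UINT64_C(1) << n;
	return map;
}

/*
 * Deliver a heartbeat "node down" event for @node. An MLE that has
 * already been detached from heartbeat never sees the event.
 * Simplification: the dead node leaves both maps at once.
 */
static void toy_node_down(struct toy_mle *mle, unsigned int node)
{
	assert(node < 64);
	if (!mle->hb_attached)
		return;
	mle->node_map &= ~(UINT64_C(1) << node);
	mle->vote_map &= ~(UINT64_C(1) << node);
}

/* In this model, @self may master once no other node is left to vote. */
static bool toy_can_master(const struct toy_mle *mle, unsigned int self)
{
	assert(self < 64);
	return (mle->vote_map & ~(UINT64_C(1) << self)) == 0;
}
```

Delivering node-down events for nodes 3 through 9 to an entry built with `hb_attached = false` leaves its vote map at nodes 3-9, the same shape as the `Vote=3 4 5 6 7 8 9` line in the `mle_state` dump, whereas an attached entry drains to an empty vote map and node 1 can proceed.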
From 3163fa7024d96f8d6e6ec2b37ad44e2cc969abd9 Mon Sep 17 00:00:00 2001
From: gechangwei
Date: Thu, 17 Nov 2016 14:00:45 +0800
Subject: [PATCH] fix umount hang

Signed-off-by: gechangwei
---
 fs/ocfs2/dlm/dlmmaster.c | 2 --
 1 file changed, 2 deletions(-)

--
2.5.1.windows.1

root@HXY-CVK110:~# grep P000000000000000000000000000000 bbb
Lockres: P000000000000000000000000000000 Owner: 255 State: 0x10 InProgress

root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat dlm_state
Domain: 7DA412FEB1374366B0F3C70025EB1437 Key: 0x8ff804a1 Protocol: 1.2
Thread Pid: 21679 Node: 1 State: JOINED
Number of Joins: 1 Joining Node: 255
Domain Map: 1
Exit Domain Map:
Live Map: 1 2 3 4 5 6 7 8 9
Lock Resources: 29 (116)
MLEs: 1 (119)
  Blocking: 1 (4)
  Mastery: 0 (115)
  Migration: 0 (0)
Lists: Dirty=Empty Purge=Empty PendingASTs=Empty PendingBASTs=Empty
Purge Count: 0 Refs: 1
Dead Node: 255
Recovery Pid: 21680 Master: 255 State: INACTIVE
Recovery Map:
Recovery Node State:

root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# ls
dlm_state locking_state mle_state purge_list
root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat mle_state
Dumping MLEs for Domain: 7DA412FEB1374366B0F3C70025EB1437
P000000000000000000000000000000 BLK mas=255 new=255 evt=0 use=1 ref= 2
Maybe=
Vote=3 4 5 6 7 8 9
Response=
Node=3 4 5 6 7 8 9
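Read together, these dumps show the symptom: the domain map contains only node 1, yet the blocking MLE still lists nodes 3-9 in both its vote map and node map. Assuming, as in mainline `fs/ocfs2/dlm/dlmdebug.c`, that `evt` is 1 only while the MLE is still linked on the heartbeat event list, `evt=0` also suggests that no node-down event will ever reach this MLE. The helpers below (`parse_node_list()`, `stale_nodes()`, `mle_evt_field()`) are hypothetical, are not part of any OCFS2 tool, and assume node numbers below 64; they merely make that cross-check mechanical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn a space-separated node list such as
 * "3 4 5 6 7 8 9" (the text after "Vote=", "Node=" or "Domain Map:")
 * into a bitmap. Assumes node numbers are below 64.
 */
static uint64_t parse_node_list(const char *s)
{
	uint64_t map = 0;

	for (;;) {
		char *end;
		long n = strtol(s, &end, 10);	/* skips leading spaces */

		if (end == s)	/* no further number */
			break;
		assert(n >= 0 && n < 64);
		map |= UINT64_C(1) << n;
		s = end;
	}
	return map;
}

/* Nodes the MLE still waits on although they have left the domain. */
static uint64_t stale_nodes(const char *vote_list, const char *domain_list)
{
	return parse_node_list(vote_list) & ~parse_node_list(domain_list);
}

/* Value of the "evt=" field of an mle_state line, or -1 if absent. */
static int mle_evt_field(const char *line)
{
	const char *p = strstr(line, "evt=");

	return p ? atoi(p + strlen("evt=")) : -1;
}
```

For the dump above, the vote list `3 4 5 6 7 8 9` checked against a domain map of `1` reports all seven nodes as stale, and `evt=0` (under the assumption stated above) means nothing is left to remove them.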
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 6ea06f8..3c46882 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -3354,8 +3354,6 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
 		spin_unlock(&mle->spinlock);
 		wake_up(&mle->wq);
 
-		/* Do not need events any longer, so detach from heartbeat */
-		__dlm_mle_detach_hb_events(dlm, mle);
 		__dlm_put_mle(mle);
 	}
 }
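With these two lines removed, `dlm_clean_block_mle()` only drops its reference, and detaching is left to whoever drops the last one. That relies on an assumption not stated in the patch: that, as the mainline `dlm_mle_release()` appears to do, the final `__dlm_put_mle()` also detaches the MLE from heartbeat events. Any remaining holder (the `ref= 2` in the dump suggests there are some) then keeps receiving node-down events until it is done. The sketch below models only that lifetime rule; `struct toy_sub`, `toy_detach()`, `toy_put()`, `toy_deliver()` and `toy_clean()` are hypothetical and omit all locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted subscriber to node-down events (no locking). */
struct toy_sub {
	int refs;		/* outstanding references */
	bool attached;		/* still on the event list? */
	int events_seen;	/* node-down events delivered so far */
	bool released;		/* last reference dropped */
};

/* Idempotent, like checking list_empty() before list_del_init(). */
static void toy_detach(struct toy_sub *s)
{
	s->attached = false;
}

/* Drop one reference; only the last one detaches and releases. */
static void toy_put(struct toy_sub *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0) {
		toy_detach(s);
		s->released = true;
	}
}

/* The event source only reaches subscribers still on its list. */
static void toy_deliver(struct toy_sub *s)
{
	if (s->attached)
		s->events_seen++;
}

/*
 * Cleanup path loosely modeled on dlm_clean_block_mle(): drop one
 * reference. detach_early=true mimics the two lines the patch deletes;
 * detach_early=false mimics the patched code.
 */
static void toy_clean(struct toy_sub *s, bool detach_early)
{
	if (detach_early)
		toy_detach(s);
	toy_put(s);
}
```

Starting from two references, the early-detach variant stops counting events as soon as cleanup runs, while the patched variant keeps receiving them and detaches only when its last reference is dropped.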