From patchwork Tue Oct 6 06:00:56 2015
X-Patchwork-Submitter: Joseph Qi
X-Patchwork-Id: 7333241
Message-ID: <56136398.1050001@huawei.com>
Date: Tue, 6 Oct 2015 14:00:56 +0800
From: Joseph Qi
To: Andrew Morton
Cc: Mark Fasheh, "ocfs2-devel@oss.oracle.com"
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix race between mount and delete node/cluster

There is a race between mount and node/cluster deletion that can leave
o2hb_thread spinning in a loop that does no useful work.

o2hb_thread
{
	o2nm_depend_this_node();
	<<<<<< race window: the node may already have been deleted. If the
	       thread enters the loop anyway, it malfunctions because no
	       configured nodes are found.
	while (!kthread_should_stop() &&
	       !reg->hr_unclean_stop && !reg->hr_aborted_start) {
	}
}

So the return value of o2nm_depend_this_node() needs to be checked. If
the node has been deleted, do not enter the loop and let the mount fail.

Signed-off-by: Joseph Qi
---
 fs/ocfs2/cluster/heartbeat.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 16eff45..a224cf1 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -219,7 +219,8 @@ struct o2hb_region {
 	unsigned		hr_unclean_stop:1,
 				hr_aborted_start:1,
 				hr_item_pinned:1,
-				hr_item_dropped:1;
+				hr_item_dropped:1,
+				hr_node_deleted:1;
 
 	/* protected by the hr_callback_sem */
 	struct task_struct 	*hr_task;
@@ -1110,7 +1111,13 @@ static int o2hb_thread(void *data)
 	set_user_nice(current, MIN_NICE);
 
 	/* Pin node */
-	o2nm_depend_this_node();
+	ret = o2nm_depend_this_node();
+	if (ret) {
+		mlog(ML_ERROR, "Node has been deleted, ret = %d\n", ret);
+		reg->hr_node_deleted = 1;
+		wake_up(&o2hb_steady_queue);
+		return 0;
+	}
 
 	while (!kthread_should_stop() &&
 	       !reg->hr_unclean_stop && !reg->hr_aborted_start) {
@@ -1829,7 +1836,8 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
 	spin_unlock(&o2hb_live_lock);
 
 	ret = wait_event_interruptible(o2hb_steady_queue,
-				atomic_read(&reg->hr_steady_iterations) == 0);
+				atomic_read(&reg->hr_steady_iterations) == 0 ||
+				reg->hr_node_deleted);
 	if (ret) {
 		atomic_set(&reg->hr_steady_iterations, 0);
 		reg->hr_aborted_start = 1;
@@ -1840,6 +1848,11 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region *reg,
 		goto out3;
 	}
 
+	if (reg->hr_node_deleted) {
+		ret = -EINVAL;
+		goto out3;
+	}
+
 	/* Ok, we were woken. Make sure it wasn't by drop_item() */
 	spin_lock(&o2hb_live_lock);
 	hb_task = reg->hr_task;
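
For readers who want to follow the control flow outside the kernel, here is a minimal single-threaded userspace model of the handshake this patch adds. It is a sketch under stated assumptions, not the kernel code: `struct region_model`, `thread_model()`, `wait_done()` and `dev_write_model()` are hypothetical names invented for illustration, the `pin_ret` argument stands in for the return value of o2nm_depend_this_node(), and the way the loop counts down `steady_iterations` is a simplification. It does not reproduce real concurrency or the wait queue; it only shows that a failed pin skips the loop, satisfies the extended wait condition, and turns into -EINVAL for the writer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the o2hb_region fields the patch touches. */
struct region_model {
	int steady_iterations;	/* models hr_steady_iterations */
	bool aborted_start;	/* models hr_aborted_start */
	bool node_deleted;	/* models the new hr_node_deleted bit */
	int loop_passes;	/* heartbeat passes executed by the model */
};

/*
 * Models the start of o2hb_thread(). A nonzero pin_ret plays the role of
 * o2nm_depend_this_node() failing because the node was deleted.
 */
static int thread_model(struct region_model *reg, int pin_ret, int max_passes)
{
	if (pin_ret) {
		/* The patch also calls wake_up() here; the model has no queue. */
		reg->node_deleted = true;
		return 0;
	}

	/* Simplified heartbeat loop, bounded so the model terminates. */
	while (!reg->aborted_start && reg->loop_passes < max_passes) {
		reg->loop_passes++;
		if (reg->steady_iterations > 0)
			reg->steady_iterations--;
	}
	return 0;
}

/* The wait condition as extended by the patch. */
static bool wait_done(const struct region_model *reg)
{
	return reg->steady_iterations == 0 || reg->node_deleted;
}

/* Models the tail of o2hb_region_dev_write() once the wait is over. */
static int dev_write_model(const struct region_model *reg)
{
	if (!wait_done(reg))
		return -EAGAIN;	/* model only: the kernel would still sleep */
	if (reg->node_deleted)
		return -EINVAL;	/* same error the patch returns */
	return 0;
}

#ifdef MODEL_DEMO_MAIN
int main(void)
{
	struct region_model reg = { .steady_iterations = 2 };

	thread_model(&reg, -EINVAL, 10);	/* node already deleted */
	assert(reg.loop_passes == 0);		/* loop never entered */
	assert(dev_write_model(&reg) == -EINVAL);
	return 0;
}
#endif
```

Building with `-DMODEL_DEMO_MAIN` runs the deleted-node case. Without the extra `node_deleted` term in `wait_done()`, the model's writer would stay at -EAGAIN forever in that case, which mirrors why the patch has to widen the wait condition as well as return early from the thread.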