From patchwork Wed Mar 2 07:56:11 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junxiao Bi X-Patchwork-Id: 8479021 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 7AB599F2F0 for ; Wed, 2 Mar 2016 07:59:45 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 8193F2038A for ; Wed, 2 Mar 2016 07:59:44 +0000 (UTC) Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DFAA12010E for ; Wed, 2 Mar 2016 07:59:42 +0000 (UTC) Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u227x7VU010322 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 2 Mar 2016 07:59:07 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by userv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u227x6gh018411 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 2 Mar 2016 07:59:07 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ab1gY-0004jX-M2; Tue, 01 Mar 2016 23:59:06 -0800 Received: from userv0022.oracle.com ([156.151.31.74]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ab1fb-0004g8-GE for ocfs2-devel@oss.oracle.com; Tue, 01 Mar 2016 23:58:07 -0800 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u227w6g8004871 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Wed, 2 Mar 2016 07:58:07 GMT Received: from abhmp0003.oracle.com (abhmp0003.oracle.com [141.146.116.9]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u227w6ur007488; Wed, 2 Mar 2016 07:58:06 GMT Received: from bijx-OptiPlex-780.cn.oracle.com (/10.182.32.177) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 01 Mar 2016 23:58:06 -0800 From: Junxiao Bi To: ocfs2-devel@oss.oracle.com Date: Wed, 2 Mar 2016 15:56:11 +0800 Message-Id: <1456905372-22313-6-git-send-email-junxiao.bi@oracle.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1456905372-22313-1-git-send-email-junxiao.bi@oracle.com> References: <1456905372-22313-1-git-send-email-junxiao.bi@oracle.com> Cc: mfasheh@suse.com Subject: [Ocfs2-devel] [PATCH V2 5/6] ocfs2: o2hb: don't negotiate if last hb fail X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: userv0021.oracle.com [156.151.31.71] X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sometimes io error is returned when storage is down for a while. Like for iscsi device, stroage is made offline when session timeout, and this will make all io return -EIO. For this case, nodes shouldn't do negotiate timeout but should fence self. So let nodes fence self when o2hb_do_disk_heartbeat return an error, this is the same behavior with o2hb without negotiate timer. Signed-off-by: Junxiao Bi Reviewed-by: Ryan Ding Cc: Gang He Cc: rwxybh Cc: Mark Fasheh Cc: Joel Becker Cc: Joseph Qi Signed-off-by: Andrew Morton --- fs/ocfs2/cluster/heartbeat.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c index 9f4a02ed85fd..c040fc3dd605 100644 --- a/fs/ocfs2/cluster/heartbeat.c +++ b/fs/ocfs2/cluster/heartbeat.c @@ -284,6 +284,9 @@ struct o2hb_region { /* Message key for negotiate timeout message. */ unsigned int hr_key; struct list_head hr_handler_list; + + /* last hb status, 0 for success, other value for error. */ + int hr_last_hb_status; }; struct o2hb_bio_wait_ctxt { @@ -396,6 +399,12 @@ static void o2hb_nego_timeout(struct work_struct *work) struct o2hb_region *reg; reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work); + /* don't negotiate timeout if last hb failed since it is very + * possible io failed. Should let write timeout fence self. + */ + if (reg->hr_last_hb_status) + return; + o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap)); /* lowest node as master node to make negotiate decision. */ master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0); @@ -1229,6 +1238,7 @@ static int o2hb_thread(void *data) before_hb = ktime_get_real(); ret = o2hb_do_disk_heartbeat(reg); + reg->hr_last_hb_status = ret; after_hb = ktime_get_real();