From patchwork Mon May 23 21:50:37 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 9132395 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7121360761 for ; Mon, 23 May 2016 21:51:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 650DB28238 for ; Mon, 23 May 2016 21:51:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5817028252; Mon, 23 May 2016 21:51:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9272928238 for ; Mon, 23 May 2016 21:51:30 +0000 (UTC) Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u4NLpLQV006357 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 23 May 2016 21:51:21 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id u4NLpKK9010446 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 23 May 2016 21:51:20 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1b4xku-0002AZ-9l; Mon, 23 May 2016 14:51:20 -0700 Received: from userv0021.oracle.com ([156.151.31.71]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1b4xkG-0001uE-4r for ocfs2-devel@oss.oracle.com; Mon, 23 May 2016 14:50:40 -0700 Received: from userp1030.oracle.com (userp1030.oracle.com [156.151.31.80]) by userv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u4NLodWj008797 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 23 May 2016 21:50:39 GMT Received: from userp2030.oracle.com (userp2030.oracle.com [156.151.31.89]) by userp1030.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u4NLodas017966 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Mon, 23 May 2016 21:50:39 GMT Received: from pps.filterd (userp2030.oracle.com [127.0.0.1]) by userp2030.oracle.com (8.16.0.11/8.16.0.11) with SMTP id u4NLlRQF036124 for ; Mon, 23 May 2016 21:50:39 GMT Authentication-Results: oracle.com; spf=pass smtp.mail=akpm@linux-foundation.org Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) by userp2030.oracle.com with ESMTP id 232h0v44wb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 23 May 2016 21:50:39 +0000 Received: from akpm3.mtv.corp.google.com (unknown [104.132.1.65]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id D427F94C; Mon, 23 May 2016 21:50:37 +0000 (UTC) Date: Mon, 23 May 2016 14:50:37 -0700 From: akpm@linux-foundation.org To: ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org, junxiao.bi@oracle.com, ghe@suse.com, jlbec@evilplan.org, joseph.qi@huawei.com, mfasheh@suse.de, rwxybh@126.com, ryan.ding@oracle.com Message-ID: <57437b2d.8Lcm4Mnbx9fE7SRW%akpm@linux-foundation.org> User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 X-ServerName: mail.linuxfoundation.org X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 ip4:140.211.169.12/30 include:_spf.google.com ~all X-Proofpoint-Virus-Version: vendor=nai engine=5800 definitions=8174 signatures=670720 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1605230260 Subject: [Ocfs2-devel] [patch 4/6] ocfs2: o2hb: add some user/debug log X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: aserv0022.oracle.com [141.146.126.234] X-Virus-Scanned: ClamAV using ClamSMTP From: Junxiao Bi Subject: ocfs2: o2hb: add some user/debug log Signed-off-by: Junxiao Bi Reviewed-by: Ryan Ding Cc: Gang He Cc: rwxybh Cc: Mark Fasheh Cc: Joel Becker Cc: Joseph Qi Signed-off-by: Andrew Morton --- fs/ocfs2/cluster/heartbeat.c | 39 +++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 7 deletions(-) diff -puN fs/ocfs2/cluster/heartbeat.c~ocfs2-o2hb-add-some-user-debug-log fs/ocfs2/cluster/heartbeat.c --- a/fs/ocfs2/cluster/heartbeat.c~ocfs2-o2hb-add-some-user-debug-log +++ a/fs/ocfs2/cluster/heartbeat.c @@ -292,6 +292,8 @@ struct o2hb_bio_wait_ctxt { int wc_error; }; +#define O2HB_NEGO_TIMEOUT_MS (O2HB_MAX_WRITE_TIMEOUT_MS/2) + enum { O2HB_NEGO_TIMEOUT_MSG = 1, O2HB_NEGO_APPROVE_MSG = 2, @@ -358,7 +360,7 @@ static void o2hb_arm_timeout(struct o2hb cancel_delayed_work(®->hr_nego_timeout_work); /* negotiate timeout must be less than write timeout. */ schedule_delayed_work(®->hr_nego_timeout_work, - msecs_to_jiffies(O2HB_MAX_WRITE_TIMEOUT_MS)/2); + msecs_to_jiffies(O2HB_NEGO_TIMEOUT_MS)); memset(reg->hr_nego_node_bitmap, 0, sizeof(reg->hr_nego_node_bitmap)); } @@ -389,7 +391,7 @@ again: static void o2hb_nego_timeout(struct work_struct *work) { unsigned long live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)]; - int master_node, i; + int master_node, i, ret; struct o2hb_region *reg; reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work); @@ -398,7 +400,12 @@ static void o2hb_nego_timeout(struct wor master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0); if (master_node == o2nm_this_node()) { - set_bit(master_node, reg->hr_nego_node_bitmap); + if (!test_bit(master_node, reg->hr_nego_node_bitmap)) { + printk(KERN_NOTICE "o2hb: node %d hb write hung for %ds on region %s (%s).\n", + o2nm_this_node(), O2HB_NEGO_TIMEOUT_MS/1000, + config_item_name(®->hr_item), reg->hr_dev_name); + set_bit(master_node, reg->hr_nego_node_bitmap); + } if (memcmp(reg->hr_nego_node_bitmap, live_node_bitmap, sizeof(reg->hr_nego_node_bitmap))) { /* check negotiate bitmap every second to do timeout @@ -410,6 +417,8 @@ static void o2hb_nego_timeout(struct wor return; } + printk(KERN_NOTICE "o2hb: all nodes hb write hung, maybe region %s (%s) is down.\n", + config_item_name(®->hr_item), reg->hr_dev_name); /* approve negotiate timeout request. */ o2hb_arm_timeout(reg); @@ -419,13 +428,23 @@ static void o2hb_nego_timeout(struct wor if (i == master_node) continue; - o2hb_send_nego_msg(reg->hr_key, + mlog(ML_HEARTBEAT, "send NEGO_APPROVE msg to node %d\n", i); + ret = o2hb_send_nego_msg(reg->hr_key, O2HB_NEGO_APPROVE_MSG, i); + if (ret) + mlog(ML_ERROR, "send NEGO_APPROVE msg to node %d fail %d\n", + i, ret); } } else { /* negotiate timeout with master node. */ - o2hb_send_nego_msg(reg->hr_key, O2HB_NEGO_TIMEOUT_MSG, - master_node); + printk(KERN_NOTICE "o2hb: node %d hb write hung for %ds on region %s (%s), negotiate timeout with node %d.\n", + o2nm_this_node(), O2HB_NEGO_TIMEOUT_MS/1000, config_item_name(®->hr_item), + reg->hr_dev_name, master_node); + ret = o2hb_send_nego_msg(reg->hr_key, O2HB_NEGO_TIMEOUT_MSG, + master_node); + if (ret) + mlog(ML_ERROR, "send NEGO_TIMEOUT msg to node %d fail %d\n", + master_node, ret); } } @@ -436,6 +455,8 @@ static int o2hb_nego_timeout_handler(str struct o2hb_nego_msg *nego_msg; nego_msg = (struct o2hb_nego_msg *)msg->buf; + printk(KERN_NOTICE "o2hb: receive negotiate timeout message from node %d on region %s (%s).\n", + nego_msg->node_num, config_item_name(®->hr_item), reg->hr_dev_name); if (nego_msg->node_num < O2NM_MAX_NODES) set_bit(nego_msg->node_num, reg->hr_nego_node_bitmap); else @@ -447,7 +468,11 @@ static int o2hb_nego_timeout_handler(str static int o2hb_nego_approve_handler(struct o2net_msg *msg, u32 len, void *data, void **ret_data) { - o2hb_arm_timeout(data); + struct o2hb_region *reg = data; + + printk(KERN_NOTICE "o2hb: negotiate timeout approved by master node on region %s (%s).\n", + config_item_name(®->hr_item), reg->hr_dev_name); + o2hb_arm_timeout(reg); return 0; }