From patchwork Thu Nov 30 22:24:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 10085725
Date: Thu, 30 Nov 2017 14:24:30 -0800
From: akpm@linux-foundation.org
To: ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org,
 zhang.yangB@h3c.com, jiangqi903@gmail.com, jlbec@evilplan.org,
 junxiao.bi@oracle.com, mfasheh@versity.com
Message-ID: <5a20851e.upbuwraIKYxYTZhm%akpm@linux-foundation.org>
User-Agent: Heirloom mailx 12.5 6/20/10
Subject: [Ocfs2-devel] [patch 07/11] ocfs2: fix qs_holds may could not be zero
X-BeenThere: ocfs2-devel@oss.oracle.com
X-Mailman-Version: 2.1.9
Precedence: list
Sender: ocfs2-devel-bounces@oss.oracle.com
Errors-To: ocfs2-devel-bounces@oss.oracle.com

From: Zhangyang
Subject: ocfs2: fix qs_holds may could not be zero

In our testing, we found that when the network goes down, qs->qs_holds
may never drop to zero, which prevents the node from fencing.

o2net_idle_timer() calls o2quo_conn_err(), which increments
qs->qs_holds. If qs_holds drops back to zero after
O2NET_QUORUM_DELAY_MS, o2quo_make_decision() can run.

However, in a cluster with many nodes, when one node loses its network
the o2net connections on that node may not all run o2net_idle_timer()
at the same time. When one o2net_node has finished its
nn->nn_still_up work, qs_holds is still nonzero because the other
o2net_nodes have not yet finished theirs. The first o2net_node then
runs o2net_idle_timer() again and qs_holds is incremented again.
Because qs_holds is a global counter, this forms a loop: qs_holds never
reaches zero, so the node never calls o2quo_make_decision().

Fix this by changing o2quo_conn_err() so that o2quo_set_hold() is only
called when the node's bit is set in the qs_conn_bm bitmap.

Link: http://lkml.kernel.org/r/7F50894FD17BEC45AAC26E5BADA6CE330C60F99A@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Yang Zhang
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
---

 fs/ocfs2/cluster/quorum.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/cluster/quorum.c~ocfs2-fix-qs_holds-may-could-not-be-zero fs/ocfs2/cluster/quorum.c
--- a/fs/ocfs2/cluster/quorum.c~ocfs2-fix-qs_holds-may-could-not-be-zero
+++ a/fs/ocfs2/cluster/quorum.c
@@ -314,13 +314,16 @@ void o2quo_conn_err(u8 node)
 				node, qs->qs_connected);
 
 		clear_bit(node, qs->qs_conn_bm);
+		/*
+		 * Bring set hold within this judgement, in order to avoid
+		 * qs_hold could not be zero.
+		 */
+		if (test_bit(node, qs->qs_hb_bm))
+			o2quo_set_hold(qs, node);
 	}
 
 	mlog(0, "node %u, %d total\n", node, qs->qs_connected);
 
-	if (test_bit(node, qs->qs_hb_bm))
-		o2quo_set_hold(qs, node);
-
 	spin_unlock(&qs->qs_lock);
 }
 
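
For illustration only, and not part of the patch: the hold accounting described in the changelog can be sketched as a small user-space model. Everything here is an assumption made for the sketch: the `quo_model` struct, the `model_*` helpers, the plain `bool` arrays standing in for the `qs_conn_bm` and `qs_hb_bm` bitmaps, and the fixed event order that replaces real timers. It only shows why gating the hold on the connection bit lets the counter reach zero.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical, simplified stand-in for the quorum state. */
struct quo_model {
	bool conn[MAX_NODES];	/* models qs_conn_bm */
	bool hb[MAX_NODES];	/* models qs_hb_bm */
	bool hold[MAX_NODES];	/* per-node hold flag */
	int holds;		/* models qs_holds */
};

/* Take a hold for this node unless it already has one. */
static void model_set_hold(struct quo_model *qs, int node)
{
	if (!qs->hold[node]) {
		qs->hold[node] = true;
		qs->holds++;
	}
}

/* Drop this node's hold, if any. */
static void model_clear_hold(struct quo_model *qs, int node)
{
	if (qs->hold[node]) {
		assert(qs->holds > 0);
		qs->hold[node] = false;
		qs->holds--;
	}
}

/*
 * Connection error for one peer. With gated == true the hold is taken
 * only if the peer was still marked connected (the patched flow);
 * with gated == false it is taken on every call (the original flow).
 */
static void model_conn_err(struct quo_model *qs, int node, bool gated)
{
	bool was_connected = qs->conn[node];

	qs->conn[node] = false;
	if (qs->hb[node] && (was_connected || !gated))
		model_set_hold(qs, node);
}

/*
 * Two peers whose idle timers are out of phase: each round, one peer's
 * hold expires and its idle timer fires again before the other peer's
 * hold expires. Returns the lowest hold count seen after any expiry.
 */
static int model_min_holds(bool gated, int rounds)
{
	struct quo_model qs = { .conn = { true, true }, .hb = { true, true } };
	int min_holds;

	model_conn_err(&qs, 0, gated);
	model_conn_err(&qs, 1, gated);
	min_holds = qs.holds;

	for (int i = 0; i < rounds; i++) {
		int node = i % 2;

		model_clear_hold(&qs, node);		/* hold delay ends */
		if (qs.holds < min_holds)
			min_holds = qs.holds;
		model_conn_err(&qs, node, gated);	/* idle timer fires again */
	}
	return min_holds;
}
```

In this model, `model_min_holds(false, 6)` never gets below 1, which corresponds to the loop described in the changelog, while `model_min_holds(true, 6)` reaches 0, the point at which a quorum decision could be made.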