From patchwork Wed May 8 02:43:15 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Guozhonghua
X-Patchwork-Id: 2537291
From: Guozhonghua
To: "ocfs2-devel@oss.oracle.com", "ocfs2-devel-request@oss.oracle.com"
Cc: Changlimin
Date: Wed, 8 May 2013 02:43:15 +0000
Message-ID: <71604351584F6A4EBAE558C676F37CA417BD78AA@H3CMLB02-EX.srv.huawei-3com.com>
Subject: [Ocfs2-devel] Patch request reviews, for node reconnecting with other nodes whose node number is little than local, thanks a lot.
X-BeenThere: ocfs2-devel@oss.oracle.com
Precedence: list
Sender: ocfs2-devel-bounces@oss.oracle.com
Errors-To: ocfs2-devel-bounces@oss.oracle.com

Hi everyone,

I ran a test with eight nodes and found an issue. The Linux kernel version is 3.2.40.

I was migrating processes from one node to another while those processes had files open on the OCFS2 storage. Sometimes a node shuts down its TCP connection to a node with a larger node number because it has not received any message from that node for a long time. After the connection is shut down, the node with the larger number does not reconnect to the lower-numbered node that closed the connection.

I reviewed the cluster code and think this may be a bug. I changed the code and tested it. Could somebody review the changes and confirm that they are correct?

Thanks a lot.

The diff of cluster/tcp.c is below:

root@gzh-dev:/home/dev/test_replace/ocfs2_ko# diff -pu ocfs2-ko-3.2-compare/cluster/tcp.c ocfs2-ko-3.2/cluster/tcp.c
--- ocfs2-ko-3.2-compare/cluster/tcp.c 2012-10-29 19:33:19.534200000 +0800
+++ ocfs2-ko-3.2/cluster/tcp.c 2013-05-08 09:33:16.386277310 +0800
@@ -1699,6 +1698,10 @@ static void o2net_start_connect(struct w
 	if (ret == -EINPROGRESS)
 		ret = 0;
+	/** Reset the timeout with 0 to avoid connection again */
+	if (ret == 0) {
+		atomic_set(&nn->nn_timeout, 0);
+	}
 out:
 	if (ret) {
 		printk(KERN_NOTICE "o2net: Connect attempt to " SC_NODEF_FMT
@@ -1725,6 +1728,11 @@ static void o2net_connect_expired(struct
 	spin_lock(&nn->nn_lock);
 	if (!nn->nn_sc_valid) {
+		/** trigger reconnect with other nodes whose node number is little than local
+		 * while they are still able to access the storage
+		 */
+		atomic_set(&nn->nn_timeout, 1);
+
 		printk(KERN_NOTICE "o2net: No connection established with "
 		       "node %u after %u.%u seconds, giving up.\n",
 		       o2net_num_from_nn(nn),
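
To make the intent of the two hunks easier to review, here is a small user-space model of the flag handling. It is only a sketch under stated assumptions: `struct model_node` and all `model_*` functions are hypothetical names, not kernel symbols. The initiator rule (the larger-numbered node connects) follows the behaviour described in this mail. The gate in `model_connect_allowed()` is an assumption about how o2net_start_connect() consults nn_timeout and nn_persistent_error; that check is not part of the diff and should be verified against the 3.2.40 source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few o2net_node fields discussed here. */
struct model_node {
	bool has_sc;          /* models "a socket container already exists" */
	int persistent_error; /* models nn->nn_persistent_error */
	int timeout;          /* models atomic_read(&nn->nn_timeout) */
};

/* Per the report: the node with the larger number initiates the connection. */
static bool model_should_initiate(unsigned int this_node, unsigned int peer_node)
{
	return this_node > peer_node;
}

/*
 * ASSUMPTION: a retry is refused when a connection is pending, or when an
 * error is recorded, unless that error is -ENOTCONN and the timeout flag
 * is set.
 */
static bool model_connect_allowed(const struct model_node *nn)
{
	bool stop = nn->has_sc ||
		(nn->persistent_error &&
		 (nn->persistent_error != -ENOTCONN || nn->timeout == 0));

	return !stop;
}

/* Models the first hunk: a connect that started cleanly clears the flag. */
static void model_connect_started(struct model_node *nn, int ret)
{
	if (ret == 0)
		nn->timeout = 0;
}

/* Models the second hunk: giving up on a connection sets the flag. */
static void model_connect_expired(struct model_node *nn)
{
	nn->timeout = 1;
}
```

In this model, a node that has recorded -ENOTCONN with the flag at 0 is not allowed to retry. Calling `model_connect_expired()` sets the flag and lets the retry through, and `model_connect_started()` with a return value of 0 clears it again. Whether the real retry is actually scheduled after o2net_connect_expired() runs is outside what this sketch covers.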