From patchwork Wed Aug 6 20:32:07 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 4688481
Date: Wed, 06 Aug 2014 13:32:07 -0700
From: akpm@linux-foundation.org
To: jlbec@evilplan.org, mfasheh@suse.com, ocfs2-devel@oss.oracle.com,
 akpm@linux-foundation.org, junxiao.bi@oracle.com, joseph.qi@huawei.com,
 srinivas.eeda@oracle.com
Message-ID: <53e290c7.CpvTdt/ZlZEYmIIE%akpm@linux-foundation.org>
User-Agent: Heirloom mailx 12.5 6/20/10
Subject: [Ocfs2-devel] [patch 04/10] ocfs2: o2net: set tcp user timeout to max value
Precedence: list
Sender: ocfs2-devel-bounces@oss.oracle.com

From: Junxiao Bi
Subject: ocfs2: o2net: set tcp user timeout to max value

When the TCP retransmit timeout (15 minutes) expires, the connection is
closed, and pending messages may be lost during this time. So we set the
TCP user timeout to the max value to override the retransmit timeout.
This is OK for ocfs2 because we have the disk heartbeat: if a peer crashes,
its disk heartbeat times out and it is evicted. If the disk heartbeat does
not time out and the connection stays idle for a long time, the cluster has
entered a split-brain state; since fencing can't happen, we'd better keep
the connection and wait for the network to recover.

Signed-off-by: Junxiao Bi
Reviewed-by: Srinivas Eeda
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Reviewed-by: Mark Fasheh
---

 fs/ocfs2/cluster/tcp.c |   20 ++++++++++++++++++++
 fs/ocfs2/cluster/tcp.h |    1 +
 2 files changed, 21 insertions(+)

diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-set-tcp-user-timeout-to-max-value fs/ocfs2/cluster/tcp.c
--- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-set-tcp-user-timeout-to-max-value
+++ a/fs/ocfs2/cluster/tcp.c
@@ -1480,6 +1480,14 @@ static int o2net_set_nodelay(struct sock
 	return ret;
 }
 
+static int o2net_set_usertimeout(struct socket *sock)
+{
+	int user_timeout = O2NET_TCP_USER_TIMEOUT;
+
+	return kernel_setsockopt(sock, SOL_TCP, TCP_USER_TIMEOUT,
+				(char *)&user_timeout, sizeof(user_timeout));
+}
+
 static void o2net_initialize_handshake(void)
 {
 	o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32(
@@ -1663,6 +1671,12 @@ static void o2net_start_connect(struct w
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	o2net_register_callbacks(sc->sc_sock->sk, sc);
 
 	spin_lock(&nn->nn_lock);
@@ -1844,6 +1858,12 @@ static int o2net_accept_one(struct socke
 		goto out;
 	}
 
+	ret = o2net_set_usertimeout(new_sock);
+	if (ret) {
+		mlog(ML_ERROR, "set TCP_USER_TIMEOUT failed with %d\n", ret);
+		goto out;
+	}
+
 	slen = sizeof(sin);
 	ret = new_sock->ops->getname(new_sock, (struct sockaddr *) &sin,
 				       &slen, 1);
diff -puN fs/ocfs2/cluster/tcp.h~ocfs2-o2net-set-tcp-user-timeout-to-max-value fs/ocfs2/cluster/tcp.h
--- a/fs/ocfs2/cluster/tcp.h~ocfs2-o2net-set-tcp-user-timeout-to-max-value
+++ a/fs/ocfs2/cluster/tcp.h
@@ -63,6 +63,7 @@ typedef void (o2net_post_msg_handler_fun
 #define O2NET_KEEPALIVE_DELAY_MS_DEFAULT	2000
 #define O2NET_IDLE_TIMEOUT_MS_DEFAULT		30000
 
+#define O2NET_TCP_USER_TIMEOUT			0x7fffffff
 
 /* TODO: figure this out.... */
 static inline int o2net_link_down(int err, struct socket *sock)
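
For readers who want to try the same socket option outside the kernel, here
is a minimal userspace sketch of what o2net_set_usertimeout() does. It is an
illustration only, not part of the patch: it uses the ordinary setsockopt()
call rather than kernel_setsockopt(), and the names DEMO_TCP_USER_TIMEOUT_MS,
set_user_timeout() and get_user_timeout() are hypothetical. It assumes Linux,
where TCP_USER_TIMEOUT takes an unsigned int in milliseconds, so 0x7fffffff
is roughly 24.8 days.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Hypothetical name; mirrors the value of O2NET_TCP_USER_TIMEOUT. */
#define DEMO_TCP_USER_TIMEOUT_MS 0x7fffffffU

/*
 * Set TCP_USER_TIMEOUT on a TCP socket. The value is the maximum time,
 * in milliseconds, that transmitted data may stay unacknowledged before
 * the kernel closes the connection. Returns 0 on success, -1 on error
 * (errno set by setsockopt).
 */
static int set_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/*
 * Read the current TCP_USER_TIMEOUT back from the socket.
 * Returns 0 on success, -1 on error.
 */
static int get_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  timeout_ms, &len);
}
```

A caller would create a socket with socket(AF_INET, SOCK_STREAM, 0), call
set_user_timeout(fd, DEMO_TCP_USER_TIMEOUT_MS) before connect() or right
after accept(), and treat a non-zero return as a setup failure, much as the
patch does with "goto out".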