From patchwork Fri Jul 26 04:09:38 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xue jiufei X-Patchwork-Id: 2833775 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 3B07A9F243 for ; Fri, 26 Jul 2013 04:12:52 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 322B72018E for ; Fri, 26 Jul 2013 04:12:51 +0000 (UTC) Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1BFAD20182 for ; Fri, 26 Jul 2013 04:12:50 +0000 (UTC) Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237]) by aserp1040.oracle.com (Sentrion-MTA-4.3.1/Sentrion-MTA-4.3.1) with ESMTP id r6Q4BnMH028106 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Jul 2013 04:11:50 GMT Received: from oss.oracle.com (oss-external.oracle.com [137.254.96.51]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id r6Q4BVdG005118 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 26 Jul 2013 04:11:35 GMT Received: from localhost ([127.0.0.1] helo=oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1V2ZNL-0001AX-4k; Thu, 25 Jul 2013 21:11:31 -0700 Received: from acsinet22.oracle.com ([141.146.126.238]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1V2ZN1-00019U-6k for ocfs2-devel@oss.oracle.com; Thu, 25 Jul 2013 21:11:13 -0700 Received: from aserp1020.oracle.com (aserp1020.oracle.com [141.146.126.67]) by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id r6Q4B9Uj022290 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 26 Jul 2013 04:11:10 GMT Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65]) by aserp1020.oracle.com (Sentrion-MTA-4.3.1/Sentrion-MTA-4.3.1) with ESMTP id r6Q4B622015672 (version=TLSv1/SSLv3 cipher=DES-CBC3-SHA bits=168 verify=FAIL) for ; Fri, 26 Jul 2013 04:11:08 GMT Received: from 172.24.2.119 (EHLO szxeml213-edg.china.huawei.com) ([172.24.2.119]) by szxrg02-dlp.huawei.com (MOS 4.3.4-GA FastPath queued) with ESMTP id BFD16366; Fri, 26 Jul 2013 12:10:20 +0800 (CST) Received: from SZXEML463-HUB.china.huawei.com (10.82.67.206) by szxeml213-edg.china.huawei.com (172.24.2.30) with Microsoft SMTP Server (TLS) id 14.1.323.7; Fri, 26 Jul 2013 12:10:18 +0800 Received: from [127.0.0.1] (10.135.72.87) by szxeml463-hub.china.huawei.com (10.82.67.206) with Microsoft SMTP Server id 14.1.323.7; Fri, 26 Jul 2013 12:10:14 +0800 Message-ID: <51F1F682.4000009@huawei.com> Date: Fri, 26 Jul 2013 12:09:38 +0800 From: Xue jiufei User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: Andrew Morton X-Originating-IP: [10.135.72.87] X-CFilter-Loop: Reflected X-Flow-Control-Info: class=Pass-to-MM reputation=ipRisk-All ip=119.145.14.65 ct-class=T1 ct-vol1=0 ct-vol2=6 ct-vol3=5 ct-risk=30 ct-spam1=44 ct-spam2=0 ct-bulk=88 rcpts=1 size=2232 X-Sendmail-CM-Score: 0.00% X-Sendmail-CM-Analysis: v=2.1 cv=B5aAjodM c=1 sm=1 tr=0 a=qbZWUeANkjeORAZY4leFnw==:117 a=qbZWUeANkjeORAZY4leFnw==:17 a=7dTNpnL_XZEA:10 a=je8okafH2F8A:10 a=dgr5K4J83c0A:10 a=O9dq5j03pVQA:10 a=-6GgXGw8a1AA:10 a=8nJEP1OIZ-IA:10 a=i0EeH86SAAAA:8 a=fPSz9dYeOrkA:10 a =ny5G36K7HgAgc-cxQWMA:9 a=wPNLvfGTeEIA:10 a=hPjdaMEvmhQA:10 X-Sendmail-CT-Classification: not spam X-Sendmail-CT-RefID: str=0001.0A090209.51F1F6DD.0030:SCFSTAT1612107, ss=1, re=-4.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 Cc: Mark Fasheh , ocfs2-devel@oss.oracle.com Subject: [Ocfs2-devel] [PATCH] ocfs2: dlm_request_all_locks() should deal with the status sent from target node X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list Reply-To: xuejiufei@huawei.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: acsinet21.oracle.com [141.146.126.237] X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP dlm_request_all_locks() should deal with the status sent from target node if DLM_LOCK_REQUEST_MSG is sent successfully, or recovery master will fall into endless loop, waiting for other nodes to send locks and DLM_RECO_DATA_DONE_MSG to me. NodeA NodeB selected as recovery master dlm_remaster_locks() ->dlm_request_all_locks() send DLM_LOCK_REQUEST_MSG to nodeA It happened that NodeA cannot alloc memory when it processes this message. dlm_request_all_locks_handler() do not queue dlm_request_all_locks_worker and returns -ENOMEM. It will never send locks andDLM_RECO_DATA_DONE_MSG to NodeB. NodeB do not deal with the status sent from nodeA, and will fall in endless loop waiting for the recovery state of NodeA to be changed. Signed-off-by: joyce --- fs/ocfs2/dlm/dlmrecovery.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index 773bd32..f945502 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -787,6 +787,7 @@ static int dlm_request_all_locks(struct dlm_ctxt *dlm, u8 request_from, { struct dlm_lock_request lr; int ret; + int status; mlog(0, "\n"); @@ -800,13 +801,15 @@ static int dlm_request_all_locks(struct dlm_ctxt *dlm, u8 request_from, // send message ret = o2net_send_message(DLM_LOCK_REQUEST_MSG, dlm->key, - &lr, sizeof(lr), request_from, NULL); + &lr, sizeof(lr), request_from, &status); /* negative status is handled by caller */ if (ret < 0) mlog(ML_ERROR, "%s: Error %d send LOCK_REQUEST to node %u " "to recover dead node %u\n", dlm->name, ret, request_from, dead_node); + else + ret = status; // return from here, then // sleep until all received or error return ret;