From patchwork Fri Jun 13 08:51:33 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiangyiwen X-Patchwork-Id: 4347701 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 7EA8BBEEAA for ; Fri, 13 Jun 2014 08:54:10 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6FD382024C for ; Fri, 13 Jun 2014 08:54:09 +0000 (UTC) Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3DBDA201F7 for ; Fri, 13 Jun 2014 08:54:08 +0000 (UTC) Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s5D8rUt5018487 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 13 Jun 2014 08:53:30 GMT Received: from oss.oracle.com (oss-external.oracle.com [137.254.96.51]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s5D8rR4d000997 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 13 Jun 2014 08:53:28 GMT Received: from localhost ([127.0.0.1] helo=oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1WvNEl-00038l-Pd; Fri, 13 Jun 2014 01:53:27 -0700 Received: from acsinet21.oracle.com ([141.146.126.237]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1WvNER-00033s-6t for ocfs2-devel@oss.oracle.com; Fri, 13 Jun 2014 01:53:08 -0700 Received: from aserp1030.oracle.com (aserp1030.oracle.com [141.146.126.68]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s5D8r6TA000022 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 13 Jun 2014 08:53:06 GMT Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65]) by aserp1030.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s5D8r2aj031559 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=FAIL) for ; Fri, 13 Jun 2014 08:53:04 GMT Received: from 172.24.2.119 (EHLO szxeml209-edg.china.huawei.com) ([172.24.2.119]) by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id BVB71867; Fri, 13 Jun 2014 16:52:59 +0800 (CST) Received: from SZXEML455-HUB.china.huawei.com (10.82.67.198) by szxeml209-edg.china.huawei.com (172.24.2.184) with Microsoft SMTP Server (TLS) id 14.3.158.1; Fri, 13 Jun 2014 16:51:43 +0800 Received: from [127.0.0.1] (10.177.16.229) by SZXEML455-HUB.china.huawei.com (10.82.67.198) with Microsoft SMTP Server id 14.3.158.1; Fri, 13 Jun 2014 16:51:39 +0800 Message-ID: <539ABB95.7020908@huawei.com> Date: Fri, 13 Jun 2014 16:51:33 +0800 From: jiangyiwen User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Andrew Morton X-Originating-IP: [10.177.16.229] X-CFilter-Loop: Reflected X-Flow-Control-Info: class=Pass-to-MM reputation=ipRisk-All ip=119.145.14.65 ct-class=T1 ct-vol1=0 ct-vol2=6 ct-vol3=5 ct-risk=10 ct-spam1=0 ct-spam2=0 ct-bulk=91 rcpts=1 size=5189 X-Sendmail-CM-Score: 0.00% X-Sendmail-CM-Analysis: v=2.1 cv=cOfZ7DuN c=1 sm=1 tr=0 a=qbZWUeANkjeORAZY4leFnw==:117 a=qbZWUeANkjeORAZY4leFnw==:17 a=mEcZkHQFRz0A:10 a=j1h8Y40MyIgA:10 a=O9dq5j03pVQA:10 a=FfG2nqlV9vQA:10 a=8nJEP1OIZ-IA:10 a=i0EeH86SAAAA:8 a=Z4Rwk6OoAAAA:8 a=3u_Wc40HAuLaE6gj-z UA:9 a=2rcjvdMH-noLjmz_:21 a=H9LrNzPdKBLP-RZh:21 a=wPNLvfGTeEIA:10 a=hPjdaMEvmhQA:10 a=jbrJJM5MRmoA:10 X-Sendmail-CT-RefID: str=0001.0A010204.539ABBF2.001E:SCFSTAT1612107, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-Sendmail-CT-Classification: not spam Cc: Mark Fasheh , ocfs2-devel@oss.oracle.com Subject: [Ocfs2-devel] [patch V2] ocfs2: fix a tiny race when running dirop_fileop_racer X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: acsinet21.oracle.com [141.146.126.237] X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node A Node B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang Signed-off-by: Joseph Qi Signed-off-by: Andrew Morton Reviewed-by: Mark Fasheh --- fs/ocfs2/namei.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++-- fs/ocfs2/ocfs2_trace.h | 2 ++ 2 files changed, 96 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 2060fc3..7011f9e 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -991,6 +991,65 @@ leave: return status; } +static int ocfs2_check_if_ancestor(struct ocfs2_super *osb, + u64 src_inode_no, u64 dest_inode_no) +{ + int ret = 0, i = 0; + u64 parent_inode_no = 0; + u64 child_inode_no = src_inode_no; + struct inode *child_inode; + +#define MAX_LOOKUP_TIMES 32 + while (1) { + child_inode = ocfs2_iget(osb, child_inode_no, 0, 0); + if (IS_ERR(child_inode)) { + ret = PTR_ERR(child_inode); + break; + } + + ret = ocfs2_inode_lock(child_inode, NULL, 0); + if (ret < 0) { + iput(child_inode); + if (ret != -ENOENT) + mlog_errno(ret); + break; + } + + ret = ocfs2_lookup_ino_from_name(child_inode, "..", 2, + &parent_inode_no); + ocfs2_inode_unlock(child_inode, 0); + iput(child_inode); + if (ret < 0) { + ret = -ENOENT; + break; + } + + if (parent_inode_no == dest_inode_no) { + ret = 1; + break; + } + + if (parent_inode_no == osb->root_inode->i_ino) { + ret = 0; + break; + } + + child_inode_no = parent_inode_no; + + if (++i >= MAX_LOOKUP_TIMES) { + mlog(ML_NOTICE, "max lookup times reached, filesystem " + "may have nested directories, " + "src inode: %llu, dest inode: %llu.\n", + (unsigned long long)src_inode_no, + (unsigned long long)dest_inode_no); + ret = 0; + break; + } + } + + return ret; +} + /* * The only place this should be used is rename! * if they have the same id, then the 1st one is the only one locked. @@ -1002,6 +1061,7 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, struct inode *inode2) { int status; + int inode1_is_ancestor, inode2_is_ancestor; struct ocfs2_inode_info *oi1 = OCFS2_I(inode1); struct ocfs2_inode_info *oi2 = OCFS2_I(inode2); struct buffer_head **tmpbh; @@ -1015,9 +1075,26 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, if (*bh2) *bh2 = NULL; - /* we always want to lock the one with the lower lockid first. */ + /* we always want to lock the one with the lower lockid first. + * and if they are nested, we lock ancestor first */ if (oi1->ip_blkno != oi2->ip_blkno) { - if (oi1->ip_blkno < oi2->ip_blkno) { + inode1_is_ancestor = ocfs2_check_if_ancestor(osb, oi2->ip_blkno, + oi1->ip_blkno); + if (inode1_is_ancestor < 0) { + status = inode1_is_ancestor; + goto bail; + } + + inode2_is_ancestor = ocfs2_check_if_ancestor(osb, oi1->ip_blkno, + oi2->ip_blkno); + if (inode2_is_ancestor < 0) { + status = inode2_is_ancestor; + goto bail; + } + + if ((inode1_is_ancestor == 1) || + (oi1->ip_blkno < oi2->ip_blkno && + inode2_is_ancestor == 0)) { /* switch id1 and id2 around */ tmpbh = bh2; bh2 = bh1; @@ -1134,6 +1211,21 @@ static int ocfs2_rename(struct inode *old_dir, goto bail; } rename_lock = 1; + + /* here we cannot guarantee the inodes haven't just been + * changed, so check if they are nested again */ + status = ocfs2_check_if_ancestor(osb, new_dir->i_ino, + old_inode->i_ino); + if (status < 0) { + mlog_errno(status); + goto bail; + } else if (status == 1) { + status = -EPERM; + trace_ocfs2_rename_not_permitted( + (unsigned long long)old_inode->i_ino, + (unsigned long long)new_dir->i_ino); + goto bail; + } } /* if old and new are the same, this'll just do one lock. */ diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h index 1b60c62..6cb019b 100644 --- a/fs/ocfs2/ocfs2_trace.h +++ b/fs/ocfs2/ocfs2_trace.h @@ -2292,6 +2292,8 @@ TRACE_EVENT(ocfs2_rename, __entry->new_len, __get_str(new_name)) ); +DEFINE_OCFS2_ULL_ULL_EVENT(ocfs2_rename_not_permitted); + TRACE_EVENT(ocfs2_rename_target_exists, TP_PROTO(int new_len, const char *new_name), TP_ARGS(new_len, new_name),