From patchwork Fri Jan 24 20:47:03 2014
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 3535971
To: ocfs2-devel@oss.oracle.com
From: akpm@linux-foundation.org
Date: Fri, 24 Jan 2014 12:47:03 -0800
Message-Id: <20140124204703.AF70F5A4203@corp2gmr1-2.hot.corp.google.com>
Cc: jiangyiwen@huawei.com, mfasheh@suse.com
Subject: [Ocfs2-devel] [patch 04/11] ocfs2: fix a tiny race when running dirop_fileop_racer

From: Yiwen Jiang
Subject: ocfs2: fix a tiny race when running dirop_fileop_racer

When running dirop_fileop_racer we found a deadlock case.

Two nodes, say Node A and Node B, mount the same ocfs2 volume.  Create
/race/16/1 in the filesystem, and let the inode number of dir 16 be less
than the inode number of dir race.

Node A                            Node B
mv /race/16/1 /race/
                                  right after Node A has taken
                                  /race/16/ in EX mode and is trying
                                  to take /race in EX mode:
                                  ls /race/16/

In this case, Node A holds /race/16/ in EX mode and wants /race/ in EX
mode.
Node B holds /race/ in PR mode and wants /race/16/ in PR mode.  Since EX
and PR are mutually exclusive, the two nodes deadlock.

This patch fixes the case by locking in ancestor order first, and falling
back to inode number order only when neither inode is found to be an
ancestor of the other.

Signed-off-by: Yiwen Jiang
Signed-off-by: Joseph Qi
Cc: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
---

 fs/ocfs2/namei.c |   97 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer fs/ocfs2/namei.c
--- a/fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer
+++ a/fs/ocfs2/namei.c
@@ -954,6 +954,65 @@ leave:
 	return status;
 }
 
+static int ocfs2_check_if_ancestor(struct ocfs2_super *osb,
+		u64 src_inode_no, u64 dest_inode_no)
+{
+	int ret = 0, i = 0;
+	u64 parent_inode_no = 0;
+	u64 child_inode_no = src_inode_no;
+	struct inode *child_inode;
+
+#define MAX_LOOKUP_TIMES 32
+	while (1) {
+		child_inode = ocfs2_iget(osb, child_inode_no, 0, 0);
+		if (IS_ERR(child_inode)) {
+			ret = PTR_ERR(child_inode);
+			break;
+		}
+
+		ret = ocfs2_inode_lock(child_inode, NULL, 0);
+		if (ret < 0) {
+			iput(child_inode);
+			if (ret != -ENOENT)
+				mlog_errno(ret);
+			break;
+		}
+
+		ret = ocfs2_lookup_ino_from_name(child_inode, "..", 2,
+				&parent_inode_no);
+		ocfs2_inode_unlock(child_inode, 0);
+		iput(child_inode);
+		if (ret < 0) {
+			ret = -ENOENT;
+			break;
+		}
+
+		if (parent_inode_no == dest_inode_no) {
+			ret = 1;
+			break;
+		}
+
+		if (parent_inode_no == osb->root_inode->i_ino) {
+			ret = 0;
+			break;
+		}
+
+		child_inode_no = parent_inode_no;
+
+		if (++i >= MAX_LOOKUP_TIMES) {
+			mlog(ML_NOTICE, "max lookup times reached, filesystem "
+					"may have nested directories, "
+					"src inode: %llu, dest inode: %llu.\n",
+					(unsigned long long)src_inode_no,
+					(unsigned long long)dest_inode_no);
+			ret = 0;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * The only place this should be used is rename!
  * if they have the same id, then the 1st one is the only one locked.
@@ -965,6 +1024,7 @@ static int ocfs2_double_lock(struct ocfs
 			     struct inode *inode2)
 {
 	int status;
+	int inode1_is_ancestor, inode2_is_ancestor;
 	struct ocfs2_inode_info *oi1 = OCFS2_I(inode1);
 	struct ocfs2_inode_info *oi2 = OCFS2_I(inode2);
 	struct buffer_head **tmpbh;
@@ -978,9 +1038,26 @@ static int ocfs2_double_lock(struct ocfs
 	if (*bh2)
 		*bh2 = NULL;
 
-	/* we always want to lock the one with the lower lockid first. */
+	/* we always want to lock the one with the lower lockid first.
+	 * and if they are nested, we lock ancestor first */
 	if (oi1->ip_blkno != oi2->ip_blkno) {
-		if (oi1->ip_blkno < oi2->ip_blkno) {
+		inode1_is_ancestor = ocfs2_check_if_ancestor(osb, oi2->ip_blkno,
+				oi1->ip_blkno);
+		if (inode1_is_ancestor < 0) {
+			status = inode1_is_ancestor;
+			goto bail;
+		}
+
+		inode2_is_ancestor = ocfs2_check_if_ancestor(osb, oi1->ip_blkno,
+				oi2->ip_blkno);
+		if (inode2_is_ancestor < 0) {
+			status = inode2_is_ancestor;
+			goto bail;
+		}
+
+		if ((inode1_is_ancestor == 1) ||
+				(oi1->ip_blkno < oi2->ip_blkno &&
+				inode2_is_ancestor == 0)) {
 			/* switch id1 and id2 around */
 			tmpbh = bh2;
 			bh2 = bh1;
@@ -1097,6 +1174,22 @@ static int ocfs2_rename(struct inode *ol
 			goto bail;
 		}
 		rename_lock = 1;
+
+		/* here we cannot guarantee the inodes haven't just been
+		 * changed, so check if they are nested again */
+		status = ocfs2_check_if_ancestor(osb, new_dir->i_ino,
+				old_inode->i_ino);
+		if (status < 0) {
+			mlog_errno(status);
+			goto bail;
+		} else if (status == 1) {
+			status = -EPERM;
+			mlog(ML_ERROR, "src inode %llu should not be ancestor "
+				"of new dir inode %llu\n",
+				(unsigned long long)old_inode->i_ino,
+				(unsigned long long)new_dir->i_ino);
+			goto bail;
+		}
 	}
 
 	/* if old and new are the same, this'll just do one lock. */
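The ordering rule this patch adds to ocfs2_double_lock() can be exercised outside the kernel. The sketch below is a hypothetical userspace model, not ocfs2 code: toy_fs, toy_fs_init(), toy_set_parent(), toy_check_if_ancestor() and toy_first_to_lock() are invented for illustration. An array of parent block numbers stands in for the ".." lookups the patch performs through ocfs2_iget(), ocfs2_inode_lock() and ocfs2_lookup_ino_from_name(), and no cluster locks are taken. It keeps the patch's 32-step walk limit and its swap condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in ocfs2.
 * Directories are identified by small block numbers, and parent[]
 * stands in for the ".." lookup the patch does on real inodes. */
#define TOY_MAX_DIRS 64
#define TOY_NO_PARENT UINT64_MAX
#define MAX_LOOKUP_TIMES 32	/* same bound as the patch */

struct toy_fs {
	uint64_t root;
	uint64_t parent[TOY_MAX_DIRS];
};

static void toy_fs_init(struct toy_fs *fs, uint64_t root)
{
	assert(root < TOY_MAX_DIRS);
	for (int i = 0; i < TOY_MAX_DIRS; i++)
		fs->parent[i] = TOY_NO_PARENT;	/* ".." lookup fails */
	fs->root = root;
	fs->parent[root] = root;		/* ".." of the root is the root */
}

static void toy_set_parent(struct toy_fs *fs, uint64_t child, uint64_t parent)
{
	assert(child < TOY_MAX_DIRS && parent < TOY_MAX_DIRS);
	fs->parent[child] = parent;
}

/* Walk up from src: 1 if dest is an ancestor of src, 0 if the walk
 * reaches the root or gives up after MAX_LOOKUP_TIMES steps, -ENOENT
 * if a ".." lookup fails. */
static int toy_check_if_ancestor(const struct toy_fs *fs,
				 uint64_t src, uint64_t dest)
{
	uint64_t child = src;
	int i = 0;

	for (;;) {
		uint64_t parent;

		if (child >= TOY_MAX_DIRS || fs->parent[child] == TOY_NO_PARENT)
			return -ENOENT;
		parent = fs->parent[child];

		if (parent == dest)
			return 1;
		if (parent == fs->root)
			return 0;

		child = parent;
		if (++i >= MAX_LOOKUP_TIMES)
			return 0;
	}
}

/* Same decision as the patched ocfs2_double_lock(): return the block to
 * lock first (the ancestor if the two are nested, otherwise the lower
 * block number), or a negative errno. */
static int64_t toy_first_to_lock(const struct toy_fs *fs,
				 uint64_t blk1, uint64_t blk2)
{
	int blk1_is_ancestor, blk2_is_ancestor;

	if (blk1 == blk2)
		return (int64_t)blk1;	/* only one lock is taken */

	blk1_is_ancestor = toy_check_if_ancestor(fs, blk2, blk1);
	if (blk1_is_ancestor < 0)
		return blk1_is_ancestor;
	blk2_is_ancestor = toy_check_if_ancestor(fs, blk1, blk2);
	if (blk2_is_ancestor < 0)
		return blk2_is_ancestor;

	/* The patch swaps inode1/inode2 under this condition and then locks
	 * inode2 first, so the original inode1 is the one locked first. */
	if (blk1_is_ancestor == 1 ||
	    (blk1 < blk2 && blk2_is_ancestor == 0))
		return (int64_t)blk1;
	return (int64_t)blk2;
}
```

For the commit message's scenario, assume /race is block 10 under root block 1 and /race/16 is block 5. Then toy_first_to_lock(&fs, 5, 10) returns 10: /race is locked before /race/16, the same order Node B's ls takes, so the cycle cannot form. The pre-patch rule (lower block number first) would have picked 5. Unrelated directories still fall back to block-number order.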