From patchwork Mon Jun 9 20:04:05 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 4323411 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 65E12BEEAA for ; Mon, 9 Jun 2014 20:05:00 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 7FC1320204 for ; Mon, 9 Jun 2014 20:04:59 +0000 (UTC) Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 576FE201BB for ; Mon, 9 Jun 2014 20:04:58 +0000 (UTC) Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s59K4lVp008793 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 9 Jun 2014 20:04:48 GMT Received: from oss.oracle.com (oss-external.oracle.com [137.254.96.51]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s59K4k54024853 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 9 Jun 2014 20:04:47 GMT Received: from localhost ([127.0.0.1] helo=oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1Wu5oE-0000C1-My; Mon, 09 Jun 2014 13:04:46 -0700 Received: from ucsinet21.oracle.com ([156.151.31.93]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1Wu5nc-000090-DK for ocfs2-devel@oss.oracle.com; Mon, 09 Jun 2014 13:04:08 -0700 Received: from aserp1020.oracle.com (aserp1020.oracle.com [141.146.126.67]) by ucsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s59K47gG023025 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 9 Jun 2014 20:04:08 GMT Received: from mail-ie0-f201.google.com (mail-ie0-f201.google.com [209.85.223.201]) by aserp1020.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s59K46bF025342 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Mon, 9 Jun 2014 20:04:07 GMT Received: by mail-ie0-f201.google.com with SMTP id rl12so491659iec.0 for ; Mon, 09 Jun 2014 13:04:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:cc:from:date:mime-version :content-type:content-transfer-encoding:message-id; bh=HeRJz9VfxXiziYjWQS0irM+R0W/w4z3ypCu8rmOeuYg=; b=RzlBuJnmZMTuUqBDDes/Ziy5QM9p4vEF9phf8Bt5pSgtYx6ndAAOOq2E4eaiJnKGJO ATMCr8v48xYcMnT/maXRZsuG2So3HU5BeRBMKznaBhqw/aFCWgJf42anM4VQwmNHeMsc +2cXObbsaoQ0x5ghfyINSwV639qp49t+D7QDbnBLax2T5nCTDSK6P1G5lm7IJtf8azrZ w9HE0bZP/XtpR4yNLmBhFF5Z6NDBLTtMDWW4+yO8RlHPQ7SGhEPfBk/r+XywFF8kNoe6 yqJyZauAZQlkhoLi42tlXwIv338ehZbRlkTfCdqubYYrYKbzyVoyszH3H/YwDsN8YqWR hqCw== X-Gm-Message-State: ALoCoQmd/V5r87OpHkre+/pwCQbUJ9QWlI7YKiWB5bMCkKT64pj6D99SV738xnLnKIHTs/5+xOd/ X-Received: by 10.182.119.194 with SMTP id kw2mr3708390obb.27.1402344246372; Mon, 09 Jun 2014 13:04:06 -0700 (PDT) Received: from corp2gmr1-2.hot.corp.google.com (corp2gmr1-2.hot.corp.google.com [172.24.189.93]) by gmr-mx.google.com with ESMTPS id z50si1112720yhb.3.2014.06.09.13.04.06 for (version=TLSv1.1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 09 Jun 2014 13:04:06 -0700 (PDT) Received: from localhost.localdomain (akpm3.mtv.corp.google.com [172.17.131.127]) by corp2gmr1-2.hot.corp.google.com (Postfix) with ESMTP id C8C6A5A4A39; Mon, 9 Jun 2014 13:04:05 -0700 (PDT) To: ocfs2-devel@oss.oracle.com From: akpm@linux-foundation.org Date: Mon, 09 Jun 2014 13:04:05 -0700 MIME-Version: 1.0 Message-Id: <20140609200405.C8C6A5A4A39@corp2gmr1-2.hot.corp.google.com> X-Flow-Control-Info: class=Pass-to-MM reputation=ipRisk-All ip=209.85.223.201 ct-class=R5 ct-vol1=-87 ct-vol2=7 ct-vol3=6 ct-risk=56 ct-spam1=90 ct-spam2=12 ct-bulk=8 rcpts=1 size=4979 X-Sendmail-CM-Score: 0.00% X-Sendmail-CM-Analysis: v=2.1 cv=YMWml32x c=1 sm=1 tr=0 a=s1ldTWT3zXePs8c4p+QLiA==:117 a=TU11sMXmJiEA:10 a=NEiEQogP1MkA:10 a=os2CZ2fo8YAA:10 a=Z4Rwk6OoAAAA:8 a=1XWaLZrsAAAA:8 a=i0EeH86SAAAA:8 a=IXr_WNlcAAAA:8 a=iox4zFpeAAAA:8 a=OF9kw4EbaUITvTiIaxYA:9 a=wwBikDgk xkmaUOOJ:21 a=yMN2Y19zTfGsA4J3:21 a=e4xtJxf3HDoA:10 a=hPjdaMEvmhQA:10 a=T5ZRoNnfl4MA:10 a=n9GBPR9yFnkA:10 a=jbrJJM5MRmoA:10 X-Sendmail-CT-RefID: str=0001.0A090203.53961337.004B, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-Sendmail-CT-Classification: not spam Cc: mfasheh@suse.com Subject: [Ocfs2-devel] [patch 6/8] ocfs2: fix a tiny race when running dirop_fileop_racer X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: acsinet21.oracle.com [141.146.126.237] X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Yiwen Jiang Subject: ocfs2: fix a tiny race when running dirop_fileop_racer When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node A Node B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang Signed-off-by: Joseph Qi Cc: Joel Becker Cc: Mark Fasheh Signed-off-by: Andrew Morton --- fs/ocfs2/namei.c | 97 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 95 insertions(+), 2 deletions(-) diff -puN fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer fs/ocfs2/namei.c --- a/fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer +++ a/fs/ocfs2/namei.c @@ -991,6 +991,65 @@ leave: return status; } +static int ocfs2_check_if_ancestor(struct ocfs2_super *osb, + u64 src_inode_no, u64 dest_inode_no) +{ + int ret = 0, i = 0; + u64 parent_inode_no = 0; + u64 child_inode_no = src_inode_no; + struct inode *child_inode; + +#define MAX_LOOKUP_TIMES 32 + while (1) { + child_inode = ocfs2_iget(osb, child_inode_no, 0, 0); + if (IS_ERR(child_inode)) { + ret = PTR_ERR(child_inode); + break; + } + + ret = ocfs2_inode_lock(child_inode, NULL, 0); + if (ret < 0) { + iput(child_inode); + if (ret != -ENOENT) + mlog_errno(ret); + break; + } + + ret = ocfs2_lookup_ino_from_name(child_inode, "..", 2, + &parent_inode_no); + ocfs2_inode_unlock(child_inode, 0); + iput(child_inode); + if (ret < 0) { + ret = -ENOENT; + break; + } + + if (parent_inode_no == dest_inode_no) { + ret = 1; + break; + } + + if (parent_inode_no == osb->root_inode->i_ino) { + ret = 0; + break; + } + + child_inode_no = parent_inode_no; + + if (++i >= MAX_LOOKUP_TIMES) { + mlog(ML_NOTICE, "max lookup times reached, filesystem " + "may have nested directories, " + "src inode: %llu, dest inode: %llu.\n", + (unsigned long long)src_inode_no, + (unsigned long long)dest_inode_no); + ret = 0; + break; + } + } + + return ret; +} + /* * The only place this should be used is rename! * if they have the same id, then the 1st one is the only one locked. @@ -1002,6 +1061,7 @@ static int ocfs2_double_lock(struct ocfs struct inode *inode2) { int status; + int inode1_is_ancestor, inode2_is_ancestor; struct ocfs2_inode_info *oi1 = OCFS2_I(inode1); struct ocfs2_inode_info *oi2 = OCFS2_I(inode2); struct buffer_head **tmpbh; @@ -1015,9 +1075,26 @@ static int ocfs2_double_lock(struct ocfs if (*bh2) *bh2 = NULL; - /* we always want to lock the one with the lower lockid first. */ + /* we always want to lock the one with the lower lockid first. + * and if they are nested, we lock ancestor first */ if (oi1->ip_blkno != oi2->ip_blkno) { - if (oi1->ip_blkno < oi2->ip_blkno) { + inode1_is_ancestor = ocfs2_check_if_ancestor(osb, oi2->ip_blkno, + oi1->ip_blkno); + if (inode1_is_ancestor < 0) { + status = inode1_is_ancestor; + goto bail; + } + + inode2_is_ancestor = ocfs2_check_if_ancestor(osb, oi1->ip_blkno, + oi2->ip_blkno); + if (inode2_is_ancestor < 0) { + status = inode2_is_ancestor; + goto bail; + } + + if ((inode1_is_ancestor == 1) || + (oi1->ip_blkno < oi2->ip_blkno && + inode2_is_ancestor == 0)) { /* switch id1 and id2 around */ tmpbh = bh2; bh2 = bh1; @@ -1135,6 +1212,22 @@ static int ocfs2_rename(struct inode *ol goto bail; } rename_lock = 1; + + /* here we cannot guarantee the inodes haven't just been + * changed, so check if they are nested again */ + status = ocfs2_check_if_ancestor(osb, new_dir->i_ino, + old_inode->i_ino); + if (status < 0) { + mlog_errno(status); + goto bail; + } else if (status == 1) { + status = -EPERM; + mlog(ML_ERROR, "src inode %llu should not be ancestor " + "of new dir inode %llu\n", + (unsigned long long)old_inode->i_ino, + (unsigned long long)new_dir->i_ino); + goto bail; + } } /* if old and new are the same, this'll just do one lock. */