From patchwork Thu May 10 07:49:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gang He X-Patchwork-Id: 10391563 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3AEF660153 for ; Thu, 10 May 2018 10:03:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1F4CE289B8 for ; Thu, 10 May 2018 10:03:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 138D3289BA; Thu, 10 May 2018 10:03:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.6 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_LOW autolearn=unavailable version=3.3.1 Received: from aserp2130.oracle.com (aserp2130.oracle.com [141.146.126.79]) (using TLSv1.2 with cipher AES256-SHA256 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7F8C6289B8 for ; Thu, 10 May 2018 10:03:30 +0000 (UTC) Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w4A7kFx2075852; Thu, 10 May 2018 07:50:33 GMT Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp2130.oracle.com with ESMTP id 2hv6m4j8dn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 10 May 2018 07:50:33 +0000 Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w4A7oM7J006476 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 10 May 2018 07:50:22 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1fGgLG-0007zD-A5; Thu, 10 May 2018 00:50:22 -0700 Received: from aserv0021.oracle.com ([141.146.126.233]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1fGgKg-0007qa-MY for ocfs2-devel@oss.oracle.com; Thu, 10 May 2018 00:49:54 -0700 Received: from userp2030.oracle.com (userp2030.oracle.com [156.151.31.89]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w4A7njux003976 (version=TLSv1/SSLv3 cipher=AES256-SHA256 bits=256 verify=FAIL) for ; Thu, 10 May 2018 07:49:46 GMT Received: from pps.filterd (userp2030.oracle.com [127.0.0.1]) by userp2030.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w4A7gFFT030009 for ; Thu, 10 May 2018 07:49:45 GMT Received: from prv1-mh.provo.novell.com (prv1-mh.provo.novell.com [137.65.248.33]) by userp2030.oracle.com with ESMTP id 2hvb8exw1f-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Thu, 10 May 2018 07:49:42 +0000 Received: from INET-PRV1-MTA by prv1-mh.provo.novell.com with Novell_GroupWise; Thu, 10 May 2018 01:49:39 -0600 Message-Id: <5AF3F989020000F90001A482@prv1-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 18.0.0 Date: Thu, 10 May 2018 01:49:29 -0600 From: "Gang He" To: , , , "ge changwei" , "Lei Chen" , References: <20180510053230.17217-1-lchen@suse.com> In-Reply-To: <20180510053230.17217-1-lchen@suse.com> Mime-Version: 1.0 Content-Disposition: inline X-CLX-Shades: MLX X-CLX-Response: 1TFkXGxMTEQpMehcZGhEKWU0XZ2ZyEQpZSRcacRoQGncGGx0TcRsYGRAadwY YGgYaEQpZXhdobnkRCklGF0VYS0lGT3VaWEVOX0leQ0VEGXVPSxEKQ04XQEtgYk1cYFlfc25seh 1TGn9DW1NfQn5fB0dzYUx1fUIRClhcFx8EGgQbGRkHS05PTE4cHxMFGxoEGxoaBB4SBBsQGx4aH xoRCl5ZF3lhcB5TEQpNXBcYHR0RCkxaF3tpQk17EQpFWRdvaxEKTF8XegUFBQUFBQUFBW8RCkxG F2xraxEKQ1oXGxkdBBwfBBgeEgQZGREKQl4XGxEKRF4XHBEKREkXGBEKQkYXYlBQfElAfl9STmI RCkJcFxoRCkJFF21aWB1cQ0NOb0ZjEQpCThdnHmAac0VMYUcBRBEKQkwXa0ljQmVjE1tBU2ERCk JsF2lbTl59SUdbbFlhEQpCQBdgHlteX09FGQVCZREKQlgXYn1veQFPGBlwcHsRCk1eFxsRClpYF xgRCnBnF2d+HRIfTE9IS3lyEB4aEQpwaBdmeHJHcllkZR1pRBAbEhEKcGgXaHptZB55UF1+H28Q GxIRCnBoF2Uca3gcXRpSW1NhEBsSEQpwaBdsXRJiWhJeSWF+HxAZGhEKcGgXbm8bZWVDQBNYU2s QGxIRCnB/F2ZpRE1afhJIU2dMEBsYEhEKcF8XbEtoQHJvGGh7bWAQHRoRCnB9F2RYGXliXF9Pbm sFEB0aEQpwbBdnfEdybX1OeAEeUxAZGhEKbX4XGxEKWE0XSxEg X-PDR: ERROR X-Source-IP: 137.65.248.33 X-ServerName: prv1-mh.provo.novell.com X-Proofpoint-SPF-Result: pass X-Proofpoint-SPF-Record: v=spf1 include:novell.com ~all X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8888 signatures=668698 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=30 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=199 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1805100078 X-Spam: Clean Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: ocfs2_inode_lock_tracker does not distinguish lock level X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8888 signatures=668698 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1805100079 X-Virus-Scanned: ClamAV using ClamSMTP Hello Joseph, Changwei and Jun/Alex, Please help to take a look at this patch, since the previous patch really has vulnerability in logic, although our test cases did not hit it. Thanks Gang >>> Larry Chen 2018/5/10 13:32 >>> ocfs2_inode_lock_tracker as a variant of ocfs2_inode_lock, is used to prevent deadlock due to recursive lock acquisition. But this function does not distinguish whether the requested level is EX or PR. If a RP lock has been attained, this function will immediately return success afterwards even an EX lock is requested. But actually the return value does not mean that the process got a EX lock, because ocfs2_inode_lock has not been called. When taking lock levels into account, we face some different situations. 1. no lock is held In this case, just lock the inode and return 0 2. We are holding a lock For this situation, things diverges into several cases wanted holding what to do ex ex see 2.1 below ex pr see 2.2 below pr ex see 2.1 below pr pr see 2.1 below 2.1 lock level that is been held is compatible with the wanted level, so no lock action will be tacken. 2.2 Otherwise, an upgrade is needed, but it is forbidden. Reason why upgrade within a process is forbidden is that lock upgrade may cause dead lock. The following illustrate how it happens. process 1 process 2 ocfs2_inode_lock_tracker(ex=0) <====== ocfs2_inode_lock_tracker(ex=1) ocfs2_inode_lock_tracker(ex=1) Signed-off-by: Larry Chen Reviewed-by: Gang He --- fs/ocfs2/dlmglue.c | 119 +++++++++++++++++++++++++++++++++++++++-------------- fs/ocfs2/dlmglue.h | 1 + 2 files changed, 90 insertions(+), 30 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 97a972efab83..68728de12864 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -788,35 +788,34 @@ static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, spin_unlock(&lockres->l_lock); } -static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, - struct ocfs2_lock_holder *oh) -{ - spin_lock(&lockres->l_lock); - list_del(&oh->oh_list); - spin_unlock(&lockres->l_lock); - - put_pid(oh->oh_owner_pid); -} - -static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +static struct ocfs2_lock_holder * +ocfs2_pid_holder(struct ocfs2_lock_res *lockres, + struct pid *pid) { struct ocfs2_lock_holder *oh; - struct pid *pid; - /* look in the list of holders for one with the current task as owner */ spin_lock(&lockres->l_lock); - pid = task_pid(current); list_for_each_entry(oh, &lockres->l_holders, oh_list) { if (oh->oh_owner_pid == pid) { spin_unlock(&lockres->l_lock); - return 1; + return oh; } } spin_unlock(&lockres->l_lock); + return NULL; +} - return 0; +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + spin_lock(&lockres->l_lock); + list_del(&oh->oh_list); + spin_unlock(&lockres->l_lock); + + put_pid(oh->oh_owner_pid); } + static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres, int level) { @@ -2610,34 +2609,93 @@ void ocfs2_inode_unlock(struct inode *inode, * * return < 0 on error, return == 0 if there's no lock holder on the stack * before this call, return == 1 if this call would be a recursive locking. + * return == -1 if this lock attempt will cause an upgrade which is forbidden. + * + * When taking lock levels into account,we face some different situations. + * + * 1. no lock is held + * In this case, just lock the inode as requested and return 0 + * + * 2. We are holding a lock + * For this situation, things diverges into several cases + * + * wanted holding what to do + * ex ex see 2.1 below + * ex pr see 2.2 below + * pr ex see 2.1 below + * pr pr see 2.1 below + * + * 2.1 lock level that is been held is compatible + * with the wanted level, so no lock action will be tacken. + * + * 2.2 Otherwise, an upgrade is needed, but it is forbidden. + * + * Reason why upgrade within a process is forbidden is that + * lock upgrade may cause dead lock. The following illustrates + * how it happens. + * + * thread on node1 thread on node2 + * ocfs2_inode_lock_tracker(ex=0) + * + * <====== ocfs2_inode_lock_tracker(ex=1) + * + * ocfs2_inode_lock_tracker(ex=1) */ int ocfs2_inode_lock_tracker(struct inode *inode, struct buffer_head **ret_bh, int ex, struct ocfs2_lock_holder *oh) { - int status; - int arg_flags = 0, has_locked; + int status = 0; struct ocfs2_lock_res *lockres; + struct ocfs2_lock_holder *tmp_oh; + struct pid *pid = task_pid(current); + lockres = &OCFS2_I(inode)->ip_inode_lockres; - has_locked = ocfs2_is_locked_by_me(lockres); - /* Just get buffer head if the cluster lock has been taken */ - if (has_locked) - arg_flags = OCFS2_META_LOCK_GETBH; + tmp_oh = ocfs2_pid_holder(lockres, pid); - if (likely(!has_locked || ret_bh)) { - status = ocfs2_inode_lock_full(inode, ret_bh, ex, arg_flags); + if (!tmp_oh) { + /* + * This corresponds to the case 1. + * We haven't got any lock before. + */ + status = ocfs2_inode_lock_full(inode, ret_bh, ex, 0); if (status < 0) { if (status != -ENOENT) mlog_errno(status); return status; } - } - if (!has_locked) + + oh->oh_ex = ex; ocfs2_add_holder(lockres, oh); + return 0; + } - return has_locked; + if (unlikely(ex && !tmp_oh->oh_ex)) { + /* + * case 2.2 upgrade may cause dead lock, forbid it. + */ + mlog(ML_ERROR, "Recursive locking is not permitted to " + "upgrade to EX level from PR level.\n"); + dump_stack(); + return -EINVAL; + } + + /* + * case 2.1 OCFS2_META_LOCK_GETBH flag make ocfs2_inode_lock_full. + * ignore the lock level and just update it. + */ + if (ret_bh) { + status = ocfs2_inode_lock_full(inode, ret_bh, ex, + OCFS2_META_LOCK_GETBH); + if (status < 0) { + if (status != -ENOENT) + mlog_errno(status); + return status; + } + } + return tmp_oh ? 1 : 0; } void ocfs2_inode_unlock_tracker(struct inode *inode, @@ -2649,12 +2707,13 @@ void ocfs2_inode_unlock_tracker(struct inode *inode, lockres = &OCFS2_I(inode)->ip_inode_lockres; /* had_lock means that the currect process already takes the cluster - * lock previously. If had_lock is 1, we have nothing to do here, and - * it will get unlocked where we got the lock. + * lock previously. + * If had_lock is 1, we have nothing to do here. + * If had_lock is 0, we will release the lock. */ if (!had_lock) { + ocfs2_inode_unlock(inode, oh->oh_ex); ocfs2_remove_holder(lockres, oh); - ocfs2_inode_unlock(inode, ex); } } diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h index 256e0a9067b8..4ec1c828f6e0 100644 --- a/fs/ocfs2/dlmglue.h +++ b/fs/ocfs2/dlmglue.h @@ -96,6 +96,7 @@ struct ocfs2_trim_fs_info { struct ocfs2_lock_holder { struct list_head oh_list; struct pid *oh_owner_pid; + int oh_ex; }; /* ocfs2_inode_lock_full() 'arg_flags' flags */