From patchwork Fri Aug 14 04:31:47 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Eric W. Biederman" X-Patchwork-Id: 7012181 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 074D39F402 for ; Fri, 14 Aug 2015 04:38:44 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id CD62520515 for ; Fri, 14 Aug 2015 04:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 95638203B4 for ; Fri, 14 Aug 2015 04:38:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752402AbbHNEik (ORCPT ); Fri, 14 Aug 2015 00:38:40 -0400 Received: from out02.mta.xmission.com ([166.70.13.232]:57649 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751399AbbHNEij (ORCPT ); Fri, 14 Aug 2015 00:38:39 -0400 Received: from in01.mta.xmission.com ([166.70.13.51]) by out02.mta.xmission.com with esmtps (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ZQ6lL-0005cu-3k; Thu, 13 Aug 2015 22:38:39 -0600 Received: from 67-3-205-173.omah.qwest.net ([67.3.205.173] helo=x220.int.ebiederm.org.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ZQ6lI-0006ii-UT; Thu, 13 Aug 2015 22:38:38 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Linux Containers Cc: linux-fsdevel@vger.kernel.org, Al Viro , Andy Lutomirski , "Serge E. Hallyn" , Richard Weinberger , Andrey Vagin , Jann Horn , Willy Tarreau , Omar Sandoval , Miklos Szeredi , Linus Torvalds , "J. Bruce Fields" References: <871tncuaf6.fsf@x220.int.ebiederm.org> <87mw5xq7lt.fsf@x220.int.ebiederm.org> <87a8yqou41.fsf_-_@x220.int.ebiederm.org> <874moq9oyb.fsf_-_@x220.int.ebiederm.org> <871tfkawu9.fsf_-_@x220.int.ebiederm.org> <87egjk9i61.fsf_-_@x220.int.ebiederm.org> <20150810043637.GC14139@ZenIV.linux.org.uk> <877foymrwt.fsf@x220.int.ebiederm.org> <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org> Date: Thu, 13 Aug 2015 23:31:47 -0500 In-Reply-To: <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org> (Eric W. Biederman's message of "Thu, 13 Aug 2015 23:29:23 -0500") Message-ID: <87fv3mjxsc.fsf_-_@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 X-XM-AID: U2FsdGVkX1+w2/FJIUQkG72HJiyaptMMSD///zwiv3E= X-SA-Exim-Connect-IP: 67.3.205.173 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Linux Containers X-Spam-Relay-Country: X-Spam-Timing: total 1593 ms - load_scoreonly_sql: 0.10 (0.0%), signal_user_changed: 8 (0.5%), b_tie_ro: 4.4 (0.3%), parse: 2.0 (0.1%), extract_message_metadata: 18 (1.1%), get_uri_detail_list: 5 (0.3%), tests_pri_-1000: 7 (0.4%), tests_pri_-950: 1.47 (0.1%), tests_pri_-900: 1.21 (0.1%), tests_pri_-400: 36 (2.2%), check_bayes: 34 (2.2%), b_tokenize: 13 (0.8%), b_tok_get_all: 11 (0.7%), b_comp_prob: 2.9 (0.2%), b_tok_touch_all: 6 (0.3%), b_finish: 0.76 (0.0%), tests_pri_0: 1503 (94.3%), tests_pri_500: 11 (0.7%), rewrite_mail: 0.00 (0.0%) Subject: [PATCH review 3/8] dcache: Clearly separate the two directory rename cases in d_splice_alias X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There are two scenarios that can result in an alias being found in d_splice_alias. The first scenario is a disconnected dentry being looked up and reconnected (common with nfs file handles, and open_by_handle_at). The second scenario is a lookup on a directory lazily discovering that it was renamed on the remote filesystem. The locking challenge for handling these two scenarios is that the instance of lookup calling d_splice_alias has already taken the dentry->d_parent->d_inode->i_mutex. If the mutex was not held the locking we could just do: rename_mutex = &inode->i_sb->s_vfs_rename_mutex; mutex_lock(rename_mutex); if (d_ancestor(alias, dentry)) { mutex_unlock(rename_mutex); pr_warn_ratelimited(...); return -ELOOP; } m1 = &dentry->d_parent->d_inode->i_mutex; mutex_lock_nested(m1, I_MUTEX_PARENT); if (!IS_ROOT(alias) && (alias->d_parent != dentry->d_parent)) { m2 = &alias->d_parent->d_inode->i_mutex; mutex_lock_nested(m2, I_MUTEX_PARENT2); } d_move(alias, dentry); if (m2) mutex_unlock(m2); mutex_unlock(m1); mutex_unlock(rename_mutex); Which is essentially lock_rename and unlock_reaname with added handling of the IS_ROOT case. The problem is that as the lookup locking stands today grabbing the s_vfs_rename_mutex must use mutex_trylock which can fail, so for reliability reasons we need to avoid using the rename_mutex as much as possible. For the case where a disconnected dentry is being connected (aka the IS_ROOT case) the d_ancestor test can be placed under the rename_lock removing any chance that taking locks will cause a failure. For the lazy rename case things are trickier because when dentry->d_parent and alias->d_parent are not equal the code need to take the s_vfs_rename_mutex, or risk the d_ancestor call in lock_rename being inaccurate. As s_vfs_rename_mutex is take first there are no locking ordering issues with taking alias->d_parent->d_inode->i_mutex. Furthermore as games with rename_lock are unnecessary d_move instead of __d_move can be called. Sleeping in d_splice_alias is not something new as security_d_instantiate is a sleeping function. Compared to the current implementation this change introduces a function for each case __d_rename_alias and __d_connect_alias and moves the acquisition of rename_lock down into those functions. In the case of __d_rename_alias which was __d_unalias this allows the existing lock ordering rules to be followed much more closely removing the need for a trylock when acquiring alias->d_parent->d_inode->i_mutex, and allowing the use of d_move. A common helper that prints the warning message when a loop is detected is factored out into d_alias_is_ancestor so that code does not need to be duplicated in both cases. Signed-off-by: "Eric W. Biederman" --- fs/dcache.c | 111 ++++++++++++++++++++++++++++++++++++------------------------ 1 file changed, 67 insertions(+), 44 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 53b7f1e63beb..c1eece74621f 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2711,41 +2711,77 @@ struct dentry *d_ancestor(struct dentry *p1, struct dentry *p2) return NULL; } +static struct dentry *d_alias_is_ancestor(struct dentry *alias, + struct dentry *dentry) +{ + struct dentry *ancestor = d_ancestor(alias, dentry); + if (ancestor) { + pr_warn_ratelimited( + "VFS: Lookup of '%s' in %s %s would have caused loop\n", + dentry->d_name.name, + dentry->d_sb->s_type->name, + dentry->d_sb->s_id); + } + return ancestor; +} + /* * This helper attempts to cope with remotely renamed directories * * It assumes that the caller is already holding - * dentry->d_parent->d_inode->i_mutex, and rename_lock + * dentry->d_parent->d_inode->i_mutex * * Note: If ever the locking in lock_rename() changes, then please * remember to update this too... */ -static int __d_unalias(struct inode *inode, - struct dentry *dentry, struct dentry *alias) +static int __d_rename_alias(struct dentry *alias, struct dentry *dentry) { - struct mutex *m1 = NULL, *m2 = NULL; - int ret = -ESTALE; + struct mutex *rename_mutex = NULL, *parent_mutex = NULL; - /* If alias and dentry share a parent, then no extra locks required */ - if (alias->d_parent == dentry->d_parent) - goto out_unalias; + /* Holding dentry->d_parent->d_inode->i_mutex guarantees this + * that the equality or inequality alias->d_parent and + * dentry->d_parent remains stable. + */ + if (alias->d_parent != dentry->d_parent) { + /* See lock_rename() */ + rename_mutex = &dentry->d_sb->s_vfs_rename_mutex; + if (!mutex_trylock(rename_mutex)) + return -ESTALE; + + if (d_alias_is_ancestor(alias, dentry)) { + mutex_unlock(rename_mutex); + return -ELOOP; + } + parent_mutex = &alias->d_parent->d_inode->i_mutex; + mutex_lock_nested(parent_mutex, I_MUTEX_PARENT2); + } + d_move(alias, dentry); + if (parent_mutex) { + mutex_unlock(parent_mutex); + mutex_unlock(rename_mutex); + } + return 0; +} - /* See lock_rename() */ - if (!mutex_trylock(&dentry->d_sb->s_vfs_rename_mutex)) - goto out_err; - m1 = &dentry->d_sb->s_vfs_rename_mutex; - if (!mutex_trylock(&alias->d_parent->d_inode->i_mutex)) - goto out_err; - m2 = &alias->d_parent->d_inode->i_mutex; -out_unalias: +/* + * This helper connects disconnected dentries. + * + * It assumes that the caller is already holding + * dentry->d_parent->d_inode->i_mutex + * + */ +static int __d_connect_alias(struct inode *inode, + struct dentry *alias, struct dentry *dentry) +{ + write_seqlock(&rename_lock); + if (unlikely(d_alias_is_ancestor(alias, dentry))) { + write_sequnlock(&rename_lock); + return -ELOOP; + } __d_move(alias, dentry, false); - ret = 0; -out_err: - if (m2) - mutex_unlock(m2); - if (m1) - mutex_unlock(m1); - return ret; + write_sequnlock(&rename_lock); + security_d_instantiate(alias, inode); + return 0; } /** @@ -2786,30 +2822,17 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) if (S_ISDIR(inode->i_mode)) { struct dentry *new = __d_find_any_alias(inode); if (unlikely(new)) { + int err; /* The reference to new ensures it remains an alias */ spin_unlock(&inode->i_lock); - write_seqlock(&rename_lock); - if (unlikely(d_ancestor(new, dentry))) { - write_sequnlock(&rename_lock); + + if (!IS_ROOT(new)) + err = __d_rename_alias(new, dentry); + else + err = __d_connect_alias(inode, new, dentry); + if (err) { dput(new); - new = ERR_PTR(-ELOOP); - pr_warn_ratelimited( - "VFS: Lookup of '%s' in %s %s" - " would have caused loop\n", - dentry->d_name.name, - inode->i_sb->s_type->name, - inode->i_sb->s_id); - } else if (!IS_ROOT(new)) { - int err = __d_unalias(inode, dentry, new); - write_sequnlock(&rename_lock); - if (err) { - dput(new); - new = ERR_PTR(err); - } - } else { - __d_move(new, dentry, false); - write_sequnlock(&rename_lock); - security_d_instantiate(new, inode); + new = ERR_PTR(err); } iput(inode); return new;