From patchwork Fri Nov 10 04:45:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10052371 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 9CB1E603FA for ; Fri, 10 Nov 2017 04:46:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8EDB12B24F for ; Fri, 10 Nov 2017 04:46:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 838BB2B251; Fri, 10 Nov 2017 04:46:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 107AA2B24F for ; Fri, 10 Nov 2017 04:46:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755888AbdKJEpv (ORCPT ); Thu, 9 Nov 2017 23:45:51 -0500 Received: from mx2.suse.de ([195.135.220.15]:52835 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755858AbdKJEpu (ORCPT ); Thu, 9 Nov 2017 23:45:50 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 2FDACAD3A; Fri, 10 Nov 2017 04:45:49 +0000 (UTC) From: NeilBrown To: Linus Torvalds Date: Fri, 10 Nov 2017 15:45:41 +1100 Cc: Al Viro , linux-fsdevel , Linux Kernel Mailing List Subject: Re: [PATCH 3/3] VFS: close race between getcwd() and d_move() In-Reply-To: References: <151019756744.30101.3832608128627682973.stgit@noble> <151019772763.30101.16040338743875884111.stgit@noble> <8760ajf6al.fsf@notabene.neil.brown.name> Message-ID: <871sl6eo7e.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Thu, Nov 09 2017, Linus Torvalds wrote: > On Thu, Nov 9, 2017 at 2:14 PM, NeilBrown wrote: >> On Thu, Nov 09 2017, Linus Torvalds wrote: >>> >>> How nasty would it be to just expand the calls to __d_drop/__d_rehash >>> into __d_move itself, and take both has list locks at the same time >>> (with the usual ordering and checking if it's the same list, of >>> course). >> >> something like this? > > Yes. > > This looks nicer to me. Partly because I hate those "pass flags to > functions that modify their behavior" kinds of patches. I'd rather see > just straight-line unconditional code with some possible duplication. ... > > I also do wonder if we can avoid all the unhash/rehash games entirely > (and avoid the hash list locking) if it turns out that the dentry and > target hash lists are the same. I'm not convinced. I haven't actually tried it, but the matrix of possibilities seems a little large. The source dentry may or may not be hashed (not in the "disconnected IS_ROOT" case), and the target may or may not want to be rehashed afterwards (depending on 'exchange'). We could skip the lock for an exchange if they both had the same hash, but not for a simple move. However your description of what it was that you didn't like gave me an idea - I can take the same approach as my original, but not pass flags around. I quite like how this turned out. Dropping the BUG_ON() in d_rehash() isn't ideal, maybe we could add ___d_rehash() without the BUG_ON() and call that from __d_rehash? Thanks, NeilBrown From: NeilBrown Date: Fri, 10 Nov 2017 15:20:06 +1100 Subject: [PATCH] VFS: close race between getcwd() and d_move() d_move() will call __d_drop() and then __d_rehash() on the dentry being moved. This creates a small window when the dentry appears to be unhashed. Many tests of d_unhashed() are made under ->d_lock and so are safe from racing with this window, but some aren't. In particular, getcwd() calls d_unlinked() (which calls d_unhashed()) without d_lock protection, so it can race. This races has been seen in practice with lustre, which uses d_move() as part of name lookup. See: https://jira.hpdd.intel.com/browse/LU-9735 It could race with a regular rename(), and result in ENOENT instead of either the 'before' or 'after' name. The race can be demonstrated with a simple program which has two threads, one renaming a directory back and forth while another calls getcwd() within that directory: it should never fail, but does. See: https://patchwork.kernel.org/patch/9455345/ We could fix this race by taking d_lock and rechecking when d_unhashed() reports true. Alternately when can remove the window, which is the approach this patch takes. ___d_drop() is introduce which does *not* clear d_hash.pprev so the dentry still appears to be hashed. __d_drop() calls ___d_drop(), then clears d_hash.pprev. __d_move() now uses ___d_drop() and only clears d_hash.pprev when not rehashing. Signed-off-by: NeilBrown --- fs/dcache.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index f90141387f01..8c83543f5065 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -468,9 +468,11 @@ static void dentry_lru_add(struct dentry *dentry) * d_drop() is used mainly for stuff that wants to invalidate a dentry for some * reason (NFS timeouts or autofs deletes). * - * __d_drop requires dentry->d_lock. + * __d_drop requires dentry->d_lock + * ___d_drop doesn't mark dentry as "unhashed" + * (dentry->d_hash.pprev will be LIST_POISON2, not NULL). */ -void __d_drop(struct dentry *dentry) +static void ___d_drop(struct dentry *dentry) { if (!d_unhashed(dentry)) { struct hlist_bl_head *b; @@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry) hlist_bl_lock(b); __hlist_bl_del(&dentry->d_hash); - dentry->d_hash.pprev = NULL; hlist_bl_unlock(b); /* After this call, in-progress rcu-walk path lookup will fail. */ write_seqcount_invalidate(&dentry->d_seq); } } +void __d_drop(struct dentry *dentry) { + ___d_drop(dentry); + dentry->d_hash.pprev = NULL; +} EXPORT_SYMBOL(__d_drop); void d_drop(struct dentry *dentry) @@ -2381,7 +2386,7 @@ EXPORT_SYMBOL(d_delete); static void __d_rehash(struct dentry *entry) { struct hlist_bl_head *b = d_hash(entry->d_name.hash); - BUG_ON(!d_unhashed(entry)); + hlist_bl_lock(b); hlist_bl_add_head_rcu(&entry->d_hash, b); hlist_bl_unlock(b); @@ -2818,9 +2823,9 @@ static void __d_move(struct dentry *dentry, struct dentry *target, write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED); /* unhash both */ - /* __d_drop does write_seqcount_barrier, but they're OK to nest. */ - __d_drop(dentry); - __d_drop(target); + /* ___d_drop does write_seqcount_barrier, but they're OK to nest. */ + ___d_drop(dentry); + ___d_drop(target); /* Switch the names.. */ if (exchange) @@ -2832,6 +2837,8 @@ static void __d_move(struct dentry *dentry, struct dentry *target, __d_rehash(dentry); if (exchange) __d_rehash(target); + else + target->d_hash.pprev = NULL; /* ... and switch them in the tree */ if (IS_ROOT(dentry)) {