From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955452
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 74837ECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:17:03 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244709AbiHZCRC (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:17:02 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35924 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244036AbiHZCQ7 (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:16:59 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de
 [IPv6:2001:67c:2178:6::1d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8D1ACB5C6;
        Thu, 25 Aug 2022 19:16:55 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id 4F4E7205B5;
        Fri, 26 Aug 2022 02:16:54 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480214;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=flcbf5eCoYGsPYcUMbhPHTTjfstyi64QPT7HETfyuxs=;
        b=wy/gZVhkTlsEKEZUR1s+RoPDJW63GyDvAf7TyZ0tpXCQjBJ13etQk3h6A/Dqjozj9jdjIf
        EXzMKqYoAUAon5Q01cslO9qBwZGEuEfsr2q6jjseuKbNdITnTS5Kb159YPUm+MxmmdXEyg
        kheyR1KnOAPhwmoUL94vuuCIx9jOlGE=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480214;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=flcbf5eCoYGsPYcUMbhPHTTjfstyi64QPT7HETfyuxs=;
        b=1RlWoEWVyZwFXEQ3/53hLDv0nM/rCErQcGeycJFC8vrfUeO4mcHShCEsAtCeuNOmvZk3g7
        aAz/r0pm72lv5sAg==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 7139713A65;
        Fri, 26 Aug 2022 02:16:51 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id qB24ChMtCGNSMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:16:51 +0000
Subject: [PATCH 01/10] VFS: support parallel updates in the one directory.
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984370.25420.13019217727422217511.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: NeilBrown <neilb@suse.com>

Some filesystems can support parallel modifications to a directory,
either because the modification happen on a remote server which does its
own locking (e.g.  NFS) or because they can internally lock just a part
of a directory (e.g.  many local filesystems, with a bit of work - the
lustre project has patches for ext4 to support concurrent updates).

To allow this, we introduce VFS support for parallel modification:
unlink (including rmdir) and create.  Parallel rename is added in a
later patch.

If a filesystem supports parallel modification in directories, it sets
FS_PAR_DIR_UPDATE on the file_system_type.  lookup_open() and the new
lookup_hash_update() notice the flag and take a shared lock on the
directory, and rely on a lock-bit in d_flags, much like parallel lookup
relies on DCACHE_PAR_LOOKUP.  The lock bit is always set even if an
exclusive lock was taken, though in that case it will always be
uncontended.

__lookup_hash() is enhanced to use d_alloc_parallel() if it was told
that a shared lock was taken - by being given a non-NULL *wq.  This is
needed as the shared lock does not prevent races with lookups.

The new DCACHE_PAR_UPDATE is the most significant bit in d_flags, so we
have nearly run out of flags.  0x08000000 is still unused ....  for now.

Once a dentry for the target name has been obtained, DCACHE_PAR_UPDATE
is set on it, waiting if necessary if it is already set.  Once this is
set, the thread has exclusive access to the name and can call into the
filesystem to perform the required action.

We use wait_var_event() to wait for DCACHE_PAR_UPDATE to be cleared rather
than trying to overload the wq used for DCACHE_PAR_LOOKUP.

Some filesystems do *not* complete the lookup that precedes a create,
but leave the dentry d_in_lookup() and unhashed, so often a dentry will
have both DCACHE_PAR_LOOKUP and DCACHE_PAR_UPDATE set at the same time.
To allow for this, we need the 'wq' that is passed to d_alloc_parallel()
(and used when DCACHE_PAR_LOOKUP is cleared), to continue to exist until
the creation is complete, which may be after filename_create()
completes.  So a wq now needs to be passed to filename_create() when
parallel modification is attempted.

Some callers, such as kern_path(), perform a lookup for create but don't
pass in a wq and changing all call sites to do so would be churn for
little gain.  So if no wq is provided an exclusive lock is taken and
d_alloc() is used to allocate the dentry.  A new done_path_create_wq()
can be told if a wq was provided for create, so the correct unlock can
be used.

Waiting for DCACHE_PAR_UPDATE can sometimes result in the dentry
changing - either through unlink or rename.  This requires that the
lookup be retried.  This in turn requires that the wq be reused.  The
use of DECLARE_WAITQUEUE() in d_wait_lookup() and the omission of
remove_wait_queue() means that after the wake_up_all() in
__d_lookup_down(), the wait_queue_head list will be undefined.  So we
change d_wait_lookup() to use DEFINE_WAIT(), so that
autoremove_wake_function() is used for wakeups, and the wait_queue_head
remains well defined.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/dcache.c            |   66 ++++++++++++
 fs/namei.c             |  268 ++++++++++++++++++++++++++++++++++++++----------
 include/linux/dcache.h |   28 +++++
 include/linux/fs.h     |    5 +
 include/linux/namei.h  |   16 +++
 5 files changed, 321 insertions(+), 62 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bb0c4d0038db..d6bfa49b143b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1765,6 +1765,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	struct dentry *dentry;
 	char *dname;
 	int err;
+	static struct lock_class_key __key;
 
 	dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
 				      GFP_KERNEL);
@@ -1820,6 +1821,8 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_child);
 	d_set_d_op(dentry, dentry->d_sb->s_d_op);
 
+	lockdep_init_map(&dentry->d_update_map, "DENTRY_PAR_UPDATE", &__key, 0);
+
 	if (dentry->d_op && dentry->d_op->d_init) {
 		err = dentry->d_op->d_init(dentry);
 		if (err) {
@@ -2626,7 +2629,7 @@ static inline void end_dir_add(struct inode *dir, unsigned int n,
 static void d_wait_lookup(struct dentry *dentry)
 {
 	if (d_in_lookup(dentry)) {
-		DECLARE_WAITQUEUE(wait, current);
+		DEFINE_WAIT(wait);
 		add_wait_queue(dentry->d_wait, &wait);
 		do {
 			set_current_state(TASK_UNINTERRUPTIBLE);
@@ -3274,6 +3277,67 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+/**
+ * d_lock_update_nested - lock a dentry before updating
+ * @dentry: the dentry to be locked
+ * @base:   the parent, or %NULL
+ * @name:   the name in that parent, or %NULL
+ * @subclass: lockdep locking class.
+ *
+ * Lock a dentry in a directory on which a shared-lock may be held, and
+ * on which parallel updates are permitted.
+ * If the base and name are given, then on success the dentry will still
+ * have that base and name - it will not have raced with rename.
+ * On success, a positive dentry will still be hashed, ensuring there
+ * was no race with unlink.
+ * If they are not given, then on success the dentry will be negative,
+ * which again ensures no race with rename, or unlink.
+ *
+ * @subclass uses the I_MUTEX_ classes.
+ * If the parent directory is locked with I_MUTEX_NORMAL, use I_MUTEX_NORMAL.
+ * If the parent is locked with I_MUTEX_PARENT, I_MUTEX_PARENT2 or
+ * I_MUTEX_CHILD, use I_MUTEX_PARENT or, for the second in a rename,
+ * I_MUTEX_PARENT2.
+ */
+bool d_lock_update_nested(struct dentry *dentry,
+			  struct dentry *base, const struct qstr *name,
+			  int subclass)
+{
+	bool ret = true;
+
+	lock_acquire_exclusive(&dentry->d_update_map, subclass,
+			       0, NULL, _THIS_IP_);
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_flags & DCACHE_PAR_UPDATE)
+		___wait_var_event(&dentry->d_flags,
+				  !(dentry->d_flags & DCACHE_PAR_UPDATE),
+				  TASK_UNINTERRUPTIBLE, 0, 0,
+				  (spin_unlock(&dentry->d_lock),
+				   schedule(),
+				   spin_lock(&dentry->d_lock))
+			);
+	rcu_read_lock(); /* for d_same_name() */
+	if (d_unhashed(dentry) && d_is_positive(dentry)) {
+		/* name was unlinked while we waited */
+		ret = false;
+	} else if (base &&
+		 (dentry->d_parent != base ||
+		  dentry->d_name.hash != name->hash ||
+		  !d_same_name(dentry, base, name))) {
+		/* dentry was renamed - possibly silly-rename */
+		ret = false;
+	} else if (!base && d_is_positive(dentry)) {
+		ret = false;
+	} else {
+		dentry->d_flags |= DCACHE_PAR_UPDATE;
+	}
+	rcu_read_unlock();
+	spin_unlock(&dentry->d_lock);
+	if (!ret)
+		lock_map_release(&dentry->d_update_map);
+	return ret;
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
diff --git a/fs/namei.c b/fs/namei.c
index 53b4bc094db2..c008dfd01e30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1574,14 +1574,14 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 }
 
 /*
- * Parent directory has inode locked exclusive.  This is one
- * and only case when ->lookup() gets called on non in-lookup
- * dentries - as the matter of fact, this only gets called
- * when directory is guaranteed to have no in-lookup children
- * at all.
+ * Parent directory has inode locked exclusive, or possibly shared if wq
+ * is given.  In the later case the fs has explicitly allowed concurrent
+ * updates in this directory.  This is the one and only case
+ * when ->lookup() may be called on a non in-lookup dentry.
  */
 static struct dentry *__lookup_hash(const struct qstr *name,
-		struct dentry *base, unsigned int flags)
+				    struct dentry *base, unsigned int flags,
+				    wait_queue_head_t *wq)
 {
 	struct dentry *dentry = lookup_dcache(name, base, flags);
 	struct dentry *old;
@@ -1594,18 +1594,103 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 	if (unlikely(IS_DEADDIR(dir)))
 		return ERR_PTR(-ENOENT);
 
-	dentry = d_alloc(base, name);
+	if (wq)
+		dentry = d_alloc_parallel(base, name, wq);
+	else
+		dentry = d_alloc(base, name);
 	if (unlikely(!dentry))
 		return ERR_PTR(-ENOMEM);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	if (wq && !d_in_lookup(dentry))
+		/* Must have raced with another thread doing the lookup */
+		return dentry;
 
 	old = dir->i_op->lookup(dir, dentry, flags);
 	if (unlikely(old)) {
+		d_lookup_done(dentry);
 		dput(dentry);
 		dentry = old;
 	}
 	return dentry;
 }
 
+/*
+ * Parent directory (base) is not locked.  We take either an exclusive
+ * or shared lock depending on the fs preference, then do a lookup,
+ * and then set the DCACHE_PAR_UPDATE bit on the child if a shared lock
+ * was taken on the parent.
+ */
+static struct dentry *lookup_hash_update(
+	const struct qstr *name,
+	struct dentry *base, unsigned int flags,
+	wait_queue_head_t *wq)
+{
+	struct dentry *dentry;
+	struct inode *dir = base->d_inode;
+	int err;
+
+	if (wq && IS_PAR_UPDATE(dir))
+		inode_lock_shared_nested(dir, I_MUTEX_PARENT);
+	else
+		inode_lock_nested(dir, I_MUTEX_PARENT);
+
+retry:
+	dentry = __lookup_hash(name, base, flags, wq);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
+		goto out_err;
+	}
+	if (!d_lock_update_nested(dentry, base, name, I_MUTEX_PARENT)) {
+		/*
+		 * Failed to get lock due to race with unlink or rename
+		 * - try again
+		 */
+		d_lookup_done(dentry);
+		dput(dentry);
+		goto retry;
+	}
+	return dentry;
+
+out_err:
+	if (wq && IS_PAR_UPDATE(dir))
+		inode_unlock_shared(dir);
+	else
+		inode_unlock(dir);
+	return ERR_PTR(err);
+}
+
+static int lookup_one_common(struct user_namespace *mnt_userns,
+			     const char *name, struct dentry *base, int len,
+			     struct qstr *this);
+
+struct dentry *lookup_hash_update_len(const char *name, int nlen,
+				      struct path *path, unsigned int flags,
+				      wait_queue_head_t *wq)
+{
+	struct qstr this;
+	int err = lookup_one_common(mnt_user_ns(path->mnt), name,
+				    path->dentry, nlen, &this);
+	if (err)
+		return ERR_PTR(err);
+	return lookup_hash_update(&this, path->dentry, flags, wq);
+}
+EXPORT_SYMBOL(lookup_hash_update_len);
+
+static void done_path_update(struct path *path, struct dentry *dentry,
+			     bool with_wq)
+{
+	struct inode *dir = path->dentry->d_inode;
+
+	d_lookup_done(dentry);
+	d_unlock_update(dentry);
+	if (IS_PAR_UPDATE(dir) && with_wq)
+		inode_unlock_shared(dir);
+	else
+		inode_unlock(dir);
+}
+
 static struct dentry *lookup_fast(struct nameidata *nd)
 {
 	struct dentry *dentry, *parent = nd->path.dentry;
@@ -2570,7 +2655,7 @@ static struct dentry *__kern_path_locked(struct filename *name, struct path *pat
 		return ERR_PTR(-EINVAL);
 	}
 	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
-	d = __lookup_hash(&last, path->dentry, 0);
+	d = __lookup_hash(&last, path->dentry, 0, NULL);
 	if (IS_ERR(d)) {
 		inode_unlock(path->dentry->d_inode);
 		path_put(path);
@@ -3271,11 +3356,24 @@ static struct dentry *atomic_open(struct nameidata *nd, struct dentry *dentry,
 	if (nd->flags & LOOKUP_DIRECTORY)
 		open_flag |= O_DIRECTORY;
 
+	/*
+	 * dentry is negative or d_in_lookup().  In case this is a
+	 * shared-lock create we need to set DCACHE_PAR_UPDATE to ensure
+	 * exclusion.
+	 */
+	if (open_flag & O_CREAT) {
+		if (!d_lock_update(dentry, NULL, NULL))
+			/* already exists, non-atomic open */
+			return dentry;
+	}
+
 	file->f_path.dentry = DENTRY_NOT_SET;
 	file->f_path.mnt = nd->path.mnt;
 	error = dir->i_op->atomic_open(dir, dentry, file,
 				       open_to_namei_flags(open_flag), mode);
 	d_lookup_done(dentry);
+	if ((open_flag & O_CREAT))
+		d_unlock_update(dentry);
 	if (!error) {
 		if (file->f_mode & FMODE_OPENED) {
 			if (unlikely(dentry != file->f_path.dentry)) {
@@ -3327,6 +3425,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	int error, create_error = 0;
 	umode_t mode = op->mode;
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	bool have_par_update = false;
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -3400,6 +3499,12 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 			dentry = res;
 		}
 	}
+	/*
+	 * If dentry is negative and this is a create we need to get
+	 * DCACHE_PAR_UPDATE.
+	 */
+	if ((open_flag & O_CREAT) && !dentry->d_inode)
+		have_par_update = d_lock_update(dentry, NULL, NULL);
 
 	/* Negative dentry, just create the file */
 	if (!dentry->d_inode && (open_flag & O_CREAT)) {
@@ -3419,9 +3524,13 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		error = create_error;
 		goto out_dput;
 	}
+	if (have_par_update)
+		d_unlock_update(dentry);
 	return dentry;
 
 out_dput:
+	if (have_par_update)
+		d_unlock_update(dentry);
 	dput(dentry);
 	return ERR_PTR(error);
 }
@@ -3434,6 +3543,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 	bool got_write = false;
 	struct dentry *dentry;
 	const char *res;
+	bool shared;
 
 	nd->flags |= op->intent;
 
@@ -3474,14 +3584,15 @@ static const char *open_last_lookups(struct nameidata *nd,
 		 * dropping this one anyway.
 		 */
 	}
-	if (open_flag & O_CREAT)
+	shared = !(open_flag & O_CREAT) || IS_PAR_UPDATE(dir->d_inode);
+	if (!shared)
 		inode_lock(dir->d_inode);
 	else
 		inode_lock_shared(dir->d_inode);
 	dentry = lookup_open(nd, file, op, got_write);
 	if (!IS_ERR(dentry) && (file->f_mode & FMODE_CREATED))
 		fsnotify_create(dir->d_inode, dentry);
-	if (open_flag & O_CREAT)
+	if (!shared)
 		inode_unlock(dir->d_inode);
 	else
 		inode_unlock_shared(dir->d_inode);
@@ -3750,41 +3861,29 @@ struct file *do_file_open_root(const struct path *root,
 	return file;
 }
 
-static struct dentry *filename_create(int dfd, struct filename *name,
-				      struct path *path, unsigned int lookup_flags)
+static struct dentry *filename_create_one(struct qstr *last, struct path *path,
+					  unsigned int lookup_flags,
+					  wait_queue_head_t *wq)
 {
-	struct dentry *dentry = ERR_PTR(-EEXIST);
-	struct qstr last;
+	struct dentry *dentry;
 	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
 	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
 	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
-	int type;
 	int err2;
 	int error;
 
-	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
-	if (error)
-		return ERR_PTR(error);
-
-	/*
-	 * Yucky last component or no last component at all?
-	 * (foo/., foo/.., /////)
-	 */
-	if (unlikely(type != LAST_NORM))
-		goto out;
-
 	/* don't fail immediately if it's r/o, at least try to report other errors */
 	err2 = mnt_want_write(path->mnt);
 	/*
 	 * Do the final lookup.  Suppress 'create' if there is a trailing
 	 * '/', and a directory wasn't requested.
 	 */
-	if (last.name[last.len] && !want_dir)
+	if (last->name[last->len] && !want_dir)
 		create_flags = 0;
-	inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
-	dentry = __lookup_hash(&last, path->dentry, reval_flag | create_flags);
+	dentry = lookup_hash_update(last, path->dentry,
+				    reval_flag | create_flags,  wq);
 	if (IS_ERR(dentry))
-		goto unlock;
+		goto drop_write;
 
 	error = -EEXIST;
 	if (d_is_positive(dentry))
@@ -3806,14 +3905,56 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	}
 	return dentry;
 fail:
+	done_path_update(path, dentry, !!wq);
 	dput(dentry);
 	dentry = ERR_PTR(error);
-unlock:
-	inode_unlock(path->dentry->d_inode);
+drop_write:
 	if (!err2)
 		mnt_drop_write(path->mnt);
+	return dentry;
+}
+
+struct dentry *filename_create_one_len(const char *name, int nlen,
+				       struct path *path,
+				       unsigned int lookup_flags,
+				       wait_queue_head_t *wq)
+{
+	struct qstr this;
+	int err;
+
+	err = lookup_one_common(mnt_user_ns(path->mnt), name,
+				path->dentry, nlen, &this);
+	if (err)
+		return ERR_PTR(err);
+	return filename_create_one(&this, path, lookup_flags, wq);
+}
+EXPORT_SYMBOL(filename_create_one_len);
+
+static struct dentry *filename_create(int dfd, struct filename *name,
+				      struct path *path, unsigned int lookup_flags,
+				      wait_queue_head_t *wq)
+{
+	struct dentry *dentry = ERR_PTR(-EEXIST);
+	struct qstr last;
+	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
+	int type;
+	int error;
+
+	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
+	if (error)
+		return ERR_PTR(error);
+
+	/*
+	 * Yucky last component or no last component at all?
+	 * (foo/., foo/.., /////)
+	 */
+	if (unlikely(type != LAST_NORM))
+		goto out;
+
+	dentry = filename_create_one(&last, path, lookup_flags, wq);
 out:
-	path_put(path);
+	if (IS_ERR(dentry))
+		path_put(path);
 	return dentry;
 }
 
@@ -3821,27 +3962,28 @@ struct dentry *kern_path_create(int dfd, const char *pathname,
 				struct path *path, unsigned int lookup_flags)
 {
 	struct filename *filename = getname_kernel(pathname);
-	struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
+	struct dentry *res = filename_create(dfd, filename, path, lookup_flags,
+					     NULL);
 
 	putname(filename);
 	return res;
 }
 EXPORT_SYMBOL(kern_path_create);
 
-void done_path_create(struct path *path, struct dentry *dentry)
+void done_path_create_wq(struct path *path, struct dentry *dentry, bool with_wq)
 {
+	done_path_update(path, dentry, with_wq);
 	dput(dentry);
-	inode_unlock(path->dentry->d_inode);
 	mnt_drop_write(path->mnt);
 	path_put(path);
 }
-EXPORT_SYMBOL(done_path_create);
+EXPORT_SYMBOL(done_path_create_wq);
 
 inline struct dentry *user_path_create(int dfd, const char __user *pathname,
-				struct path *path, unsigned int lookup_flags)
+				       struct path *path, unsigned int lookup_flags)
 {
 	struct filename *filename = getname(pathname);
-	struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
+	struct dentry *res = filename_create(dfd, filename, path, lookup_flags, NULL);
 
 	putname(filename);
 	return res;
@@ -3921,12 +4063,13 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 	struct path path;
 	int error;
 	unsigned int lookup_flags = 0;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	error = may_mknod(mode);
 	if (error)
 		goto out1;
 retry:
-	dentry = filename_create(dfd, name, &path, lookup_flags);
+	dentry = filename_create(dfd, name, &path, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto out1;
@@ -3954,7 +4097,7 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 			break;
 	}
 out2:
-	done_path_create(&path, dentry);
+	done_path_create_wq(&path, dentry, true);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
@@ -4023,9 +4166,10 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_DIRECTORY;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 retry:
-	dentry = filename_create(dfd, name, &path, lookup_flags);
+	dentry = filename_create(dfd, name, &path, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto out_putname;
@@ -4038,7 +4182,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
 				  mode);
 	}
-	done_path_create(&path, dentry);
+	done_path_create_wq(&path, dentry, true);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
@@ -4122,6 +4266,7 @@ int do_rmdir(int dfd, struct filename *name)
 	struct qstr last;
 	int type;
 	unsigned int lookup_flags = 0;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 retry:
 	error = filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
 	if (error)
@@ -4143,8 +4288,7 @@ int do_rmdir(int dfd, struct filename *name)
 	if (error)
 		goto exit2;
 
-	inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
-	dentry = __lookup_hash(&last, path.dentry, lookup_flags);
+	dentry = lookup_hash_update(&last, path.dentry, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit3;
@@ -4158,9 +4302,9 @@ int do_rmdir(int dfd, struct filename *name)
 	mnt_userns = mnt_user_ns(path.mnt);
 	error = vfs_rmdir(mnt_userns, path.dentry->d_inode, dentry);
 exit4:
+	done_path_update(&path, dentry, true);
 	dput(dentry);
 exit3:
-	inode_unlock(path.dentry->d_inode);
 	mnt_drop_write(path.mnt);
 exit2:
 	path_put(&path);
@@ -4185,13 +4329,14 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
  * @dentry:	victim
  * @delegated_inode: returns victim inode, if the inode is delegated.
  *
- * The caller must hold dir->i_mutex.
+ * The caller must either hold a write-lock on dir->i_rwsem, or
+ * a have atomically set DCACHE_PAR_UPDATE, or both.
  *
  * If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and
  * return a reference to the inode in delegated_inode.  The caller
  * should then break the delegation on that inode and retry.  Because
  * breaking a delegation may take a long time, the caller should drop
- * dir->i_mutex before doing so.
+ * dir->i_rwsem before doing so.
  *
  * Alternatively, a caller may pass NULL for delegated_inode.  This may
  * be appropriate for callers that expect the underlying filesystem not
@@ -4250,9 +4395,11 @@ EXPORT_SYMBOL(vfs_unlink);
 
 /*
  * Make sure that the actual truncation of the file will occur outside its
- * directory's i_mutex.  Truncate can take a long time if there is a lot of
+ * directory's i_rwsem.  Truncate can take a long time if there is a lot of
  * writeout happening, and we don't want to prevent access to the directory
  * while waiting on the I/O.
+ * If the filesystem permits (IS_PAR_UPDATE()), we take a shared lock on the
+ * directory and set DCACHE_PAR_UPDATE to get exclusive access to the dentry.
  */
 int do_unlinkat(int dfd, struct filename *name)
 {
@@ -4264,6 +4411,7 @@ int do_unlinkat(int dfd, struct filename *name)
 	struct inode *inode = NULL;
 	struct inode *delegated_inode = NULL;
 	unsigned int lookup_flags = 0;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 retry:
 	error = filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
 	if (error)
@@ -4276,9 +4424,9 @@ int do_unlinkat(int dfd, struct filename *name)
 	error = mnt_want_write(path.mnt);
 	if (error)
 		goto exit2;
+
 retry_deleg:
-	inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
-	dentry = __lookup_hash(&last, path.dentry, lookup_flags);
+	dentry = lookup_hash_update(&last, path.dentry, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
 		struct user_namespace *mnt_userns;
@@ -4297,9 +4445,9 @@ int do_unlinkat(int dfd, struct filename *name)
 		error = vfs_unlink(mnt_userns, path.dentry->d_inode, dentry,
 				   &delegated_inode);
 exit3:
+		done_path_update(&path, dentry, true);
 		dput(dentry);
 	}
-	inode_unlock(path.dentry->d_inode);
 	if (inode)
 		iput(inode);	/* truncate the inode here */
 	inode = NULL;
@@ -4388,13 +4536,14 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
 	struct dentry *dentry;
 	struct path path;
 	unsigned int lookup_flags = 0;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (IS_ERR(from)) {
 		error = PTR_ERR(from);
 		goto out_putnames;
 	}
 retry:
-	dentry = filename_create(newdfd, to, &path, lookup_flags);
+	dentry = filename_create(newdfd, to, &path, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto out_putnames;
@@ -4407,7 +4556,7 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
 		error = vfs_symlink(mnt_userns, path.dentry->d_inode, dentry,
 				    from->name);
 	}
-	done_path_create(&path, dentry);
+	done_path_create_wq(&path, dentry, true);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
@@ -4536,6 +4685,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 	struct inode *delegated_inode = NULL;
 	int how = 0;
 	int error;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
 		error = -EINVAL;
@@ -4559,7 +4709,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 		goto out_putnames;
 
 	new_dentry = filename_create(newdfd, new, &new_path,
-					(how & LOOKUP_REVAL));
+				     (how & LOOKUP_REVAL), &wq);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto out_putpath;
@@ -4577,7 +4727,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 	error = vfs_link(old_path.dentry, mnt_userns, new_path.dentry->d_inode,
 			 new_dentry, &delegated_inode);
 out_dput:
-	done_path_create(&new_path, new_dentry);
+	done_path_create_wq(&new_path, new_dentry, true);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
 		if (!error) {
@@ -4847,7 +4997,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 retry_deleg:
 	trap = lock_rename(new_path.dentry, old_path.dentry);
 
-	old_dentry = __lookup_hash(&old_last, old_path.dentry, lookup_flags);
+	old_dentry = __lookup_hash(&old_last, old_path.dentry,
+				   lookup_flags, NULL);
 	error = PTR_ERR(old_dentry);
 	if (IS_ERR(old_dentry))
 		goto exit3;
@@ -4855,7 +5006,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	error = -ENOENT;
 	if (d_is_negative(old_dentry))
 		goto exit4;
-	new_dentry = __lookup_hash(&new_last, new_path.dentry, lookup_flags | target_flags);
+	new_dentry = __lookup_hash(&new_last, new_path.dentry,
+				   lookup_flags | target_flags, NULL);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto exit4;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 92c78ed02b54..d71c12907617 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -13,7 +13,9 @@
 #include <linux/rcupdate.h>
 #include <linux/lockref.h>
 #include <linux/stringhash.h>
+#include <linux/sched.h>
 #include <linux/wait.h>
+#include <linux/wait_bit.h>
 
 struct path;
 struct vfsmount;
@@ -96,6 +98,8 @@ struct dentry {
 	unsigned long d_time;		/* used by d_revalidate */
 	void *d_fsdata;			/* fs-specific data */
 
+	/* lockdep tracking of DCACHE_PAR_UPDATE locks */
+	struct lockdep_map		d_update_map;
 	union {
 		struct list_head d_lru;		/* LRU list */
 		wait_queue_head_t *d_wait;	/* in-lookup ones only */
@@ -211,6 +215,7 @@ struct dentry_operations {
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
 #define DCACHE_NORCU			0x40000000 /* No RCU delay for freeing */
+#define DCACHE_PAR_UPDATE		0x80000000 /* Being created or unlinked with shared lock */
 
 extern seqlock_t rename_lock;
 
@@ -598,4 +603,27 @@ struct name_snapshot {
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
 
+bool d_lock_update_nested(struct dentry *dentry,
+			  struct dentry *base, const struct qstr *name,
+			  int subclass);
+static inline bool d_lock_update(struct dentry *dentry,
+				 struct dentry *base, const struct qstr *name)
+{
+	/* 0 is I_MUTEX_NORMAL, but fs.h might not be included */
+	return d_lock_update_nested(dentry, base, name, 0);
+}
+
+static inline void d_unlock_update(struct dentry *dentry)
+{
+	if (IS_ERR_OR_NULL(dentry))
+		return;
+	if (dentry->d_flags & DCACHE_PAR_UPDATE) {
+		lock_map_release(&dentry->d_update_map);
+		spin_lock(&dentry->d_lock);
+		dentry->d_flags &= ~DCACHE_PAR_UPDATE;
+		spin_unlock(&dentry->d_lock);
+		wake_up_var(&dentry->d_flags);
+	}
+}
+
 #endif	/* __LINUX_DCACHE_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..a9a48306806a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2524,12 +2524,13 @@ int sync_inode_metadata(struct inode *inode, int wait);
 struct file_system_type {
 	const char *name;
 	int fs_flags;
-#define FS_REQUIRES_DEV		1 
+#define FS_REQUIRES_DEV		1
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
-#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
+#define FS_ALLOW_IDMAP		32	/* FS has been updated to handle vfs idmappings. */
+#define FS_PAR_DIR_UPDATE	BIT(6)	/* Allow parallel directory updates */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index caeb08a98536..b7a123b489b1 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -61,7 +61,14 @@ extern int kern_path(const char *, unsigned, struct path *);
 
 extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
 extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
-extern void done_path_create(struct path *, struct dentry *);
+extern struct dentry *lookup_hash_update_len(const char *name, int nlen,
+					     struct path *path, unsigned int flags,
+					     wait_queue_head_t *wq);
+extern void done_path_create_wq(struct path *, struct dentry *, bool);
+static inline void done_path_create(struct path *path, struct dentry *dentry)
+{
+	done_path_create_wq(path, dentry, false);
+}
 extern struct dentry *kern_path_locked(const char *, struct path *);
 
 extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
@@ -75,6 +82,13 @@ struct dentry *lookup_one_unlocked(struct user_namespace *mnt_userns,
 struct dentry *lookup_one_positive_unlocked(struct user_namespace *mnt_userns,
 					    const char *name,
 					    struct dentry *base, int len);
+struct dentry *filename_create_one_len(const char *name, int nlen,
+				       struct path *path,
+				       unsigned int lookup_flags,
+				       wait_queue_head_t *wq);
+static inline bool IS_PAR_UPDATE(struct inode *dir) {
+	return !!(dir->i_sb->s_type->fs_flags & FS_PAR_DIR_UPDATE);
+}
 
 extern int follow_down_one(struct path *);
 extern int follow_down(struct path *);

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955453
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7B3C1ECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:17:26 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244736AbiHZCRW (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:17:22 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36070 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244739AbiHZCRS (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:17:18 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEAB9CB5D9;
        Thu, 25 Aug 2022 19:17:11 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id 60F5A20781;
        Fri, 26 Aug 2022 02:17:10 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480230;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=IAy5RIlK+BKXr3TtbnQGrDhGvJvfQnV3J6gfvsVI2SI=;
        b=Y6yj7S4oNnQdmlZED4Sv2odlp45fzx1VpVUavURqmBrMe1lS8cFAWL7Vr+fDDacDg86PBP
        xlvopoqEdpGN90wF/FxrX0wXM+JlFXMjdgunwCGmCPt3lUXRcZ1z+75W46z72RDtladjCl
        Xmgy1l2RHbiBbHMnt2HUgbQqm5i7H48=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480230;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=IAy5RIlK+BKXr3TtbnQGrDhGvJvfQnV3J6gfvsVI2SI=;
        b=Ua0xxjneF59o45AsOnrN7VuFr9bOCgaBvcGGtWeZBzHmtOxoOIbXZjZ66XIWEE++e7f+I0
        QZ6LBjQNCDAYrTAg==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4057313A65;
        Fri, 26 Aug 2022 02:16:58 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id J8SJOhotCGNZMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:16:58 +0000
Subject: [PATCH 02/10] VFS: move EEXIST and ENOENT tests into
 lookup_hash_update()
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984373.25420.5973159222199992210.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Moving common error handling into lookup_hash_update() simplifies
callers.
A future patch will export this functionality to nfsd, and the more code
we put in the interface, the less code will be needed in nfsd.

EEXIST is returned if LOOKUP_EXCL is set and dentry is positivie.
ENOENT is returned if LOOKUP_CREAT is NOT set and dentry is negative.

This involves seting LOOKUP_EXCL in cases where it wasn't before.
In particular, when creating a non-dir named "foo/", we want the
EEXIST error, but don't want to actually create anything.
Some filesystems assume LOOKUP_EXCL implies LOOKUP_CREATE, so ensure it
does when calling ->lookup().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   58 ++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c008dfd01e30..09c2d007814a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1562,7 +1562,13 @@ static struct dentry *lookup_dcache(const struct qstr *name,
 {
 	struct dentry *dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dentry, flags);
+		int error;
+		/* Some filesystems assume EXCL -> CREATE, so make
+		 * sure it does.
+		 */
+		if (!(flags & LOOKUP_CREATE))
+			flags &= ~LOOKUP_EXCL;
+		error = d_revalidate(dentry, flags);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
@@ -1621,6 +1627,8 @@ static struct dentry *__lookup_hash(const struct qstr *name,
  * or shared lock depending on the fs preference, then do a lookup,
  * and then set the DCACHE_PAR_UPDATE bit on the child if a shared lock
  * was taken on the parent.
+ * If LOOKUP_EXCL, name should not already exist, else -EEXIST
+ * If not LOOKUP_CREATE, name should already exist, else -ENOENT
  */
 static struct dentry *lookup_hash_update(
 	const struct qstr *name,
@@ -1651,6 +1659,24 @@ static struct dentry *lookup_hash_update(
 		dput(dentry);
 		goto retry;
 	}
+	if (flags & LOOKUP_EXCL) {
+		if (d_is_positive(dentry)) {
+			d_lookup_done(dentry);
+			d_unlock_update(dentry);
+			dput(dentry);
+			err = -EEXIST;
+			goto out_err;
+		}
+	}
+	if (!(flags & LOOKUP_CREATE)) {
+		if (!dentry->d_inode) {
+			d_lookup_done(dentry);
+			d_unlock_update(dentry);
+			dput(dentry);
+			err = -ENOENT;
+			goto out_err;
+		}
+	}
 	return dentry;
 
 out_err:
@@ -3868,7 +3894,7 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 	struct dentry *dentry;
 	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
 	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
-	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
+	unsigned int create_flag = LOOKUP_CREATE;
 	int err2;
 	int error;
 
@@ -3879,26 +3905,16 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 	 * '/', and a directory wasn't requested.
 	 */
 	if (last->name[last->len] && !want_dir)
-		create_flags = 0;
+		/* Name was foo/bar/ but not creating a directory, so
+		 * we won't try to create - result will be either -ENOENT
+		 * or -EEXIST.
+		 */
+		create_flag = 0;
 	dentry = lookup_hash_update(last, path->dentry,
-				    reval_flag | create_flags,  wq);
+				    reval_flag | create_flag | LOOKUP_EXCL, wq);
 	if (IS_ERR(dentry))
 		goto drop_write;
 
-	error = -EEXIST;
-	if (d_is_positive(dentry))
-		goto fail;
-
-	/*
-	 * Special case - lookup gave negative, but... we had foo/bar/
-	 * From the vfs_mknod() POV we just have a negative dentry -
-	 * all is fine. Let's be bastards - you had / on the end, you've
-	 * been asking for (non-existent) directory. -ENOENT for you.
-	 */
-	if (unlikely(!create_flags)) {
-		error = -ENOENT;
-		goto fail;
-	}
 	if (unlikely(err2)) {
 		error = err2;
 		goto fail;
@@ -4292,10 +4308,6 @@ int do_rmdir(int dfd, struct filename *name)
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto exit3;
-	if (!dentry->d_inode) {
-		error = -ENOENT;
-		goto exit4;
-	}
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit4;
@@ -4435,8 +4447,6 @@ int do_unlinkat(int dfd, struct filename *name)
 		if (last.name[last.len])
 			goto slashes;
 		inode = dentry->d_inode;
-		if (d_is_negative(dentry))
-			goto slashes;
 		ihold(inode);
 		error = security_path_unlink(&path, dentry);
 		if (error)

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955454
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 726F5ECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:17:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244767AbiHZCRb (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:17:31 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36240 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244746AbiHZCRX (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:17:23 -0400
Received: from smtp-out1.suse.de (smtp-out1.suse.de
 [IPv6:2001:67c:2178:6::1c])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D8F6CB5DB;
        Thu, 25 Aug 2022 19:17:21 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out1.suse.de (Postfix) with ESMTPS id A1F1E2298A;
        Fri, 26 Aug 2022 02:17:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480239;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=5oQjdZ4RU13qv7ERg7wUf7fqXZRo2xGMUxfTKvystxA=;
        b=K8TMCl0dzuN7diDZUyyOSvTgk3rs7dP3YNTF/SkXDwRx2jhMqqd1XITc+aMCww1u3YCgO2
        1hX4VSOkpnyo/Q0+Zam4eRai05jEnIZ7ujrI/AQ8CbnMyGgGtpe1/C4qs2hIOgtCQwJn67
        yUz0RKZwyVHtS4oscsZ4CCz8lFGwCtk=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480239;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=5oQjdZ4RU13qv7ERg7wUf7fqXZRo2xGMUxfTKvystxA=;
        b=WulEif6gsEbqX0y/kzjG+IfUQfXjh9oKaMAMkjof9HLt6hnuZiAZf05zE06CrWk+oOtwAC
        EPNS7aEswe1OMWAQ==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 48AD913A65;
        Fri, 26 Aug 2022 02:17:17 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id Pc1dAi0tCGNwMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:17:17 +0000
Subject: [PATCH 03/10] VFS: move want_write checks into lookup_hash_update()
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984374.25420.3477094952897986387.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

mnt_want_write() is always called before lookup_hash_update(), so
we can simplify the code by moving the call into that function.

If lookup_hash_update() succeeds, it now will have claimed the
want_write lock.  If it fails, the want_write lock isn't held.

To allow this, lookup_hash_update() now receives a 'struct path *'
instead of 'struct dentry *'

Note that when creating a name, any error from mnt_want_write() does not
get reported unless there is no other error.  For unlink/rmdir though,
an error from mnt_want_write() is immediately fatal - overriding ENOENT
for example.  This behaviour seems strange, but this patch is careful to
preserve it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   72 ++++++++++++++++++++++++++----------------------------------
 1 file changed, 31 insertions(+), 41 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 09c2d007814a..73c3319a1703 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1632,12 +1632,22 @@ static struct dentry *__lookup_hash(const struct qstr *name,
  */
 static struct dentry *lookup_hash_update(
 	const struct qstr *name,
-	struct dentry *base, unsigned int flags,
+	struct path *path, unsigned int flags,
 	wait_queue_head_t *wq)
 {
 	struct dentry *dentry;
+	struct dentry *base = path->dentry;
 	struct inode *dir = base->d_inode;
-	int err;
+	int err, err2;
+
+	/* For create, don't fail immediately if it's r/o,
+	 * at least try to report other errors.
+	 * For unlink/rmdir where LOOKUP_REVAl is the only
+	 * flag, fail immediately if r/o.
+	 */
+	err2 = mnt_want_write(path->mnt);
+	if (err2 && (flags & ~LOOKUP_REVAL) == 0)
+		return ERR_PTR(err2);
 
 	if (wq && IS_PAR_UPDATE(dir))
 		inode_lock_shared_nested(dir, I_MUTEX_PARENT);
@@ -1677,6 +1687,13 @@ static struct dentry *lookup_hash_update(
 			goto out_err;
 		}
 	}
+	if (err2) {
+		err = err2;
+		d_lookup_done(dentry);
+		d_unlock_update(dentry);
+		dput(dentry);
+		goto out_err;
+	}
 	return dentry;
 
 out_err:
@@ -1684,6 +1701,8 @@ static struct dentry *lookup_hash_update(
 		inode_unlock_shared(dir);
 	else
 		inode_unlock(dir);
+	if (!err2)
+		mnt_drop_write(path->mnt);
 	return ERR_PTR(err);
 }
 
@@ -1700,7 +1719,7 @@ struct dentry *lookup_hash_update_len(const char *name, int nlen,
 				    path->dentry, nlen, &this);
 	if (err)
 		return ERR_PTR(err);
-	return lookup_hash_update(&this, path->dentry, flags, wq);
+	return lookup_hash_update(&this, path, flags, wq);
 }
 EXPORT_SYMBOL(lookup_hash_update_len);
 
@@ -3891,15 +3910,10 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 					  unsigned int lookup_flags,
 					  wait_queue_head_t *wq)
 {
-	struct dentry *dentry;
 	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
 	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
 	unsigned int create_flag = LOOKUP_CREATE;
-	int err2;
-	int error;
 
-	/* don't fail immediately if it's r/o, at least try to report other errors */
-	err2 = mnt_want_write(path->mnt);
 	/*
 	 * Do the final lookup.  Suppress 'create' if there is a trailing
 	 * '/', and a directory wasn't requested.
@@ -3910,24 +3924,9 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 		 * or -EEXIST.
 		 */
 		create_flag = 0;
-	dentry = lookup_hash_update(last, path->dentry,
-				    reval_flag | create_flag | LOOKUP_EXCL, wq);
-	if (IS_ERR(dentry))
-		goto drop_write;
-
-	if (unlikely(err2)) {
-		error = err2;
-		goto fail;
-	}
-	return dentry;
-fail:
-	done_path_update(path, dentry, !!wq);
-	dput(dentry);
-	dentry = ERR_PTR(error);
-drop_write:
-	if (!err2)
-		mnt_drop_write(path->mnt);
-	return dentry;
+	return lookup_hash_update(last, path,
+				  reval_flag | create_flag | LOOKUP_EXCL,
+				  wq);
 }
 
 struct dentry *filename_create_one_len(const char *name, int nlen,
@@ -4300,23 +4299,18 @@ int do_rmdir(int dfd, struct filename *name)
 		goto exit2;
 	}
 
-	error = mnt_want_write(path.mnt);
-	if (error)
-		goto exit2;
-
-	dentry = lookup_hash_update(&last, path.dentry, lookup_flags, &wq);
+	dentry = lookup_hash_update(&last, &path, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
-		goto exit3;
+		goto exit2;
 	error = security_path_rmdir(&path, dentry);
 	if (error)
-		goto exit4;
+		goto exit3;
 	mnt_userns = mnt_user_ns(path.mnt);
 	error = vfs_rmdir(mnt_userns, path.dentry->d_inode, dentry);
-exit4:
+exit3:
 	done_path_update(&path, dentry, true);
 	dput(dentry);
-exit3:
 	mnt_drop_write(path.mnt);
 exit2:
 	path_put(&path);
@@ -4433,12 +4427,8 @@ int do_unlinkat(int dfd, struct filename *name)
 	if (type != LAST_NORM)
 		goto exit2;
 
-	error = mnt_want_write(path.mnt);
-	if (error)
-		goto exit2;
-
 retry_deleg:
-	dentry = lookup_hash_update(&last, path.dentry, lookup_flags, &wq);
+	dentry = lookup_hash_update(&last, &path, lookup_flags, &wq);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
 		struct user_namespace *mnt_userns;
@@ -4457,6 +4447,7 @@ int do_unlinkat(int dfd, struct filename *name)
 exit3:
 		done_path_update(&path, dentry, true);
 		dput(dentry);
+		mnt_drop_write(path.mnt);
 	}
 	if (inode)
 		iput(inode);	/* truncate the inode here */
@@ -4466,7 +4457,6 @@ int do_unlinkat(int dfd, struct filename *name)
 		if (!error)
 			goto retry_deleg;
 	}
-	mnt_drop_write(path.mnt);
 exit2:
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955455
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E817CECAAD4
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:17:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244750AbiHZCRf (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:17:35 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36412 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244762AbiHZCR3 (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:17:29 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de
 [IPv6:2001:67c:2178:6::1d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 807C9CAC92;
        Thu, 25 Aug 2022 19:17:28 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id 2D466205B5;
        Fri, 26 Aug 2022 02:17:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480247;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=zX1J6cQNA1PNLcbJb0A3ip/on9dJ42qAkxah5LchdGc=;
        b=ZMrYY+B+4FbMMl74JGtpR8KBHV10aX6Pl3/aWU4ILDlNGEfxh18N+y9wj6oqgbV7G5SXUi
        jW/w+7+2MtyBCIPHBkQaJCNGRSdFp/AO1diIhOAGmzMACAd0rezrxfpDtM0tbLdP6IYwC7
        1SKuagjidoYpY4ahMfO7JvL7s206Ogg=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480247;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=zX1J6cQNA1PNLcbJb0A3ip/on9dJ42qAkxah5LchdGc=;
        b=repLqx9CgqqKBvMLIzvMWpUVrKAFvZCRNtRMpv8QTE+BahyB8jr3JcSStdoBWP8ZQi4GIo
        QXzlI0Yuxhg8eVDw==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A66F313A65;
        Fri, 26 Aug 2022 02:17:24 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id 4sQ4GDQtCGN8MQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:17:24 +0000
Subject: [PATCH 04/10] VFS: move dput() and mnt_drop_write() into
 done_path_update()
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984374.25420.5286418066446165003.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

All calls to done_path_update() are followed by the same two calls, so
merge them in.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 73c3319a1703..8df09c19f2b0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1734,6 +1734,8 @@ static void done_path_update(struct path *path, struct dentry *dentry,
 		inode_unlock_shared(dir);
 	else
 		inode_unlock(dir);
+	dput(dentry);
+	mnt_drop_write(path->mnt);
 }
 
 static struct dentry *lookup_fast(struct nameidata *nd)
@@ -3988,8 +3990,6 @@ EXPORT_SYMBOL(kern_path_create);
 void done_path_create_wq(struct path *path, struct dentry *dentry, bool with_wq)
 {
 	done_path_update(path, dentry, with_wq);
-	dput(dentry);
-	mnt_drop_write(path->mnt);
 	path_put(path);
 }
 EXPORT_SYMBOL(done_path_create_wq);
@@ -4310,8 +4310,6 @@ int do_rmdir(int dfd, struct filename *name)
 	error = vfs_rmdir(mnt_userns, path.dentry->d_inode, dentry);
 exit3:
 	done_path_update(&path, dentry, true);
-	dput(dentry);
-	mnt_drop_write(path.mnt);
 exit2:
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {
@@ -4446,8 +4444,6 @@ int do_unlinkat(int dfd, struct filename *name)
 				   &delegated_inode);
 exit3:
 		done_path_update(&path, dentry, true);
-		dput(dentry);
-		mnt_drop_write(path.mnt);
 	}
 	if (inode)
 		iput(inode);	/* truncate the inode here */

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955456
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 58D7CECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:17:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244795AbiHZCRv (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:17:51 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36520 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244753AbiHZCRf (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:17:35 -0400
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC9ECCB5C6;
        Thu, 25 Aug 2022 19:17:34 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out1.suse.de (Postfix) with ESMTPS id 8075D37C48;
        Fri, 26 Aug 2022 02:17:33 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480253;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=7+8viLpougDqoZ3c2irip0hIncJfW0LZqVa6iHVhTNQ=;
        b=tE7Ar2PfLeRF49+BfC+zIylyt66pdi+70EsThmDh3Eh9ym4yC1G2qufHiowPkUbLi/ru+1
        3sQu/zDLA8k9yDwHY3w/UrXWkYUAOiL/ldYI3hR0HWTWzjViFGrerRgd6AVsRa6fWTsm21
        i22kGXZnn0M0l//G7LkRhTtVJbkzMZE=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480253;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=7+8viLpougDqoZ3c2irip0hIncJfW0LZqVa6iHVhTNQ=;
        b=TdJUMPaapoWi/O8DM8cNqlIzQ+Zq6g6x3W9NCoKJZScnStTOwJqEi/sTviwRQ3uz8dhs6v
        Jt7NSgDcyF1WKyBg==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id E289413A65;
        Fri, 26 Aug 2022 02:17:30 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id XlaBJjotCGOAMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:17:30 +0000
Subject: [PATCH 05/10] VFS: export done_path_update()
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984375.25420.10074672773757634204.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This function will be useful to nfsd, so export it.  As most callers
will want "with_wq" to be true, rename the current done_path_update() to
__done_path_update() and add a new done_path_update() which sets with_wq
to true.

done_path_create_wq() is now simple enough to be "static inline"
rather than an explicit export.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |   16 +++++-----------
 include/linux/namei.h |   16 ++++++++++++++--
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8df09c19f2b0..13f8ac9721be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1723,8 +1723,8 @@ struct dentry *lookup_hash_update_len(const char *name, int nlen,
 }
 EXPORT_SYMBOL(lookup_hash_update_len);
 
-static void done_path_update(struct path *path, struct dentry *dentry,
-			     bool with_wq)
+void __done_path_update(struct path *path, struct dentry *dentry,
+			bool with_wq)
 {
 	struct inode *dir = path->dentry->d_inode;
 
@@ -1737,6 +1737,7 @@ static void done_path_update(struct path *path, struct dentry *dentry,
 	dput(dentry);
 	mnt_drop_write(path->mnt);
 }
+EXPORT_SYMBOL(__done_path_update);
 
 static struct dentry *lookup_fast(struct nameidata *nd)
 {
@@ -3987,13 +3988,6 @@ struct dentry *kern_path_create(int dfd, const char *pathname,
 }
 EXPORT_SYMBOL(kern_path_create);
 
-void done_path_create_wq(struct path *path, struct dentry *dentry, bool with_wq)
-{
-	done_path_update(path, dentry, with_wq);
-	path_put(path);
-}
-EXPORT_SYMBOL(done_path_create_wq);
-
 inline struct dentry *user_path_create(int dfd, const char __user *pathname,
 				       struct path *path, unsigned int lookup_flags)
 {
@@ -4309,7 +4303,7 @@ int do_rmdir(int dfd, struct filename *name)
 	mnt_userns = mnt_user_ns(path.mnt);
 	error = vfs_rmdir(mnt_userns, path.dentry->d_inode, dentry);
 exit3:
-	done_path_update(&path, dentry, true);
+	done_path_update(&path, dentry);
 exit2:
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {
@@ -4443,7 +4437,7 @@ int do_unlinkat(int dfd, struct filename *name)
 		error = vfs_unlink(mnt_userns, path.dentry->d_inode, dentry,
 				   &delegated_inode);
 exit3:
-		done_path_update(&path, dentry, true);
+		done_path_update(&path, dentry);
 	}
 	if (inode)
 		iput(inode);	/* truncate the inode here */
diff --git a/include/linux/namei.h b/include/linux/namei.h
index b7a123b489b1..b1a210a51210 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -64,11 +64,23 @@ extern struct dentry *user_path_create(int, const char __user *, struct path *,
 extern struct dentry *lookup_hash_update_len(const char *name, int nlen,
 					     struct path *path, unsigned int flags,
 					     wait_queue_head_t *wq);
-extern void done_path_create_wq(struct path *, struct dentry *, bool);
+extern void __done_path_update(struct path *, struct dentry *, bool);
+static inline void done_path_update(struct path *path, struct dentry *dentry)
+{
+	__done_path_update(path, dentry, true);
+}
 static inline void done_path_create(struct path *path, struct dentry *dentry)
 {
-	done_path_create_wq(path, dentry, false);
+	__done_path_update(path, dentry, false);
+	path_put(path);
 }
+static inline void done_path_create_wq(struct path *path, struct dentry *dentry,
+				       bool with_wq)
+{
+	__done_path_update(path, dentry, with_wq);
+	path_put(path);
+}
+
 extern struct dentry *kern_path_locked(const char *, struct path *);
 
 extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955457
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B7069ECAAA2
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:18:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244758AbiHZCSQ (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:18:16 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36852 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244753AbiHZCSI (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:18:08 -0400
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2541BCC311;
        Thu, 25 Aug 2022 19:18:06 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out1.suse.de (Postfix) with ESMTPS id BBAD337C4A;
        Fri, 26 Aug 2022 02:18:04 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480284;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=ohYWj3ZdjfHRt86tXw/kKwWt3bxrD9pOKcmPMT9kEcY=;
        b=z81KLXkWXjeVBfHywZ5OItrPtCTOdh0H2vDMGa059trs3yytcOa7MZ4qF/0FEck6ub4U4L
        21VmstNx2VMVeUbj543idTCoQiYfqzflz6CyG0D0qu9Yh7nK3kEG+yVeh0PxGSlsZmPyBo
        a+yAhYf+iu04SXCN3I/Cy+2/XgjrY4s=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480284;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=ohYWj3ZdjfHRt86tXw/kKwWt3bxrD9pOKcmPMT9kEcY=;
        b=RBnzKXXcnp+eFTWlUXfN8d1xPIgGW+K+Z/PABCEIwyqKgEGHhZ3Evi+vmE/4Fb8jasls96
        r/8hFVU5rUZmi6DQ==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2627213A65;
        Fri, 26 Aug 2022 02:17:43 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id JNYFNEctCGOSMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:17:43 +0000
Subject: [PATCH 06/10] VFS: support concurrent renames.
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984375.25420.13018600986239729815.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Allow object can now be renamed from or to a directory in which a create
or unlink is concurrently happening.

Two or more renames with the one directory can also be concurrent.
s_vfs_rename_mutex still serialises lookups for cross-directory renames,
but the renames themselves can proceed concurrently.

A core part of this change is introducing lock_rename_lookup()
which both locks the directories and performs the lookups.
If the filesystem supports shared-lock updates and a wq is provided,
shared locks are used on directories, otherwise exclusive locks.
DCACHE_PAR_UPDATE is always set on the found dentries.

unlock_rename_lookup() performs appropriate unlocking.  It needs to be
told if a wq was provided to lock_rename_lookup().

As we may use alloc_dentry_parallel() which can block, we need to be
careful of the case where both names are the same, in the same
directory.  If the first ->lookup() chooses not to complete the lookup -
as may happen with LOOKUP_RENAME_TARGET - then the second will block.
LOOKUP_RENAME_TARGET is only expected on the first name listed, so we
make sure to lookup the second name given first.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |  221 ++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/namei.h |   10 ++
 2 files changed, 208 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 13f8ac9721be..a7c458cc787c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3156,6 +3156,187 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 }
 EXPORT_SYMBOL(unlock_rename);
 
+static struct dentry *lock_rename_lookup(struct dentry *p1, struct dentry *p2,
+					 struct dentry **d1p, struct dentry **d2p,
+					 struct qstr *last1, struct qstr *last2,
+					 unsigned int flags1, unsigned int flags2,
+					 wait_queue_head_t *wq)
+{
+	struct dentry *p;
+	struct dentry *d1, *d2;
+	bool ok1, ok2;
+	bool shared = wq && IS_PAR_UPDATE(p1->d_inode);
+
+	if (p1 == p2) {
+		if (shared)
+			inode_lock_shared_nested(p1->d_inode, I_MUTEX_PARENT);
+		else
+			inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
+	retry:
+		/* last1 is expected to be target so and might be looked up
+		 * lazily.  So look up last2 first to avoid the second look up
+		 * waiting for the first.
+		 */
+		d2 = __lookup_hash(last2, p2, flags2, wq);
+		if (IS_ERR(d2))
+			goto out_unlock_2;
+		d1 = __lookup_hash(last1, p1, flags1, wq);
+		if (IS_ERR(d1))
+			goto out_unlock_1;
+		*d1p = d1; *d2p = d2;
+
+		if (d1 < d2) {
+			ok1 = d_lock_update_nested(d1, p1, last1,
+						   I_MUTEX_PARENT);
+			ok2 = d_lock_update_nested(d2, p2, last2,
+						   I_MUTEX_PARENT2);
+		} else if (d1 > d2) {
+			ok2 = d_lock_update_nested(d2, p2, last2,
+						   I_MUTEX_PARENT);
+			ok1 = d_lock_update_nested(d1, p1, last1,
+						   I_MUTEX_PARENT2);
+		} else {
+			/* d1 == d2 !! */
+			ok1 = d_lock_update_nested(d1, p1, last1,
+						   I_MUTEX_PARENT);
+			ok2 = ok1;
+		}
+		if (!ok1 || !ok2) {
+			if (ok1)
+				d_unlock_update(d1);
+			if (ok2)
+				d_unlock_update(d2);
+			dput(d1);
+			dput(d2);
+			goto retry;
+		}
+		return NULL;
+	out_unlock_1:
+		d_lookup_done(d2);
+		dput(d2);
+		d2 = d1;
+	out_unlock_2:
+		if (shared)
+			inode_unlock_shared(p1->d_inode);
+		else
+			inode_unlock(p1->d_inode);
+		return d1;
+	}
+
+	mutex_lock(&p1->d_sb->s_vfs_rename_mutex);
+
+	if ((p = d_ancestor(p2, p1)) != NULL) {
+		if (shared) {
+			inode_lock_shared_nested(p2->d_inode, I_MUTEX_PARENT);
+			inode_lock_shared_nested(p1->d_inode, I_MUTEX_CHILD);
+		} else {
+			inode_lock_nested(p2->d_inode, I_MUTEX_PARENT);
+			inode_lock_nested(p1->d_inode, I_MUTEX_CHILD);
+		}
+	} else if ((p = d_ancestor(p1, p2)) != NULL) {
+		if (shared) {
+			inode_lock_shared_nested(p1->d_inode, I_MUTEX_PARENT);
+			inode_lock_shared_nested(p2->d_inode, I_MUTEX_CHILD);
+		} else {
+			inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
+			inode_lock_nested(p2->d_inode, I_MUTEX_CHILD);
+		}
+	} else {
+		if (shared) {
+			inode_lock_shared_nested(p1->d_inode, I_MUTEX_PARENT);
+			inode_lock_shared_nested(p2->d_inode, I_MUTEX_PARENT2);
+		} else {
+			inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
+			inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
+		}
+	}
+retry2:
+	d1 = __lookup_hash(last1, p1, flags1, wq);
+	if (IS_ERR(d1))
+		goto unlock_out_3;
+	d2 = __lookup_hash(last2, p2, flags2, wq);
+	if (IS_ERR(d2))
+		goto unlock_out_4;
+
+	if (d1 < d2) {
+		ok1 = d_lock_update_nested(d1, p1, last1, I_MUTEX_PARENT);
+		ok2 = d_lock_update_nested(d2, p2, last2, I_MUTEX_PARENT2);
+	} else {
+		ok2 = d_lock_update_nested(d2, p2, last2, I_MUTEX_PARENT);
+		ok1 = d_lock_update_nested(d1, p1, last1, I_MUTEX_PARENT2);
+	}
+	if (!ok1 || !ok2) {
+		if (ok1)
+			d_unlock_update(d1);
+		if (ok2)
+			d_unlock_update(d2);
+		dput(d1);
+		dput(d2);
+		goto retry2;
+	}
+	*d1p = d1;
+	*d2p = d2;
+	return p;
+unlock_out_4:
+	d_lookup_done(d1);
+	dput(d1);
+	d1 = d2;
+unlock_out_3:
+	if (shared) {
+		inode_unlock_shared(p1->d_inode);
+		inode_unlock_shared(p2->d_inode);
+	} else {
+		inode_unlock(p1->d_inode);
+		inode_unlock(p2->d_inode);
+	}
+	mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
+	return d1;
+}
+
+struct dentry *lock_rename_lookup_one(struct dentry *p1, struct dentry *p2,
+				      struct dentry **d1p, struct dentry **d2p,
+				      const char *name1, int nlen1,
+				      const char *name2, int nlen2,
+				      unsigned int flags1, unsigned int flags2,
+				      wait_queue_head_t *wq)
+{
+	struct qstr this1, this2;
+	int err;
+
+	err = lookup_one_common(&init_user_ns, name1, p1, nlen1, &this1);
+	if (err)
+		return ERR_PTR(err);
+	err = lookup_one_common(&init_user_ns, name2, p2, nlen2, &this2);
+	if (err)
+		return ERR_PTR(err);
+	return lock_rename_lookup(p1, p2, d1p, d2p, &this1, &this2,
+				  flags1, flags2, wq);
+}
+EXPORT_SYMBOL(lock_rename_lookup_one);
+
+void unlock_rename_lookup(struct dentry *p1, struct dentry *p2,
+			  struct dentry *d1, struct dentry *d2,
+			  bool with_wq)
+{
+	bool shared = with_wq && IS_PAR_UPDATE(p1->d_inode);
+	d_lookup_done(d1);
+	d_lookup_done(d2);
+	d_unlock_update(d1);
+	if (d2 != d1)
+		d_unlock_update(d2);
+	if (shared) {
+		inode_unlock_shared(p1->d_inode);
+		if (p1 != p2) {
+			inode_unlock_shared(p2->d_inode);
+			mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
+		}
+	} else
+		unlock_rename(p1, p2);
+	dput(d1);
+	dput(d2);
+}
+EXPORT_SYMBOL(unlock_rename_lookup);
+
 /**
  * mode_strip_umask - handle vfs umask stripping
  * @dir:	parent directory of the new inode
@@ -4945,6 +5126,7 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
 	bool should_retry = false;
 	int error = -EINVAL;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
 		goto put_names;
@@ -4985,58 +5167,53 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 		goto exit2;
 
 retry_deleg:
-	trap = lock_rename(new_path.dentry, old_path.dentry);
-
-	old_dentry = __lookup_hash(&old_last, old_path.dentry,
-				   lookup_flags, NULL);
-	error = PTR_ERR(old_dentry);
-	if (IS_ERR(old_dentry))
+	trap = lock_rename_lookup(new_path.dentry, old_path.dentry,
+				  &new_dentry, &old_dentry,
+				  &new_last, &old_last,
+				  lookup_flags | target_flags, lookup_flags,
+				  &wq);
+	if (IS_ERR(trap))
 		goto exit3;
 	/* source must exist */
 	error = -ENOENT;
 	if (d_is_negative(old_dentry))
 		goto exit4;
-	new_dentry = __lookup_hash(&new_last, new_path.dentry,
-				   lookup_flags | target_flags, NULL);
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto exit4;
 	error = -EEXIST;
 	if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
-		goto exit5;
+		goto exit4;
 	if (flags & RENAME_EXCHANGE) {
 		error = -ENOENT;
 		if (d_is_negative(new_dentry))
-			goto exit5;
+			goto exit4;
 
 		if (!d_is_dir(new_dentry)) {
 			error = -ENOTDIR;
 			if (new_last.name[new_last.len])
-				goto exit5;
+				goto exit4;
 		}
 	}
 	/* unless the source is a directory trailing slashes give -ENOTDIR */
 	if (!d_is_dir(old_dentry)) {
 		error = -ENOTDIR;
 		if (old_last.name[old_last.len])
-			goto exit5;
+			goto exit4;
 		if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
-			goto exit5;
+			goto exit4;
 	}
 	/* source should not be ancestor of target */
 	error = -EINVAL;
 	if (old_dentry == trap)
-		goto exit5;
+		goto exit4;
 	/* target should not be an ancestor of source */
 	if (!(flags & RENAME_EXCHANGE))
 		error = -ENOTEMPTY;
 	if (new_dentry == trap)
-		goto exit5;
+		goto exit4;
 
 	error = security_path_rename(&old_path, old_dentry,
 				     &new_path, new_dentry, flags);
 	if (error)
-		goto exit5;
+		goto exit4;
 
 	rd.old_dir	   = old_path.dentry->d_inode;
 	rd.old_dentry	   = old_dentry;
@@ -5047,12 +5224,10 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
 	rd.delegated_inode = &delegated_inode;
 	rd.flags	   = flags;
 	error = vfs_rename(&rd);
-exit5:
-	dput(new_dentry);
 exit4:
-	dput(old_dentry);
+	unlock_rename_lookup(new_path.dentry, old_path.dentry, new_dentry, old_dentry,
+			     true);
 exit3:
-	unlock_rename(new_path.dentry, old_path.dentry);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
 		if (!error)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index b1a210a51210..29756921f69b 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -108,6 +108,16 @@ extern int follow_up(struct path *);
 
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
+extern struct dentry *lock_rename_lookup_one(
+	struct dentry *p1, struct dentry *p2,
+	struct dentry **d1p, struct dentry **d2p,
+	const char *name1, int nlen1,
+	const char *name2, int nlen2,
+	unsigned int flags1, unsigned int flags2,
+	wait_queue_head_t *wq);
+extern void unlock_rename_lookup(struct dentry *p1, struct dentry *p2,
+				 struct dentry *d1, struct dentry *d2,
+				 bool withwq);
 
 extern int __must_check nd_jump_link(struct path *path);
 

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955458
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6F5B9ECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:18:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235364AbiHZCSe (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:18:34 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37010 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244434AbiHZCS3 (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:18:29 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2775CB5F4;
        Thu, 25 Aug 2022 19:18:26 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id 371EE20890;
        Fri, 26 Aug 2022 02:18:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480305;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=eX7h+oQ/Kzd8hTzgXuaUbKeoxaFf+xWQvUHsGE389GE=;
        b=mqoJ2oie/UllcHJdvpPaYORL608KG8WsiDrCNMNaVkhqP8ELNE+EDHDbQsa3lzki19sahM
        /KO+ww+ZFSi9aRYweLWsOtD1k/chmhtTd82WCKf2mdrdcwr9Hk3Og46xwh1k2oDS+en8uZ
        KEx8CTCz1Cv/mWpZMcczmbATGLIuPwU=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480305;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=eX7h+oQ/Kzd8hTzgXuaUbKeoxaFf+xWQvUHsGE389GE=;
        b=2zM7rPGe8THfQd9Q9vl/0npmgyA90lJtxXnARsi9Xkf8g+ScN03vsfz46cTsoBunTaARPm
        WDvTC4+1nxgm3LDA==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D107313A65;
        Fri, 26 Aug 2022 02:18:22 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id RfFZI24tCGO8MQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:18:22 +0000
Subject: [PATCH 07/10] VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate()
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984376.25420.10279547885121512294.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

->d_revalidate() functions do not, as a rule, block renames (though 9p
does).  Several access the parent of the dentry often, but not always,
taking a reference to be sure it doesn't disappear.
It is in general possible for d_revalidate to race with rename so the
directory the d_revalidate works with may not be the parent of the
dentry by the time the call finishes.

The exact consequence of this will vary between filesystem and may not
be problematic.  However it seems safest to provide exclusion between
d_revalidate and other operations that change the dentry such as rename,
when doing so is inexpensive

The introduction of DCACHE_PAR_UPDATE does make this easy.  d_revalidate
can set this flag, or wait if it is already set.  This ensures there
will be no race, without needing to lock the whole directory.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a7c458cc787c..ef994239fa7c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -852,10 +852,37 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 
 static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
-		return dentry->d_op->d_revalidate(dentry, flags);
-	else
+	int status;
+
+	if (!unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
 		return 1;
+
+	lock_acquire_exclusive(&dentry->d_update_map, I_MUTEX_NORMAL,
+			       0, NULL, _THIS_IP_);
+	spin_lock(&dentry->d_lock);
+	if (dentry->d_flags & DCACHE_PAR_UPDATE) {
+		/* Some other thread owns this dentry */
+		if (flags & LOOKUP_RCU) {
+			spin_unlock(&dentry->d_lock);
+			lock_map_release(&dentry->d_update_map);
+			return -ECHILD;
+		}
+		___wait_var_event(&dentry->d_flags,
+				  !(dentry->d_flags & DCACHE_PAR_UPDATE),
+				  TASK_UNINTERRUPTIBLE, 0, 0,
+				  (spin_unlock(&dentry->d_lock),
+				   schedule(),
+				   spin_lock(&dentry->d_lock))
+			);
+	}
+	dentry->d_flags |= DCACHE_PAR_UPDATE;
+	spin_unlock(&dentry->d_lock);
+
+	status = dentry->d_op->d_revalidate(dentry, flags);
+
+	d_unlock_update(dentry);
+
+	return status;
 }
 
 /**

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955459
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9BF9BECAAD1
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:18:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230259AbiHZCSk (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:18:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37010 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244779AbiHZCSf (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:18:35 -0400
Received: from smtp-out1.suse.de (smtp-out1.suse.de
 [IPv6:2001:67c:2178:6::1c])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D1A8CC307;
        Thu, 25 Aug 2022 19:18:33 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out1.suse.de (Postfix) with ESMTPS id 93DB42245B;
        Fri, 26 Aug 2022 02:18:31 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480311;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=e/OQ71GnjI/3L/tS+ak62PjchNjosq9EXd1NsoX0Vxs=;
        b=XSdZOFmDAB/wQMJXXzRPpn3gg+X1cWIaC6KW+ZcEWufeU8cvFw1gJn9f0BH3oe79vLSIrn
        I27fyoTPgwBLe8NQ07B/yRb1sRg30CpEbVoPG5wSaZj3piiHATMxPi6gzyhSx2mVY4OJkA
        7tfT68Xn6xBeT6ozLNldRdLLSCR8gXM=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480311;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=e/OQ71GnjI/3L/tS+ak62PjchNjosq9EXd1NsoX0Vxs=;
        b=TE3w9pqIP5IP3pyMWSv+2HkpMuU9YiizpNInCKv3rl7OZkts84dFoDETzH0h1rD8XaOK5i
        Fw9KysyBSCMxODAg==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id BDDA513A65;
        Fri, 26 Aug 2022 02:18:28 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id 0c+sHXQtCGPDMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:18:28 +0000
Subject: [PATCH 08/10] NFSD: allow parallel creates from nfsd
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984376.25420.3784384336816172144.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Rather than getting write access, locking the directory, and performing
a lookup, use
  filename_create_one_len() for create, or
  lookup_hash_update_len() for delete, or
  lock_rename_lookup_one() for rename
which combine all these steps and handle shared locking for concurrent
updates where appropriate.

Functions which call these need to allocate a workqueue_head on the
stack which will be used if the name does not yet exist in the dcache.
Any other thread that wants to look up the name will wait on this
workqueue_head until the NFS operation completes.

As the directory lock is sometimes a shared lock, fh_fill_pre_attrs()
needs to determin if the lock was exclusive, so that when attributes are
returned to an NFSv4 client, that fact can be reported.

Note that there is only one filesystem that will, in this patch series,
allow shared locks for create/unlink and that is NFS (for re-export).
NFS does not support atomic pre/post attributes, so there is no loss in
not providing them.

When some other filesystem supports concurrent updates, we might need to
consider if the pre/post attributes are more important than the
parallelism.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3proc.c |   28 +++-----
 fs/nfsd/nfs4proc.c |   29 ++++-----
 fs/nfsd/nfsfh.c    |    9 +++
 fs/nfsd/nfsproc.c  |   29 ++++-----
 fs/nfsd/vfs.c      |  177 ++++++++++++++++++++--------------------------------
 5 files changed, 112 insertions(+), 160 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index a41cca619338..05e7907940ea 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -234,13 +234,15 @@ static __be32
 nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  struct svc_fh *resfhp, struct nfsd3_createargs *argp)
 {
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct iattr *iap = &argp->attrs;
-	struct dentry *parent, *child;
 	struct nfsd_attrs attrs = {
 		.na_iattr	= iap,
 	};
 	__u32 v_mtime, v_atime;
+	struct dentry *child;
 	struct inode *inode;
+	struct path path;
 	__be32 status;
 	int host_err;
 
@@ -253,20 +255,15 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (status != nfs_ok)
 		return status;
 
-	parent = fhp->fh_dentry;
-	inode = d_inode(parent);
+	path.dentry = fhp->fh_dentry;
+	path.mnt = fhp->fh_export->ex_path.mnt;
+	inode = d_inode(path.dentry);
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		return nfserrno(host_err);
+	child = filename_create_one_len(argp->name, argp->len,
+					&path, 0, &wq);
 
-	inode_lock_nested(inode, I_MUTEX_PARENT);
-
-	child = lookup_one_len(argp->name, parent, argp->len);
-	if (IS_ERR(child)) {
-		status = nfserrno(PTR_ERR(child));
-		goto out;
-	}
+	if (IS_ERR(child))
+		return nfserrno(PTR_ERR(child));
 
 	if (d_really_is_negative(child)) {
 		status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
@@ -342,10 +339,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
 
 out:
-	inode_unlock(inode);
-	if (child && !IS_ERR(child))
-		dput(child);
-	fh_drop_write(fhp);
+	done_path_update(&path, child);
 	return status;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a72ab97f77ef..78e20b8fb2b4 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -235,16 +235,17 @@ static __be32
 nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  struct svc_fh *resfhp, struct nfsd4_open *open)
 {
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct iattr *iap = &open->op_iattr;
 	struct nfsd_attrs attrs = {
 		.na_iattr	= iap,
 		.na_seclabel	= &open->op_label,
 	};
-	struct dentry *parent, *child;
 	__u32 v_mtime, v_atime;
+	struct dentry *child;
 	struct inode *inode;
+	struct path path;
 	__be32 status;
-	int host_err;
 
 	if (isdotent(open->op_fname, open->op_fnamelen))
 		return nfserr_exist;
@@ -254,23 +255,20 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (status != nfs_ok)
 		return status;
-	parent = fhp->fh_dentry;
-	inode = d_inode(parent);
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		return nfserrno(host_err);
+	path.dentry = fhp->fh_dentry;
+	path.mnt = fhp->fh_export->ex_path.mnt;
 
 	if (is_create_with_attrs(open))
 		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
+	inode = d_inode(path.dentry);
 
-	inode_lock_nested(inode, I_MUTEX_PARENT);
+	child = filename_create_one_len(open->op_fname,
+					open->op_fnamelen,
+					&path, 0, &wq);
 
-	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
-	if (IS_ERR(child)) {
-		status = nfserrno(PTR_ERR(child));
-		goto out;
-	}
+	if (IS_ERR(child))
+		return nfserrno(PTR_ERR(child));
 
 	if (d_really_is_negative(child)) {
 		status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
@@ -375,11 +373,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (attrs.na_aclerr)
 		open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
 out:
-	inode_unlock(inode);
+	done_path_update(&path, child);
 	nfsd_attrs_free(&attrs);
-	if (child && !IS_ERR(child))
-		dput(child);
-	fh_drop_write(fhp);
 	return status;
 }
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index a5b71526cee0..5f024e6649d9 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -11,6 +11,7 @@
 #include <linux/exportfs.h>
 
 #include <linux/sunrpc/svcauth_gss.h>
+#include <linux/namei.h>
 #include "nfsd.h"
 #include "vfs.h"
 #include "auth.h"
@@ -627,6 +628,14 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
 	if (fhp->fh_no_wcc || fhp->fh_pre_saved)
 		return;
 
+	if (!IS_PAR_UPDATE(fhp->fh_dentry->d_inode) &&
+	    inode_trylock_shared(fhp->fh_dentry->d_inode)) {
+		/* only have a shared lock */
+		inode_unlock_shared(fhp->fh_dentry->d_inode);
+		fhp->fh_no_atomic_attr = true;
+		fhp->fh_no_wcc = true;
+	}
+
 	inode = d_inode(fhp->fh_dentry);
 	err = fh_getattr(fhp, &stat);
 	if (err) {
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 7381972f1677..e91aff4949e8 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -260,6 +260,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 {
 	struct nfsd_createargs *argp = rqstp->rq_argp;
 	struct nfsd_diropres *resp = rqstp->rq_resp;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	svc_fh		*dirfhp = &argp->fh;
 	svc_fh		*newfhp = &resp->fh;
 	struct iattr	*attr = &argp->attrs;
@@ -268,8 +269,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	};
 	struct inode	*inode;
 	struct dentry	*dchild;
+	struct path	path;
 	int		type, mode;
-	int		hosterr;
 	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
 
 	dprintk("nfsd: CREATE   %s %.*s\n",
@@ -285,26 +286,21 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	resp->status = nfserr_exist;
 	if (isdotent(argp->name, argp->len))
 		goto done;
-	hosterr = fh_want_write(dirfhp);
-	if (hosterr) {
-		resp->status = nfserrno(hosterr);
-		goto done;
-	}
 
-	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
-	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
+	path.dentry = dirfhp->fh_dentry;
+	path.mnt = dirfhp->fh_export->ex_path.mnt;
+	dchild = filename_create_one_len(argp->name, argp->len, &path, 0, &wq);
 	if (IS_ERR(dchild)) {
 		resp->status = nfserrno(PTR_ERR(dchild));
-		goto out_unlock;
+		goto done;
 	}
 	fh_init(newfhp, NFS_FHSIZE);
 	resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
 	if (!resp->status && d_really_is_negative(dchild))
 		resp->status = nfserr_noent;
-	dput(dchild);
 	if (resp->status) {
 		if (resp->status != nfserr_noent)
-			goto out_unlock;
+			goto out_done;
 		/*
 		 * If the new file handle wasn't verified, we can't tell
 		 * whether the file exists or not. Time to bail ...
@@ -313,7 +309,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		if (!newfhp->fh_dentry) {
 			printk(KERN_WARNING 
 				"nfsd_proc_create: file handle not verified\n");
-			goto out_unlock;
+			goto out_done;
 		}
 	}
 
@@ -347,7 +343,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 								 newfhp->fh_dentry,
 								 NFSD_MAY_WRITE|NFSD_MAY_LOCAL_ACCESS);
 					if (resp->status && resp->status != nfserr_rofs)
-						goto out_unlock;
+						goto out_done;
 				}
 			} else
 				type = S_IFREG;
@@ -384,7 +380,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		/* Make sure the type and device matches */
 		resp->status = nfserr_exist;
 		if (inode && inode_wrong_type(inode, type))
-			goto out_unlock;
+			goto out_done;
 	}
 
 	resp->status = nfs_ok;
@@ -406,9 +402,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 						    (time64_t)0);
 	}
 
-out_unlock:
-	inode_unlock(dirfhp->fh_dentry->d_inode);
-	fh_drop_write(dirfhp);
+out_done:
+	done_path_update(&path, dchild);
 done:
 	fh_put(dirfhp);
 	if (resp->status != nfs_ok)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9f486b788ed0..9b1e633977e0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1346,7 +1346,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	    char *fname, int flen, struct nfsd_attrs *attrs,
 	    int type, dev_t rdev, struct svc_fh *resfhp)
 {
-	struct dentry	*dentry, *dchild = NULL;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	struct dentry	*dchild = NULL;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1357,33 +1359,24 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err)
 		return err;
 
-	dentry = fhp->fh_dentry;
-
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		return nfserrno(host_err);
+	path.dentry = fhp->fh_dentry;
+	path.mnt = fhp->fh_export->ex_path.mnt;
 
-	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
-	dchild = lookup_one_len(fname, dentry, flen);
+	dchild = filename_create_one_len(fname, flen, &path, 0, &wq);
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild)) {
 		err = nfserrno(host_err);
-		goto out_unlock;
+		goto out;
 	}
 	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
-	/*
-	 * We unconditionally drop our ref to dchild as fh_compose will have
-	 * already grabbed its own ref for it.
-	 */
-	dput(dchild);
-	if (err)
-		goto out_unlock;
-	fh_fill_pre_attrs(fhp);
-	err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type,
-				 rdev, resfhp);
-	fh_fill_post_attrs(fhp);
-out_unlock:
-	inode_unlock(dentry->d_inode);
+	if (!err) {
+		fh_fill_pre_attrs(fhp);
+		err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type,
+					 rdev, resfhp);
+		fh_fill_post_attrs(fhp);
+	}
+	done_path_update(&path, dchild);
+out:
 	return err;
 }
 
@@ -1442,15 +1435,17 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
 __be32
 nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	     char *fname, int flen,
-	     char *path, struct nfsd_attrs *attrs,
+	     char *lpath, struct nfsd_attrs *attrs,
 	     struct svc_fh *resfhp)
 {
-	struct dentry	*dentry, *dnew;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	struct dentry	*dnew;
+	struct path	path;
 	__be32		err, cerr;
 	int		host_err;
 
 	err = nfserr_noent;
-	if (!flen || path[0] == '\0')
+	if (!flen || lpath[0] == '\0')
 		goto out;
 	err = nfserr_exist;
 	if (isdotent(fname, flen))
@@ -1460,33 +1455,27 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err)
 		goto out;
 
-	host_err = fh_want_write(fhp);
-	if (host_err) {
-		err = nfserrno(host_err);
-		goto out;
-	}
+	path.dentry = fhp->fh_dentry;
+	path.mnt = fhp->fh_export->ex_path.mnt;
 
-	dentry = fhp->fh_dentry;
-	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
-	dnew = lookup_one_len(fname, dentry, flen);
+	dnew = filename_create_one_len(fname, flen, &path, 0, &wq);
 	if (IS_ERR(dnew)) {
 		err = nfserrno(PTR_ERR(dnew));
-		inode_unlock(dentry->d_inode);
-		goto out_drop_write;
+		goto out;
 	}
 	fh_fill_pre_attrs(fhp);
-	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
+	host_err = vfs_symlink(mnt_user_ns(path.mnt), d_inode(path.dentry),
+			       dnew, lpath);
 	err = nfserrno(host_err);
 	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
 	if (!err)
 		nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
 	fh_fill_post_attrs(fhp);
-	inode_unlock(dentry->d_inode);
+	done_path_update(&path, dnew);
 	if (!err)
 		err = nfserrno(commit_metadata(fhp));
-	dput(dnew);
-	if (err==0) err = cerr;
-out_drop_write:
+	if (err==0)
+		err = cerr;
 	fh_drop_write(fhp);
 out:
 	return err;
@@ -1500,8 +1489,10 @@ __be32
 nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 				char *name, int len, struct svc_fh *tfhp)
 {
-	struct dentry	*ddir, *dnew, *dold;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	struct dentry	*dold, *dnew;
 	struct inode	*dirp;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1521,31 +1512,24 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	if (isdotent(name, len))
 		goto out;
 
-	host_err = fh_want_write(tfhp);
-	if (host_err) {
-		err = nfserrno(host_err);
-		goto out;
-	}
-
-	ddir = ffhp->fh_dentry;
-	dirp = d_inode(ddir);
-	inode_lock_nested(dirp, I_MUTEX_PARENT);
+	path.dentry = ffhp->fh_dentry;
+	path.mnt = ffhp->fh_export->ex_path.mnt;
+	dirp = d_inode(path.dentry);
 
-	dnew = lookup_one_len(name, ddir, len);
+	dnew = filename_create_one_len(name, len, &path, 0, &wq);
 	if (IS_ERR(dnew)) {
 		err = nfserrno(PTR_ERR(dnew));
-		goto out_unlock;
+		goto out;
 	}
 
 	dold = tfhp->fh_dentry;
 
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
-		goto out_dput;
+		goto out_done;
 	fh_fill_pre_attrs(ffhp);
 	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
-	fh_fill_post_attrs(ffhp);
-	inode_unlock(dirp);
+
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
@@ -1556,17 +1540,11 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 		else
 			err = nfserrno(host_err);
 	}
-	dput(dnew);
-out_drop_write:
-	fh_drop_write(tfhp);
+out_done:
+	fh_fill_post_attrs(ffhp);
+	done_path_update(&path, dnew);
 out:
 	return err;
-
-out_dput:
-	dput(dnew);
-out_unlock:
-	inode_unlock(dirp);
-	goto out_drop_write;
 }
 
 static void
@@ -1602,6 +1580,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	__be32		err;
 	int		host_err;
 	bool		close_cached = false;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
 	if (err)
@@ -1627,40 +1606,36 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		goto out;
 	}
 
-	trap = lock_rename(tdentry, fdentry);
+	trap = lock_rename_lookup_one(tdentry, fdentry, &ndentry, &odentry,
+				      tname, tlen, fname, flen, 0, 0, &wq);
+	host_err = PTR_ERR(trap);
+	if (IS_ERR(trap))
+		goto out_nfserr;
 	fh_fill_pre_attrs(ffhp);
 	fh_fill_pre_attrs(tfhp);
 
-	odentry = lookup_one_len(fname, fdentry, flen);
-	host_err = PTR_ERR(odentry);
-	if (IS_ERR(odentry))
-		goto out_nfserr;
-
 	host_err = -ENOENT;
 	if (d_really_is_negative(odentry))
-		goto out_dput_old;
+		goto out_unlock;
 	host_err = -EINVAL;
 	if (odentry == trap)
-		goto out_dput_old;
+		goto out_unlock;
 
-	ndentry = lookup_one_len(tname, tdentry, tlen);
-	host_err = PTR_ERR(ndentry);
-	if (IS_ERR(ndentry))
-		goto out_dput_old;
 	host_err = -ENOTEMPTY;
 	if (ndentry == trap)
-		goto out_dput_new;
+		goto out_unlock;
 
 	host_err = -EXDEV;
 	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
-		goto out_dput_new;
+		goto out_unlock;
 	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
-		goto out_dput_new;
+		goto out_unlock;
 
 	if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
 	    nfsd_has_cached_files(ndentry)) {
 		close_cached = true;
-		goto out_dput_old;
+		dget(ndentry);
+		goto out_unlock;
 	} else {
 		struct renamedata rd = {
 			.old_mnt_userns	= &init_user_ns,
@@ -1677,18 +1652,14 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 				host_err = commit_metadata(ffhp);
 		}
 	}
- out_dput_new:
-	dput(ndentry);
- out_dput_old:
-	dput(odentry);
- out_nfserr:
-	err = nfserrno(host_err);
-
 	if (!close_cached) {
 		fh_fill_post_attrs(ffhp);
 		fh_fill_post_attrs(tfhp);
 	}
-	unlock_rename(tdentry, fdentry);
+ out_unlock:
+	unlock_rename_lookup(tdentry, fdentry, ndentry, odentry, true);
+ out_nfserr:
+	err = nfserrno(host_err);
 	fh_drop_write(ffhp);
 
 	/*
@@ -1715,9 +1686,11 @@ __be32
 nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 				char *fname, int flen)
 {
-	struct dentry	*dentry, *rdentry;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	struct dentry	*rdentry;
 	struct inode	*dirp;
 	struct inode	*rinode;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1728,24 +1701,16 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (err)
 		goto out;
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_nfserr;
+	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.dentry = fhp->fh_dentry;
 
-	dentry = fhp->fh_dentry;
-	dirp = d_inode(dentry);
-	inode_lock_nested(dirp, I_MUTEX_PARENT);
+	rdentry = lookup_hash_update_len(fname, flen, &path, 0, &wq);
+	dirp = d_inode(path.dentry);
 
-	rdentry = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(rdentry);
 	if (IS_ERR(rdentry))
-		goto out_unlock;
+		goto out_nfserr;
 
-	if (d_really_is_negative(rdentry)) {
-		dput(rdentry);
-		host_err = -ENOENT;
-		goto out_unlock;
-	}
 	rinode = d_inode(rdentry);
 	ihold(rinode);
 
@@ -1762,14 +1727,11 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	}
 	fh_fill_post_attrs(fhp);
 
-	inode_unlock(dirp);
+	done_path_update(&path, rdentry);
 	if (!host_err)
 		host_err = commit_metadata(fhp);
-	dput(rdentry);
 	iput(rinode);    /* truncate the inode here */
 
-out_drop_write:
-	fh_drop_write(fhp);
 out_nfserr:
 	if (host_err == -EBUSY) {
 		/* name is mounted-on. There is no perfect
@@ -1784,9 +1746,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	}
 out:
 	return err;
-out_unlock:
-	inode_unlock(dirp);
-	goto out_drop_write;
 }
 
 /*

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955460
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 31188ECAAA2
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:18:58 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S244806AbiHZCS4 (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:18:56 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37708 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S244782AbiHZCSk (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:18:40 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de
 [IPv6:2001:67c:2178:6::1d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EAB6CB5FD;
        Thu, 25 Aug 2022 19:18:39 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id BE6E720890;
        Fri, 26 Aug 2022 02:18:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480317;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=Ub7ZQmWhkMwCrPt6XLaYYoLXvYl81gYHBE+E0mqaQY8=;
        b=akHDCDoZIvpZtrkKD5N2+PdNkzbpNDBUFlhF0ssPNMkkY0g5N6tu9oXIiP/uhonLdETaNE
        8n1Up45lp7/dW1S/RUkIXdZd0PBKJ90k+5N6OScf/HTaTmeR+hAp9p0sAAsDjtsFvUVXV2
        Mq/hJJ4Hifx5GeVMPHnEuqoqTDpLzss=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480317;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=Ub7ZQmWhkMwCrPt6XLaYYoLXvYl81gYHBE+E0mqaQY8=;
        b=hQowVta/FFtn3ne1nv7wifMpnj+WbEhlmoT1GTmPtYB+QEYNPoq+vF6k0/2hS2hmbwfsJV
        AXDBSUeZ61NgVEBA==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 59D3413A65;
        Fri, 26 Aug 2022 02:18:35 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id fKxlBnstCGPLMQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:18:35 +0000
Subject: [PATCH 09/10] VFS: add LOOKUP_SILLY_RENAME
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984377.25420.5747334898411663007.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

When performing a "silly rename" to avoid removing a file that is still
open, we need to perform a lookup in a directory that is already locked.

In order to allow common functions to be used for this lookup, introduce
LOOKUP_SILLY_RENAME which affirms that the directory is already locked
and that the vfsmnt is already writable.

When LOOKUP_SILLY_RENAME is set, path->mnt can be NULL.  As
i_op->rename() doesn't make the vfsmnt available, this is unavoidable.
So we ensure that a NULL ->mnt isn't fatal.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/dcache.c           |    3 +-
 fs/namei.c            |   88 +++++++++++++++++++++++++++++--------------------
 include/linux/namei.h |    9 +++--
 3 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index d6bfa49b143b..9bf346a9de52 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3297,7 +3297,8 @@ EXPORT_SYMBOL(d_tmpfile);
  * If the parent directory is locked with I_MUTEX_NORMAL, use I_MUTEX_NORMAL.
  * If the parent is locked with I_MUTEX_PARENT, I_MUTEX_PARENT2 or
  * I_MUTEX_CHILD, use I_MUTEX_PARENT or, for the second in a rename,
- * I_MUTEX_PARENT2.
+ * I_MUTEX_PARENT2.  When a third name is needed, as with "silly-rename"
+ * I_MUTEX_CHILD is used.
  */
 bool d_lock_update_nested(struct dentry *dentry,
 			  struct dentry *base, const struct qstr *name,
diff --git a/fs/namei.c b/fs/namei.c
index ef994239fa7c..c9bbff120bf9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1650,12 +1650,14 @@ static struct dentry *__lookup_hash(const struct qstr *name,
 }
 
 /*
- * Parent directory (base) is not locked.  We take either an exclusive
- * or shared lock depending on the fs preference, then do a lookup,
- * and then set the DCACHE_PAR_UPDATE bit on the child if a shared lock
- * was taken on the parent.
+ * Without LOOKUP_SILLY_RENAME parent directory (base) is not locked.
+ * We take either an exclusive or shared lock depending on the fs
+ * preference, then do a lookup, and then set the DCACHE_PAR_UPDATE bit
+ * on the child if a shared lock was taken on the parent.
  * If LOOKUP_EXCL, name should not already exist, else -EEXIST
  * If not LOOKUP_CREATE, name should already exist, else -ENOENT
+ * If LOOKUP_SILLY_RENAME, don't require write access.  In this case
+ * path->mnt may be NULL.
  */
 static struct dentry *lookup_hash_update(
 	const struct qstr *name,
@@ -1665,21 +1667,26 @@ static struct dentry *lookup_hash_update(
 	struct dentry *dentry;
 	struct dentry *base = path->dentry;
 	struct inode *dir = base->d_inode;
-	int err, err2;
-
-	/* For create, don't fail immediately if it's r/o,
-	 * at least try to report other errors.
-	 * For unlink/rmdir where LOOKUP_REVAl is the only
-	 * flag, fail immediately if r/o.
-	 */
-	err2 = mnt_want_write(path->mnt);
-	if (err2 && (flags & ~LOOKUP_REVAL) == 0)
-		return ERR_PTR(err2);
+	int err, err2 = 0;
+	int class;
+
+	if (!(flags & LOOKUP_SILLY_RENAME)) {
+		/* For create, don't fail immediately if it's r/o,
+		 * at least try to report other errors.
+		 * For unlink/rmdir where LOOKUP_REVAl is the only
+		 * flag, fail immediately if r/o.
+		 */
+		err2 = mnt_want_write(path->mnt);
+		if (err2 && (flags & ~LOOKUP_REVAL) == 0)
+			return ERR_PTR(err2);
 
-	if (wq && IS_PAR_UPDATE(dir))
-		inode_lock_shared_nested(dir, I_MUTEX_PARENT);
-	else
-		inode_lock_nested(dir, I_MUTEX_PARENT);
+		if (wq && IS_PAR_UPDATE(dir))
+			inode_lock_shared_nested(dir, I_MUTEX_PARENT);
+		else
+			inode_lock_nested(dir, I_MUTEX_PARENT);
+		class = I_MUTEX_PARENT;
+	} else
+		class = I_MUTEX_CHILD;
 
 retry:
 	dentry = __lookup_hash(name, base, flags, wq);
@@ -1687,7 +1694,7 @@ static struct dentry *lookup_hash_update(
 		err = PTR_ERR(dentry);
 		goto out_err;
 	}
-	if (!d_lock_update_nested(dentry, base, name, I_MUTEX_PARENT)) {
+	if (!d_lock_update_nested(dentry, base, name, class)) {
 		/*
 		 * Failed to get lock due to race with unlink or rename
 		 * - try again
@@ -1724,12 +1731,14 @@ static struct dentry *lookup_hash_update(
 	return dentry;
 
 out_err:
-	if (wq && IS_PAR_UPDATE(dir))
-		inode_unlock_shared(dir);
-	else
-		inode_unlock(dir);
-	if (!err2)
-		mnt_drop_write(path->mnt);
+	if (!(flags & LOOKUP_SILLY_RENAME)) {
+		if (wq && IS_PAR_UPDATE(dir))
+			inode_unlock_shared(dir);
+		else
+			inode_unlock(dir);
+		if (!err2)
+			mnt_drop_write(path->mnt);
+	}
 	return ERR_PTR(err);
 }
 
@@ -1751,18 +1760,22 @@ struct dentry *lookup_hash_update_len(const char *name, int nlen,
 EXPORT_SYMBOL(lookup_hash_update_len);
 
 void __done_path_update(struct path *path, struct dentry *dentry,
-			bool with_wq)
+			bool with_wq, unsigned int flags)
 {
 	struct inode *dir = path->dentry->d_inode;
 
 	d_lookup_done(dentry);
 	d_unlock_update(dentry);
-	if (IS_PAR_UPDATE(dir) && with_wq)
-		inode_unlock_shared(dir);
-	else
-		inode_unlock(dir);
-	dput(dentry);
-	mnt_drop_write(path->mnt);
+	if (flags & LOOKUP_SILLY_RENAME) {
+		dput(dentry);
+	} else {
+		if (IS_PAR_UPDATE(dir) && with_wq)
+			inode_unlock_shared(dir);
+		else
+			inode_unlock(dir);
+		dput(dentry);
+		mnt_drop_write(path->mnt);
+	}
 }
 EXPORT_SYMBOL(__done_path_update);
 
@@ -4122,7 +4135,8 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 					  wait_queue_head_t *wq)
 {
 	bool want_dir = lookup_flags & LOOKUP_DIRECTORY;
-	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
+	unsigned int flags = lookup_flags & (LOOKUP_REVAL |
+					     LOOKUP_SILLY_RENAME);
 	unsigned int create_flag = LOOKUP_CREATE;
 
 	/*
@@ -4136,7 +4150,7 @@ static struct dentry *filename_create_one(struct qstr *last, struct path *path,
 		 */
 		create_flag = 0;
 	return lookup_hash_update(last, path,
-				  reval_flag | create_flag | LOOKUP_EXCL,
+				  flags | create_flag | LOOKUP_EXCL,
 				  wq);
 }
 
@@ -4146,10 +4160,12 @@ struct dentry *filename_create_one_len(const char *name, int nlen,
 				       wait_queue_head_t *wq)
 {
 	struct qstr this;
+	struct user_namespace *uns = &init_user_ns;
 	int err;
 
-	err = lookup_one_common(mnt_user_ns(path->mnt), name,
-				path->dentry, nlen, &this);
+	if (path->mnt)
+		uns = mnt_user_ns(path->mnt);
+	err = lookup_one_common(uns, name, path->dentry, nlen, &this);
 	if (err)
 		return ERR_PTR(err);
 	return filename_create_one(&this, path, lookup_flags, wq);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 29756921f69b..92a62b04a83d 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -21,6 +21,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_FOLLOW		0x0001	/* follow links at the end */
 #define LOOKUP_DIRECTORY	0x0002	/* require a directory */
 #define LOOKUP_AUTOMOUNT	0x0004  /* force terminal automount */
+#define LOOKUP_SILLY_RENAME	0x0008	/* Directory already locked, don't lock again */
 #define LOOKUP_EMPTY		0x4000	/* accept empty path [user_... only] */
 #define LOOKUP_DOWN		0x8000	/* follow mounts in the starting point */
 #define LOOKUP_MOUNTPOINT	0x0080	/* follow mounts in the end */
@@ -64,20 +65,20 @@ extern struct dentry *user_path_create(int, const char __user *, struct path *,
 extern struct dentry *lookup_hash_update_len(const char *name, int nlen,
 					     struct path *path, unsigned int flags,
 					     wait_queue_head_t *wq);
-extern void __done_path_update(struct path *, struct dentry *, bool);
+extern void __done_path_update(struct path *, struct dentry *, bool, unsigned int);
 static inline void done_path_update(struct path *path, struct dentry *dentry)
 {
-	__done_path_update(path, dentry, true);
+	__done_path_update(path, dentry, true, 0);
 }
 static inline void done_path_create(struct path *path, struct dentry *dentry)
 {
-	__done_path_update(path, dentry, false);
+	__done_path_update(path, dentry, false, 0);
 	path_put(path);
 }
 static inline void done_path_create_wq(struct path *path, struct dentry *dentry,
 				       bool with_wq)
 {
-	__done_path_update(path, dentry, with_wq);
+	__done_path_update(path, dentry, with_wq, 0);
 	path_put(path);
 }
 

From patchwork Fri Aug 26 02:10:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown <neilb@suse.de>
X-Patchwork-Id: 12955461
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3704AECAAA3
	for <linux-fsdevel@archiver.kernel.org>;
 Fri, 26 Aug 2022 02:19:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S243013AbiHZCTk (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Thu, 25 Aug 2022 22:19:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39256 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230181AbiHZCTi (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 25 Aug 2022 22:19:38 -0400
Received: from smtp-out2.suse.de (smtp-out2.suse.de
 [IPv6:2001:67c:2178:6::1d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72FEBCC33E;
        Thu, 25 Aug 2022 19:19:27 -0700 (PDT)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by smtp-out2.suse.de (Postfix) with ESMTPS id BFAAF20684;
        Fri, 26 Aug 2022 02:19:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_rsa;
        t=1661480365;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=5652YvS1xa2Qe9hIcsZbOxfW+QuG+ADDYX81GrYMHlk=;
        b=X6xQvAOc2qsbJ9MzbPY5xErByR+Hc8rrq8ln5j1eTMSAruTCjtgxaly7c0cdb6S1bwXVt1
        CGXvoKr3WP/+itGQ6cxMZwG7uWC3Ua9PyiX7V13zlb7ExTwixUfYF+X9HhQl2BvTRf3vwH
        gJ567FdpvwSb8DHYrMr1rxu2RJf4LZo=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
        s=susede2_ed25519; t=1661480365;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         content-transfer-encoding:content-transfer-encoding:
         in-reply-to:in-reply-to:references:references;
        bh=5652YvS1xa2Qe9hIcsZbOxfW+QuG+ADDYX81GrYMHlk=;
        b=LHTjuaxJ4Z7DaUu11xbnGvvoLrpn49/MxSTlK6Km37JbDRklg9K+0IhpIQV+KVyoQIdXRN
        oN0P/aQGRhkA4DDg==
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de
 [192.168.254.74])
        (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
         key-exchange X25519 server-signature ECDSA (P-521) server-digest
 SHA512)
        (No client certificate requested)
        by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9E18813A65;
        Fri, 26 Aug 2022 02:19:22 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
        by imap2.suse-dmz.suse.de with ESMTPSA
        id /lZSF6otCGP7MQAAMHmgww
        (envelope-from <neilb@suse.de>); Fri, 26 Aug 2022 02:19:22 +0000
Subject: [PATCH 10/10] NFS: support parallel updates in the one directory.
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@zeniv.linux.org.uk>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Daire Byrne <daire@dneg.com>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Date: Fri, 26 Aug 2022 12:10:43 +1000
Message-ID: <166147984378.25420.4023980607067991846.stgit@noble.brown>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
References: <166147828344.25420.13834885828450967910.stgit@noble.brown>
User-Agent: StGit/1.5
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

NFS can easily support parallel updates as the locking is done on the
server, so this patch enables parallel updates for NFS.

Handling of silly-rename - both for unlink and for the rename target -
requires some care as we need to get an exclusive lock on the chosen
silly name and for rename we need to keep the original target name
locked after it has been renamed to the silly name.

So nfs_sillyrename() now uses d_exchange() to swap the target and the
silly name after the silly-rename has happened on the server, and the
silly dentry - which now has the name of the target - is returned.

For unlink(), this is immediately unlocked and discarded with a call to
nfs_sillyrename_finish().  For rename it is kept locked until the
originally requested rename completes.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/dcache.c         |    5 ++++-
 fs/nfs/dir.c        |   28 ++++++++++++++++------------
 fs/nfs/fs_context.c |    6 ++++--
 fs/nfs/internal.h   |    3 ++-
 fs/nfs/unlink.c     |   51 ++++++++++++++++++++++++++++++++++++++-------------
 5 files changed, 64 insertions(+), 29 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9bf346a9de52..a5eaab16d39f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3056,7 +3056,9 @@ void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
 	write_seqlock(&rename_lock);
 
 	WARN_ON(!dentry1->d_inode);
-	WARN_ON(!dentry2->d_inode);
+	/* Allow dentry2 to be negative, so we can do a rename
+	 * but keep both names locked (DCACHE_PAR_UPDATE)
+	 */
 	WARN_ON(IS_ROOT(dentry1));
 	WARN_ON(IS_ROOT(dentry2));
 
@@ -3064,6 +3066,7 @@ void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
 
 	write_sequnlock(&rename_lock);
 }
+EXPORT_SYMBOL(d_exchange);
 
 /**
  * d_ancestor - search for an ancestor
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5d6c2ddc7ea6..fbb608fbe6bf 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1935,8 +1935,12 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 	/*
 	 * If we're doing an exclusive create, optimize away the lookup
 	 * but don't hash the dentry.
+	 * A silly_rename is marked exclusive, but we need to do an
+	 * explicit lookup.
 	 */
-	if (nfs_is_exclusive_create(dir, flags) || flags & LOOKUP_RENAME_TARGET)
+	if ((nfs_is_exclusive_create(dir, flags) ||
+	     flags & LOOKUP_RENAME_TARGET) &&
+	    !(flags & LOOKUP_SILLY_RENAME))
 		return NULL;
 
 	res = ERR_PTR(-ENOMEM);
@@ -2472,10 +2476,14 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
 	spin_lock(&dentry->d_lock);
 	if (d_count(dentry) > 1 && !test_bit(NFS_INO_PRESERVE_UNLINKED,
 					     &NFS_I(d_inode(dentry))->flags)) {
+		struct dentry *silly;
+
 		spin_unlock(&dentry->d_lock);
 		/* Start asynchronous writeout of the inode */
 		write_inode_now(d_inode(dentry), 0);
-		error = nfs_sillyrename(dir, dentry);
+		silly = nfs_sillyrename(dir, dentry);
+		error = PTR_ERR_OR_ZERO(silly);
+		nfs_sillyrename_finish(dir, silly);
 		goto out;
 	}
 	/* We must prevent any concurrent open until the unlink
@@ -2685,16 +2693,12 @@ int nfs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
 
 			spin_unlock(&new_dentry->d_lock);
 
-			/* copy the target dentry's name */
-			dentry = d_alloc(new_dentry->d_parent,
-					 &new_dentry->d_name);
-			if (!dentry)
-				goto out;
-
 			/* silly-rename the existing target ... */
-			err = nfs_sillyrename(new_dir, new_dentry);
-			if (err)
+			dentry = nfs_sillyrename(new_dir, new_dentry);
+			if (IS_ERR(dentry)) {
+				err = PTR_ERR(dentry);
 				goto out;
+			}
 
 			new_dentry = dentry;
 			new_inode = NULL;
@@ -2750,9 +2754,9 @@ int nfs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
 	} else if (error == -ENOENT)
 		nfs_dentry_handle_enoent(old_dentry);
 
-	/* new dentry created? */
 	if (dentry)
-		dput(dentry);
+		nfs_sillyrename_finish(new_dir, dentry);
+
 	return error;
 }
 EXPORT_SYMBOL_GPL(nfs_rename);
diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 4da701fd1424..7133ca9433d2 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -1577,7 +1577,8 @@ struct file_system_type nfs_fs_type = {
 	.init_fs_context	= nfs_init_fs_context,
 	.parameters		= nfs_fs_parameters,
 	.kill_sb		= nfs_kill_super,
-	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
+				  FS_PAR_DIR_UPDATE,
 };
 MODULE_ALIAS_FS("nfs");
 EXPORT_SYMBOL_GPL(nfs_fs_type);
@@ -1589,7 +1590,8 @@ struct file_system_type nfs4_fs_type = {
 	.init_fs_context	= nfs_init_fs_context,
 	.parameters		= nfs_fs_parameters,
 	.kill_sb		= nfs_kill_super,
-	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
+	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
+				  FS_PAR_DIR_UPDATE,
 };
 MODULE_ALIAS_FS("nfs4");
 MODULE_ALIAS("nfs4");
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 27c720d71b4e..3a7fd30a8e29 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -611,7 +611,8 @@ extern struct rpc_task *
 nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
 		 struct dentry *old_dentry, struct dentry *new_dentry,
 		 void (*complete)(struct rpc_task *, struct nfs_renamedata *));
-extern int nfs_sillyrename(struct inode *dir, struct dentry *dentry);
+extern struct dentry *nfs_sillyrename(struct inode *dir, struct dentry *dentry);
+extern void nfs_sillyrename_finish(struct inode *dir, struct dentry *dentry);
 
 /* direct.c */
 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 9697cd5d2561..c8a718f09fe6 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -428,6 +428,10 @@ nfs_complete_sillyrename(struct rpc_task *task, struct nfs_renamedata *data)
  *
  * The final cleanup is done during dentry_iput.
  *
+ * We exchange the original with the new (silly) dentries, and return
+ * the new dentry which will have the original name.  This ensures that
+ * the target name remains locked until the rename completes.
+ *
  * (Note: NFSv4 is stateful, and has opens, so in theory an NFSv4 server
  * could take responsibility for keeping open files referenced.  The server
  * would also need to ensure that opened-but-deleted files were kept over
@@ -436,7 +440,7 @@ nfs_complete_sillyrename(struct rpc_task *task, struct nfs_renamedata *data)
  * use to advertise that it does this; some day we may take advantage of
  * it.))
  */
-int
+struct dentry *
 nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 {
 	static unsigned int sillycounter;
@@ -445,6 +449,8 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 	struct dentry *sdentry;
 	struct inode *inode = d_inode(dentry);
 	struct rpc_task *task;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	struct path path = {};
 	int            error = -EBUSY;
 
 	dfprintk(VFS, "NFS: silly-rename(%pd2, ct=%d)\n",
@@ -459,10 +465,11 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 
 	fileid = NFS_FILEID(d_inode(dentry));
 
+	path.dentry = d_find_alias(dir);
 	sdentry = NULL;
 	do {
 		int slen;
-		dput(sdentry);
+
 		sillycounter++;
 		slen = scnprintf(silly, sizeof(silly),
 				SILLYNAME_PREFIX "%0*llx%0*x",
@@ -472,14 +479,19 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 		dfprintk(VFS, "NFS: trying to rename %pd to %s\n",
 				dentry, silly);
 
-		sdentry = lookup_one_len(silly, dentry->d_parent, slen);
-		/*
-		 * N.B. Better to return EBUSY here ... it could be
-		 * dangerous to delete the file while it's in use.
-		 */
-		if (IS_ERR(sdentry))
-			goto out;
-	} while (d_inode(sdentry) != NULL); /* need negative lookup */
+		sdentry = filename_create_one_len(silly, slen,
+						  &path,
+						  LOOKUP_CREATE | LOOKUP_EXCL |
+						  LOOKUP_SILLY_RENAME,
+						  &wq);
+	} while (PTR_ERR_OR_ZERO(sdentry) == -EEXIST);
+	dput(path.dentry);
+	/*
+	 * N.B. Better to return EBUSY here ... it could be
+	 * dangerous to delete the file while it's in use.
+	 */
+	if (IS_ERR(sdentry))
+		goto out;
 
 	ihold(inode);
 
@@ -513,7 +525,7 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 						     NFS_INO_INVALID_CTIME |
 						     NFS_INO_REVAL_FORCED);
 		spin_unlock(&inode->i_lock);
-		d_move(dentry, sdentry);
+		d_exchange(dentry, sdentry);
 		break;
 	case -ERESTARTSYS:
 		/* The result of the rename is unknown. Play it safe by
@@ -524,7 +536,20 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
 	rpc_put_task(task);
 out_dput:
 	iput(inode);
-	dput(sdentry);
+	if (!error)
+		return sdentry;
+
+	d_lookup_done(sdentry);
+	__done_path_update(&path, sdentry, true, LOOKUP_SILLY_RENAME);
 out:
-	return error;
+	return ERR_PTR(error);
+}
+
+void nfs_sillyrename_finish(struct inode *dir, struct dentry *dentry)
+{
+	struct path path = { .dentry = d_find_alias(dir) };
+
+	if (!IS_ERR(dentry))
+		__done_path_update(&path, dentry, true, LOOKUP_SILLY_RENAME);
+	dput(path.dentry);
 }