
[02/19] VFS: use global wait-queue table for d_alloc_parallel()

Message ID: 20250206054504.2950516-3-neilb@suse.de
State: New
Series: RFC: Allow concurrent and async changes in a directory

Commit Message

NeilBrown Feb. 6, 2025, 5:42 a.m. UTC
d_alloc_parallel() currently requires a wait_queue_head to be passed in.
This must have a lifetime that extends until the lookup is completed.

Future proposed patches will use d_alloc_parallel() for names being
created/unlinked etc.  Some filesystems combine lookup with create,
making a longer code path that the wq needs to live for.  If it is still
to be allocated on-stack, this can be cumbersome.

This patch replaces the on-stack wqs with a global array of wqs which
are used as needed.  A wq is NOT allocated when a dentry is first
created, but only when a second thread attempts to use the same name and
so is forced to wait.  At that moment a wq is chosen using the
least-significant bits of the task's pid, and that wq is assigned to
->d_wait.  The ->d_lock is then dropped and the task waits.

When the dentry is finally moved out of "in_lookup", a wake-up is sent
only if ->d_wait is not NULL.  This avoids an (uncontended) spin
lock/unlock, which saves a couple of atomic operations in the common case.

The wake-up passes the dentry that it is for as the "key", and the wake
function only wakes processes waiting on that same key.  This means
that when these global waitqueues are shared (which is inevitable,
though unlikely to be frequent), a task will not be woken prematurely.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/afs/dir_silly.c      |  4 +--
 fs/dcache.c             | 69 +++++++++++++++++++++++++++++++++--------
 fs/fuse/readdir.c       |  3 +-
 fs/namei.c              |  6 ++--
 fs/nfs/dir.c            |  6 ++--
 fs/nfs/unlink.c         |  3 +-
 fs/proc/base.c          |  3 +-
 fs/proc/proc_sysctl.c   |  3 +-
 fs/smb/client/readdir.c |  3 +-
 include/linux/dcache.h  |  3 +-
 include/linux/nfs_xdr.h |  1 -
 11 files changed, 67 insertions(+), 37 deletions(-)
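
Condensed from the patch below, the new scheme amounts to these
fragments (full context is in the fs/dcache.c hunks):

	/* One global table replaces the per-caller on-stack wqs. */
	static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS];

	/* Waiter side, under ->d_lock: attach a wq only on first contention. */
	if (!dentry->d_wait)
		dentry->d_wait = &par_wait_table[current->pid % PAR_LOOKUP_WQS];

	/* Waker side: skip the wq spinlock entirely if nobody ever waited;
	 * the dentry is passed as the key so shared-wq waiters can filter. */
	if (d_wait)
		__wake_up(d_wait, TASK_NORMAL, 0, dentry);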

Comments

Al Viro Feb. 7, 2025, 7:32 p.m. UTC | #1
1) what's wrong with using middle bits of dentry as index?  What the hell
is that thing about pid for?

2) part in d_add_ci() might be worth a comment re d_lookup_done() coming
for the original dentry, no matter what.

3) the dance with conditional __wake_up() is worth a helper, IMO.
NeilBrown Feb. 10, 2025, 4:58 a.m. UTC | #2
On Sat, 08 Feb 2025, Al Viro wrote:
> 1) what's wrong with using middle bits of dentry as index?  What the hell
> is that thing about pid for?

What does "hell" have to do with it?

All we need here is a random number.  Preferably a cheap random number.
pid is cheap and quite random.
The dentry pointer would be even cheaper (no mem access) providing it
doesn't cost much to get the randomness out.  I considered hash_ptr()
but thought that was more code than it was worth.

Do you have a formula for selecting the "middle" bits in a way that is
expected to still give good randomness?

> 
> 2) part in d_add_ci() might be worth a comment re d_lookup_done() coming
> for the original dentry, no matter what.

I think the previous code deserved explanation more than the new, but
maybe I missed something.
In each case, d_wait_lookup() will wait for the given dentry to no
longer be d_in_lookup() which means waiting for DCACHE_PAR_LOOKUP to be
cleared.  The only place which clears DCACHE_PAR_LOOKUP is
__d_lookup_unhash_wake(), which always wakes the target.
In the previous code it would wake both the non-case-exact dentry and
the case-exact dentry waiters but they would go back to sleep if their
DCACHE_PAR_LOOKUP hadn't been cleared, so no interesting behaviour.
Reusing the wq from one to the other is a sensible simplification, but
not something we need any reminder of once it is no longer needed.

What sort of comment would you add?

> 
> 3) the dance with conditional __wake_up() is worth a helper, IMO.
> 

I tried to explain that in the commit message but I agree it deserves to
be in the code too.
I have added:

	/* ->d_wait is only set if some thread is actually waiting.
	 * If we find it is NULL - the common case - then there was no
	 * contention and there are no waiters to be woken.
	 */

and 
	/* Don't set a wait_queue until someone is actually waiting */
before
	new->d_wait = NULL;
in d_alloc_parallel().

Thanks,
NeilBrown
Al Viro Feb. 10, 2025, 5:15 a.m. UTC | #3
On Mon, Feb 10, 2025 at 03:58:02PM +1100, NeilBrown wrote:
> On Sat, 08 Feb 2025, Al Viro wrote:
> > 1) what's wrong with using middle bits of dentry as index?  What the hell
> > is that thing about pid for?
> 
> What does "hell" have to do with it?
> 
> All we need here is a random number.  Preferably a cheap random number.
> pid is cheap and quite random.
> The dentry pointer would be even cheaper (no mem access) providing it
> doesn't cost much to get the randomness out.  I considered hash_ptr()
> but thought that was more code than it was worth.
> 
> Do you have a formula for selecting the "middle" bits in a way that is
> expected to still give good randomness?

((unsigned long) dentry / L1_CACHE_BYTES) % <table size>

Bits just over the cacheline size should have uniform distribution...
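
In code form, a sketch (the helper name is illustrative; <table size> is
taken to be PAR_LOOKUP_WQS, i.e. 256, as in the patch):

	static inline unsigned int dentry_wq_index(const struct dentry *dentry)
	{
		/* Bits just above the cache-line offset of the address. */
		return ((unsigned long)dentry / L1_CACHE_BYTES) % PAR_LOOKUP_WQS;
	}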

> > 2) part in d_add_ci() might be worth a comment re d_lookup_done() coming
> > for the original dentry, no matter what.
> 
> I think the previous code deserved explanation more than the new, but
> maybe I missed something.
> In each case, d_wait_lookup() will wait for the given dentry to no
> longer be d_in_lookup() which means waiting for DCACHE_PAR_LOOKUP to be
> cleared.  The only place which clears DCACHE_PAR_LOOKUP is
> __d_lookup_unhash_wake(), which always wakes the target.
> In the previous code it would wake both the non-case-exact dentry and
> the case-exact dentry waiters but they would go back to sleep if their
> DCACHE_PAR_LOOKUP hadn't been cleared, so no interesting behaviour.
> Reusing the wq from one to the other is a sensible simplification, but
> not something we need any reminder of once it is no longer needed.

It's not just about the wakeups; any in-lookup dentry should be taken
out of in-lookup hash before it gets dropped.
 
> > 3) the dance with conditional __wake_up() is worth a helper, IMO.

I mean an inlined helper function.
NeilBrown Feb. 11, 2025, 11:35 p.m. UTC | #4
On Mon, 10 Feb 2025, Al Viro wrote:
> On Mon, Feb 10, 2025 at 03:58:02PM +1100, NeilBrown wrote:
> > On Sat, 08 Feb 2025, Al Viro wrote:
> > > 1) what's wrong with using middle bits of dentry as index?  What the hell
> > > is that thing about pid for?
> > 
> > What does "hell" have to do with it?
> > 
> > All we need here is a random number.  Preferably a cheap random number.
> > pid is cheap and quite random.
> > The dentry pointer would be even cheaper (no mem access) providing it
> > doesn't cost much to get the randomness out.  I considered hash_ptr()
> > but thought that was more code than it was worth.
> > 
> > Do you have a formula for selecting the "middle" bits in a way that is
> > expected to still give good randomness?
> 
> ((unsigned long) dentry / L1_CACHE_BYTES) % <table size>
> 
> Bits just over the cacheline size should have uniform distribution...

I tested this, doing the calculation on each allocation and counting the
number of times each bucket was hit.
On my test kernel with lockdep enabled the dentry is 328 bytes and
L1_CACHE_BYTES is 64.  So 6 cache lines per dentry and 10 dentries per
4K slab.  The indices created by the above formula were roughly 1 in 6
of those available.
The 256 possibilities can be divided into 4 groups of 64, and within each
group there are 10 possible values: 0 6 12 18 24 30 36 42 48 54.

Without lockdep making the dentry extra large, struct dentry is 192
bytes, exactly 3 cache lines.  There are 16 entries per 4K slab.
Now exactly 1/4 of possible indices are used.
For every group of 16 possible indices, only 0, 4, 8, 12 are used.
slabinfo says the object size is 256 which explains some of the spread. 
But ultimately the problem is that addresses are not evenly distributed
inside a single slab.

If I divide by PAGE_SIZE instead of L1_CACHE_BYTES I get every possible
value used but it is far from uniform.
With 40000 allocations we would want about 160 in each slot.
The median I measured is 155 (good) but the range is from 16 to 330
which is nearly +/- 100% of the median.
So that isn't random - but then you weren't suggesting that exactly.

I don't think there is a good case here for selecting bits from the
middle of the dentry address.

If I use hash_ptr(dentry, 8) I get a more uniform distribution.  64000
entries would hope for 250 per bucket.  Median is 248.  Range is 186 to
324 so +/- 25%.

Maybe that is the better choice.
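
For reference, a minimal userspace approximation of the measurement
(GOLDEN_RATIO_64 is the multiplier behind the kernel's 64-bit
hash_ptr(); malloc() spacing only roughly models slab layout, so this
illustrates the method rather than reproducing the numbers above):

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>

	#define TABLE_SIZE 256
	#define GOLDEN_RATIO_64 0x61C8864680B583EBull

	/* Userspace stand-in for the kernel's hash_ptr(ptr, 8). */
	static unsigned int hash_ptr8(const void *ptr)
	{
		return ((uint64_t)(uintptr_t)ptr * GOLDEN_RATIO_64) >> (64 - 8);
	}

	int main(void)
	{
		static unsigned int bucket[TABLE_SIZE];
		int i;

		for (i = 0; i < 64000; i++)
			bucket[hash_ptr8(malloc(192))]++;
		for (i = 0; i < TABLE_SIZE; i++)
			printf("%3d: %u\n", i, bucket[i]);
		return 0;
	}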


> 
> > > 2) part in d_add_ci() might be worth a comment re d_lookup_done() coming
> > > for the original dentry, no matter what.
> > 
> > I think the previous code deserved explanation more than the new, but
> > maybe I missed something.
> > In each case, d_wait_lookup() will wait for the given dentry to no
> > longer be d_in_lookup() which means waiting for DCACHE_PAR_LOOKUP to be
> > cleared.  The only place which clears DCACHE_PAR_LOOKUP is
> > __d_lookup_unhash_wake(), which always wakes the target.
> > In the previous code it would wake both the non-case-exact dentry and
> > the case-exact dentry waiters but they would go back to sleep if their
> > DCACHE_PAR_LOOKUP hadn't been cleared, so no interesting behaviour.
> > Reusing the wq from one to the other is a sensible simplification, but
> > not something we need any reminder of once it is no longer needed.
> 
> It's not just about the wakeups; any in-lookup dentry should be taken
> out of in-lookup hash before it gets dropped.
>  
> > > 3) the dance with conditional __wake_up() is worth a helper, IMO.
> 
> I mean an inlined helper function.

Yes.. Of course...

Maybe we should put

static inline void wake_up_key(struct wait_queue_head *wq, void *key)
{
	__wake_up(wq, TASK_NORMAL, 0, key);
}

in include/linux/wait.h to avoid the __wake_up() "internal" name, and
then use
	wake_up_key(d_wait, dentry);
in the two places in dcache.c, or did you want something
dcache-specific?
I'm not good at guessing what other people are thinking.

Thanks,
NeilBrown
Al Viro Feb. 12, 2025, 12:25 a.m. UTC | #5
On Wed, Feb 12, 2025 at 10:35:41AM +1100, NeilBrown wrote:

> Without lockdep making the dentry extra large, struct dentry is 192
> bytes, exactly 3 cache lines.  There are 16 entries per 4K slab.
> Now exactly 1/4 of possible indices are used.
> For every group of 16 possible indices, only 0, 4, 8, 12 are used.
> slabinfo says the object size is 256 which explains some of the spread. 

Interesting...

root@cannonball:~# grep -w dentry /proc/slabinfo
dentry            1370665 1410864    192   21    1 : tunables    0    0    0 : slabdata  67184  67184      0

Where does that 256 come from?  The above is on amd64, with 6.1-based debian
kernel and I see the same object size on other boxen (with local configs).

> I don't think there is a good case here for selecting bits from the
> middle of the dentry address.
> 
> If I use hash_ptr(dentry, 8) I get a more uniform distribution.  64000
> entries would hope for 250 per bucket.  Median is 248.  Range is 186 to
> 324 so +/- 25%.
> 
> Maybe that is the better choice.

That's really interesting, considering the implications for m_hash() and mp_hash()
(see fs/namespace.c)...

> > > > 3) the dance with conditional __wake_up() is worth a helper, IMO.
> > 
> > I mean an inlined helper function.
> 
> Yes.. Of course...
> 
> Maybe we should put
> 
> static inline void wake_up_key(struct wait_queue_head *wq, void *key)
> {
> 	__wake_up(wq, TASK_NORMAL, 0, key);
> }
> 
> in include/linux/wait.h to avoid the __wake_up() "internal" name, and
> then use
> 	wake_up_key(d_wait, dentry);
> in the two places in dcache.c, or did you want something
> dcache-specific?

More like
	if (wq)
		__wake_up(wq, TASK_NORMAL, 0, key);
probably...
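
Putting the two together, something like this perhaps (the name
d_wake_waiters is only illustrative):

	/* Wake only keyed waiters, and only if some thread actually
	 * attached a wait queue (->d_wait stays NULL when uncontended). */
	static inline void d_wake_waiters(wait_queue_head_t *wq, void *key)
	{
		if (wq)
			__wake_up(wq, TASK_NORMAL, 0, key);
	}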
NeilBrown Feb. 12, 2025, 1:46 a.m. UTC | #6
On Wed, 12 Feb 2025, Al Viro wrote:
> On Wed, Feb 12, 2025 at 10:35:41AM +1100, NeilBrown wrote:
> 
> > Without lockdep making the dentry extra large, struct dentry is 192
> > bytes, exactly 3 cache lines.  There are 16 entries per 4K slab.
> > Now exactly 1/4 of possible indices are used.
> > For every group of 16 possible indices, only 0, 4, 8, 12 are used.
> > slabinfo says the object size is 256 which explains some of the spread. 
> 
> Interesting...
> 
> root@cannonball:~# grep -w dentry /proc/slabinfo
> dentry            1370665 1410864    192   21    1 : tunables    0    0    0 : slabdata  67184  67184      0
> 
> Where does that 256 come from?  The above is on amd64, with 6.1-based debian
> kernel and I see the same object size on other boxen (with local configs).

I found SLUB_DEBUG and redzoning does that.  Disabling the debug brings
it down to 192 bytes and 21 per slab, which is what you see.  That is
still only a 33% hit rate.

> 
> > I don't think there is a good case here for selecting bits from the
> > middle of the dentry address.
> > 
> > If I use hash_ptr(dentry, 8) I get a more uniform distribution.  64000
> > entries would hope for 250 per bucket.  Median is 248.  Range is 186 to
> > 324 so +/- 25%.
> > 
> > Maybe that is the better choice.
> 
> That's really interesting, considering the implications for m_hash() and mp_hash()
> (see fs/namespace.c)...

Those functions add in the next set of bits as well - effectively mixing
in more bits from the page address.  If I do that the spread is better
but there are still buckets with close to twice the median, though most
are +/- 30%.
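
For reference, m_hash() in fs/namespace.c looks roughly like this
(mount_hashtable, m_hash_shift and m_hash_mask are globals computed at
boot from the table size):

	static inline struct hlist_head *m_hash(struct mount *mnt,
						struct dentry *dentry)
	{
		unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);

		tmp += ((unsigned long)dentry / L1_CACHE_BYTES);
		/* Fold the bits above the index back in. */
		tmp = tmp + (tmp >> m_hash_shift);
		return &mount_hashtable[tmp & m_hash_mask];
	}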

> 
> > > > > 3) the dance with conditional __wake_up() is worth a helper, IMO.
> > > 
> > > I mean an inlined helper function.
> > 
> > Yes.. Of course...
> > 
> > Maybe we should put
> > 
> > static inline void wake_up_key(struct wait_queue_head *wq, void *key)
> > {
> > 	__wake_up(wq, TASK_NORMAL, 0, key);
> > }
> > 
> > in include/linux/wait.h to avoid the __wake_up() "internal" name, and
> > then use
> > 	wake_up_key(d_wait, dentry);
> > in the two places in dcache.c, or did you want something
> > dcache-specific?
> 
> More like
> 	if (wq)
> 		__wake_up(wq, TASK_NORMAL, 0, key);
> probably...
> 

Thanks,
NeilBrown

Patch

diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index a1e581946b93..aa4363a1c6fa 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -239,13 +239,11 @@  int afs_silly_iput(struct dentry *dentry, struct inode *inode)
 	struct dentry *alias;
 	int ret;
 
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-
 	_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
 
 	down_read(&dvnode->rmdir_lock);
 
-	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq);
+	alias = d_alloc_parallel(dentry->d_parent, &dentry->d_name);
 	if (IS_ERR(alias)) {
 		up_read(&dvnode->rmdir_lock);
 		return 0;
diff --git a/fs/dcache.c b/fs/dcache.c
index 96b21a47312e..e49607d00d2d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2095,8 +2095,7 @@  struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		return found;
 	}
 	if (d_in_lookup(dentry)) {
-		found = d_alloc_parallel(dentry->d_parent, name,
-					dentry->d_wait);
+		found = d_alloc_parallel(dentry->d_parent, name);
 		if (IS_ERR(found) || !d_in_lookup(found)) {
 			iput(inode);
 			return found;
@@ -2106,7 +2105,7 @@  struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
 		if (!found) {
 			iput(inode);
 			return ERR_PTR(-ENOMEM);
-		} 
+		}
 	}
 	res = d_splice_alias(inode, found);
 	if (res) {
@@ -2476,30 +2475,70 @@  static inline unsigned start_dir_add(struct inode *dir)
 }
 
 static inline void end_dir_add(struct inode *dir, unsigned int n,
-			       wait_queue_head_t *d_wait)
+			       wait_queue_head_t *d_wait, struct dentry *de)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
 	preempt_enable_nested();
-	wake_up_all(d_wait);
+	if (d_wait)
+		__wake_up(d_wait, TASK_NORMAL, 0, de);
+}
+
+#define	PAR_LOOKUP_WQS	256
+static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS] __cacheline_aligned;
+
+static int __init par_wait_init(void)
+{
+	int i;
+
+	for (i = 0; i < PAR_LOOKUP_WQS; i++)
+		init_waitqueue_head(&par_wait_table[i]);
+	return 0;
+}
+fs_initcall(par_wait_init);
+
+struct par_wait_key {
+	struct dentry *de;
+	struct wait_queue_entry wqe;
+};
+
+static int d_wait_wake_fn(struct wait_queue_entry *wq_entry,
+			  unsigned mode, int sync, void *key)
+{
+	struct par_wait_key *pwk = container_of(wq_entry,
+						 struct par_wait_key, wqe);
+	if (pwk->de == key)
+		return default_wake_function(wq_entry, mode, sync, key);
+	return 0;
 }
 
 static void d_wait_lookup(struct dentry *dentry)
 {
 	if (d_in_lookup(dentry)) {
-		DECLARE_WAITQUEUE(wait, current);
-		add_wait_queue(dentry->d_wait, &wait);
+		struct par_wait_key wk = {
+			.de = dentry,
+			.wqe = {
+				.private = current,
+				.func = d_wait_wake_fn,
+			},
+		};
+		struct wait_queue_head *wq;
+		if (!dentry->d_wait)
+			dentry->d_wait = &par_wait_table[current->pid %
+							 PAR_LOOKUP_WQS];
+		wq = dentry->d_wait;
+		add_wait_queue(wq, &wk.wqe);
 		do {
 			set_current_state(TASK_UNINTERRUPTIBLE);
 			spin_unlock(&dentry->d_lock);
 			schedule();
 			spin_lock(&dentry->d_lock);
 		} while (d_in_lookup(dentry));
+		remove_wait_queue(wq, &wk.wqe);
 	}
 }
 
 struct dentry *d_alloc_parallel(struct dentry *parent,
-				const struct qstr *name,
-				wait_queue_head_t *wq)
+				const struct qstr *name)
 {
 	unsigned int hash = name->hash;
 	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2596,7 +2635,7 @@  struct dentry *d_alloc_parallel(struct dentry *parent,
 	rcu_read_unlock();
 	/* we can't take ->d_lock here; it's OK, though. */
 	new->d_flags |= DCACHE_PAR_LOOKUP;
-	new->d_wait = wq;
+	new->d_wait = NULL;
 	hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
 	hlist_bl_unlock(b);
 	return new;
@@ -2633,8 +2672,12 @@  static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
 
 void __d_lookup_unhash_wake(struct dentry *dentry)
 {
+	wait_queue_head_t *d_wait;
+
 	spin_lock(&dentry->d_lock);
-	wake_up_all(__d_lookup_unhash(dentry));
+	d_wait = __d_lookup_unhash(dentry);
+	if (d_wait)
+		__wake_up(d_wait, TASK_NORMAL, 0, dentry);
 	spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL(__d_lookup_unhash_wake);
@@ -2662,7 +2705,7 @@  static inline void __d_add(struct dentry *dentry, struct inode *inode)
 	}
 	__d_rehash(dentry);
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, d_wait, dentry);
 	spin_unlock(&dentry->d_lock);
 	if (inode)
 		spin_unlock(&inode->i_lock);
@@ -2874,7 +2917,7 @@  static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_end(&dentry->d_seq);
 
 	if (dir)
-		end_dir_add(dir, n, d_wait);
+		end_dir_add(dir, n, d_wait, target);
 
 	if (dentry->d_parent != old_parent)
 		spin_unlock(&dentry->d_parent->d_lock);
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 17ce9636a2b1..c6b646a3f1bd 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -160,7 +160,6 @@  static int fuse_direntplus_link(struct file *file,
 	struct inode *dir = d_inode(parent);
 	struct fuse_conn *fc;
 	struct inode *inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (!o->nodeid) {
 		/*
@@ -195,7 +194,7 @@  static int fuse_direntplus_link(struct file *file,
 	dentry = d_lookup(parent, &name);
 	if (!dentry) {
 retry:
-		dentry = d_alloc_parallel(parent, &name, &wq);
+		dentry = d_alloc_parallel(parent, &name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 	}
diff --git a/fs/namei.c b/fs/namei.c
index d98caf36e867..5cdbd2eb4056 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1769,13 +1769,12 @@  static struct dentry *__lookup_slow(const struct qstr *name,
 {
 	struct dentry *dentry, *old;
 	struct inode *inode = dir->d_inode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	/* Don't go there if it's already dead */
 	if (unlikely(IS_DEADDIR(inode)))
 		return ERR_PTR(-ENOENT);
 again:
-	dentry = d_alloc_parallel(dir, name, &wq);
+	dentry = d_alloc_parallel(dir, name);
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
@@ -3561,7 +3560,6 @@  static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	struct dentry *dentry;
 	int error, create_error = 0;
 	umode_t mode = op->mode;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 
 	if (unlikely(IS_DEADDIR(dir_inode)))
 		return ERR_PTR(-ENOENT);
@@ -3570,7 +3568,7 @@  static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	dentry = d_lookup(dir, &nd->last);
 	for (;;) {
 		if (!dentry) {
-			dentry = d_alloc_parallel(dir, &nd->last, &wq);
+			dentry = d_alloc_parallel(dir, &nd->last);
 			if (IS_ERR(dentry))
 				return dentry;
 		}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2b04038b0e40..27c7a5c4e91b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -725,7 +725,6 @@  void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		unsigned long dir_verifier)
 {
 	struct qstr filename = QSTR_INIT(entry->name, entry->len);
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct dentry *dentry;
 	struct dentry *alias;
 	struct inode *inode;
@@ -754,7 +753,7 @@  void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 	dentry = d_lookup(parent, &filename);
 again:
 	if (!dentry) {
-		dentry = d_alloc_parallel(parent, &filename, &wq);
+		dentry = d_alloc_parallel(parent, &filename);
 		if (IS_ERR(dentry))
 			return;
 	}
@@ -2059,7 +2058,6 @@  int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		    struct file *file, unsigned open_flags,
 		    umode_t mode)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct nfs_open_context *ctx;
 	struct dentry *res;
 	struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -2115,7 +2113,7 @@  int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		d_drop(dentry);
 		switched = true;
 		dentry = d_alloc_parallel(dentry->d_parent,
-					  &dentry->d_name, &wq);
+					  &dentry->d_name);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
 		if (unlikely(!d_in_lookup(dentry)))
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index bf77399696a7..d44162d3a8f1 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -124,7 +124,7 @@  static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
 	struct dentry *alias;
 
 	down_read_non_owner(&NFS_I(dir)->rmdir_sem);
-	alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
+	alias = d_alloc_parallel(dentry->d_parent, &data->args.name);
 	if (IS_ERR(alias)) {
 		up_read_non_owner(&NFS_I(dir)->rmdir_sem);
 		return 0;
@@ -185,7 +185,6 @@  nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
 
 	data->cred = get_current_cred();
 	data->res.dir_attr = &data->dir_attr;
-	init_waitqueue_head(&data->wq);
 
 	status = -EBUSY;
 	spin_lock(&dentry->d_lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index cd89e956c322..c8bcbdac87d5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2126,8 +2126,7 @@  bool proc_fill_cache(struct file *file, struct dir_context *ctx,
 
 	child = d_hash_and_lookup(dir, &qname);
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			goto end_instantiate;
 		if (d_in_lookup(child)) {
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index cc9d74a06ff0..9f1088f138f4 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -693,8 +693,7 @@  static bool proc_sys_fill_cache(struct file *file,
 
 	child = d_lookup(dir, &qname);
 	if (!child) {
-		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-		child = d_alloc_parallel(dir, &qname, &wq);
+		child = d_alloc_parallel(dir, &qname);
 		if (IS_ERR(child))
 			return false;
 		if (d_in_lookup(child)) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 50f96259d9ad..39d8a18cd443 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -73,7 +73,6 @@  cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
 	bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions;
 	bool reparse_need_reval = false;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	int rc;
 
 	cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
@@ -105,7 +104,7 @@  cifs_prime_dcache(struct dentry *parent, struct qstr *name,
 		    (fattr->cf_flags & CIFS_FATTR_NEED_REVAL))
 			return;
 
-		dentry = d_alloc_parallel(parent, name, &wq);
+		dentry = d_alloc_parallel(parent, name);
 	}
 	if (IS_ERR(dentry))
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4afb60365675..b03cbb0177a3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -247,8 +247,7 @@  extern void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op
 /* allocate/de-allocate */
 extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
 extern struct dentry * d_alloc_anon(struct super_block *);
-extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
-					wait_queue_head_t *);
+extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *);
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
 extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 9155a6ffc370..d0473e0d4aba 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1731,7 +1731,6 @@  struct nfs_unlinkdata {
 	struct nfs_removeargs args;
 	struct nfs_removeres res;
 	struct dentry *dentry;
-	wait_queue_head_t wq;
 	const struct cred *cred;
 	struct nfs_fattr dir_attr;
 	long timeout;