diff mbox series

[13/19] VFS: lock dentry for ->revalidate to avoid races with rename etc

Message ID 20250206054504.2950516-14-neilb@suse.de
State New
Headers show
Series RFC: Allow concurrent and async changes in a directory | expand

Commit Message

NeilBrown Feb. 6, 2025, 5:42 a.m. UTC
When we call ->revalidate we want to be sure we are revalidating the
expected name.  As a shared lock on i_rwsem no longer prevents renames
we need to lock the dentry and ensure it still has the expected name.

So pass parent name to d_revalidate() and be prepared to retry the
lookup if it returns -EAGAIN.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c | 49 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 11 deletions(-)

Comments

Al Viro Feb. 7, 2025, 8:28 p.m. UTC | #1
On Thu, Feb 06, 2025 at 04:42:50PM +1100, NeilBrown wrote:

> +	if (dentry->d_flags & LOOKUP_RCU) {

Really?
Al Viro Feb. 7, 2025, 8:35 p.m. UTC | #2
On Fri, Feb 07, 2025 at 08:28:39PM +0000, Al Viro wrote:
> On Thu, Feb 06, 2025 at 04:42:50PM +1100, NeilBrown wrote:
> 
> > +	if (dentry->d_flags & LOOKUP_RCU) {
> 
> Really?

That aside, you are *NOT* passing the parent's name here - if you
look at the callers, all of them have 'name' and 'last' arguments
identicaly.  What is going on here?
Al Viro Feb. 8, 2025, 1:30 a.m. UTC | #3
On Thu, Feb 06, 2025 at 04:42:50PM +1100, NeilBrown wrote:
> When we call ->revalidate we want to be sure we are revalidating the
> expected name.  As a shared lock on i_rwsem no longer prevents renames
> we need to lock the dentry and ensure it still has the expected name.

*blink*

We never had been guaranteed any lock on the parent - the most common
call chain doesn't (and didn't) have it taken.

> So pass parent name to d_revalidate() and be prepared to retry the
> lookup if it returns -EAGAIN.

I don't understand that one at all.  What's the point of those retries
on -EAGAIN?  Rename (or race with d_splice_alias(), for that matter)
can happen just as we return success from ->d_revalidate(), so we
don't get anything useful out of that check.

What's more, why do we need that exclusion in the first place?
The instance *is* given a stable parent reference and stable name,
so there's no need for it to even look at ->d_parent or ->d_name.

It looks like a bad rebase on top of ->d_revalidate() series that
had landed in -rc1, with the original variant trying to provide the
guarantees now offered by that series.

Unless there's something subtle I'm missing here, I would suggest
dropping that one.  Incidentally, d_update_trylock() would be
better off in fs/dcache.c - static and with just one argument.

HOWEVER, if you do not bother with doing that before ->d_unalias_trylock()
(and there's no reason to do that), the whole thing becomes much simpler -
you can do the check inside __d_move(), after all locks had been taken.

After
        spin_lock_nested(&dentry->d_lock, 2);
        spin_lock_nested(&target->d_lock, 3);
you have everything stable.  Just make the sucker return bool instead
of void, check that crap and have it return false if there's a problem.

Callers other than __d_unalias() would just do WARN_ON(!__d_move(...))
instead of their __d_move() calls and __d_unalias() would have
	if (__d_move(...))
		ret = 0;
and screw the d_update_trylock/d_update_unlock there.

All there is to it...
Al Viro Feb. 8, 2025, 1:35 a.m. UTC | #4
On Sat, Feb 08, 2025 at 01:30:43AM +0000, Al Viro wrote:
> On Thu, Feb 06, 2025 at 04:42:50PM +1100, NeilBrown wrote:
> > When we call ->revalidate we want to be sure we are revalidating the
> > expected name.  As a shared lock on i_rwsem no longer prevents renames
> > we need to lock the dentry and ensure it still has the expected name.
> 
> *blink*
> 
> We never had been guaranteed any lock on the parent - the most common
> call chain doesn't (and didn't) have it taken.
> 
> > So pass parent name to d_revalidate() and be prepared to retry the
> > lookup if it returns -EAGAIN.
> 
> I don't understand that one at all.  What's the point of those retries
> on -EAGAIN?  Rename (or race with d_splice_alias(), for that matter)
> can happen just as we return success from ->d_revalidate(), so we
> don't get anything useful out of that check.
> 
> What's more, why do we need that exclusion in the first place?
> The instance *is* given a stable parent reference and stable name,
> so there's no need for it to even look at ->d_parent or ->d_name.
> 
> It looks like a bad rebase on top of ->d_revalidate() series that
> had landed in -rc1, with the original variant trying to provide the
> guarantees now offered by that series.
> 
> Unless there's something subtle I'm missing here, I would suggest
> dropping that one.  Incidentally, d_update_trylock() would be
> better off in fs/dcache.c - static and with just one argument.

Sorry, lost a sentence here while editing:

The only remaining caller of d_update_trylock() would be the one in
__d_unalias(), just before the call of ->d_unalias_trylock() in there
and it gets NULL/NULL in the last two arguments.

> HOWEVER, if you do not bother with doing that before ->d_unalias_trylock()
> (and there's no reason to do that), the whole thing becomes much simpler -
> you can do the check inside __d_move(), after all locks had been taken.
> 
> After
>         spin_lock_nested(&dentry->d_lock, 2);
>         spin_lock_nested(&target->d_lock, 3);
> you have everything stable.  Just make the sucker return bool instead
> of void, check that crap and have it return false if there's a problem.
> 
> Callers other than __d_unalias() would just do WARN_ON(!__d_move(...))
> instead of their __d_move() calls and __d_unalias() would have
> 	if (__d_move(...))
> 		ret = 0;
> and screw the d_update_trylock/d_update_unlock there.
> 
> All there is to it...
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 145ae07f9b8c..3a107d6098be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -957,12 +957,24 @@  static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 }
 
 static inline int d_revalidate(struct inode *dir, const struct qstr *name,
-			       struct dentry *dentry, unsigned int flags)
+			       struct dentry *dentry, unsigned int flags,
+			       struct dentry *base, const struct qstr *last)
 {
-	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
-		return dentry->d_op->d_revalidate(dir, name, dentry, flags);
-	else
+	int status;
+
+	if (!unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
 		return 1;
+
+	if (dentry->d_flags & LOOKUP_RCU) {
+		if (!d_update_trylock(dentry, base, last))
+			return -ECHILD;
+	} else {
+		if (!d_update_lock(dentry, base, last, I_MUTEX_NORMAL))
+			return -EAGAIN;
+	}
+	status = dentry->d_op->d_revalidate(dir, name, dentry, flags);
+	d_update_unlock(dentry);
+	return status;
 }
 
 /**
@@ -1686,13 +1698,18 @@  static struct dentry *lookup_dcache(const struct qstr *name,
 				    struct dentry *dir,
 				    unsigned int flags)
 {
-	struct dentry *dentry = d_lookup(dir, name);
+	struct dentry *dentry;
+again:
+	dentry = d_lookup(dir, name);
 	if (dentry) {
-		int error = d_revalidate(dir->d_inode, name, dentry, flags);
+		int error = d_revalidate(dir->d_inode, name, dentry, flags, dir, name);
 		if (unlikely(error <= 0)) {
 			if (!error)
 				d_invalidate(dentry);
 			dput(dentry);
+			if (error == -EAGAIN)
+				/* raced with rename etc */
+				goto again;
 			return ERR_PTR(error);
 		}
 	}
@@ -1915,6 +1932,7 @@  static struct dentry *lookup_fast(struct nameidata *nd)
 	 * of a false negative due to a concurrent rename, the caller is
 	 * going to fall back to non-racy lookup.
 	 */
+again:
 	if (nd->flags & LOOKUP_RCU) {
 		dentry = __d_lookup_rcu(parent, &nd->last, &nd->next_seq);
 		if (unlikely(!dentry)) {
@@ -1930,7 +1948,7 @@  static struct dentry *lookup_fast(struct nameidata *nd)
 		if (read_seqcount_retry(&parent->d_seq, nd->seq))
 			return ERR_PTR(-ECHILD);
 
-		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags, parent, &nd->last);
 		if (likely(status > 0))
 			return dentry;
 		if (!try_to_unlazy_next(nd, dentry))
@@ -1938,17 +1956,19 @@  static struct dentry *lookup_fast(struct nameidata *nd)
 		if (status == -ECHILD)
 			/* we'd been told to redo it in non-rcu mode */
 			status = d_revalidate(nd->inode, &nd->last,
-					      dentry, nd->flags);
+					      dentry, nd->flags, parent, &nd->last);
 	} else {
 		dentry = __d_lookup(parent, &nd->last);
 		if (unlikely(!dentry))
 			return NULL;
-		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags);
+		status = d_revalidate(nd->inode, &nd->last, dentry, nd->flags, parent, &nd->last);
 	}
 	if (unlikely(status <= 0)) {
 		if (!status)
 			d_invalidate(dentry);
 		dput(dentry);
+		if (status == -EAGAIN)
+			goto again;
 		return ERR_PTR(status);
 	}
 	return dentry;
@@ -1970,7 +1990,7 @@  static struct dentry *__lookup_slow(const struct qstr *name,
 	if (IS_ERR(dentry))
 		return dentry;
 	if (unlikely(!d_in_lookup(dentry))) {
-		int error = d_revalidate(inode, name, dentry, flags);
+		int error = d_revalidate(inode, name, dentry, flags, dir, name);
 		if (unlikely(error <= 0)) {
 			if (!error) {
 				d_invalidate(dentry);
@@ -1978,6 +1998,8 @@  static struct dentry *__lookup_slow(const struct qstr *name,
 				goto again;
 			}
 			dput(dentry);
+			if (error == -EAGAIN)
+				goto again;
 			dentry = ERR_PTR(error);
 		}
 	} else {
@@ -3777,6 +3799,7 @@  static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		return ERR_PTR(-ENOENT);
 
 	file->f_mode &= ~FMODE_CREATED;
+again:
 	dentry = d_lookup(dir, &nd->last);
 	for (;;) {
 		if (!dentry) {
@@ -3787,9 +3810,13 @@  static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 		if (d_in_lookup(dentry))
 			break;
 
-		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags);
+		error = d_revalidate(dir_inode, &nd->last, dentry, nd->flags, dir, &nd->last);
 		if (likely(error > 0))
 			break;
+		if (error == -EAGAIN) {
+			dput(dentry);
+			goto again;
+		}
 		if (error)
 			goto out_dput;
 		d_invalidate(dentry);