diff mbox

[3/3] VFS: close race between getcwd() and d_move()

Message ID 871sl6eo7e.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Nov. 10, 2017, 4:45 a.m. UTC
On Thu, Nov 09 2017, Linus Torvalds wrote:

> On Thu, Nov 9, 2017 at 2:14 PM, NeilBrown <neilb@suse.com> wrote:
>> On Thu, Nov 09 2017, Linus Torvalds wrote:
>>>
>>> How nasty would it be to just expand the calls to __d_drop/__d_rehash
>>> into __d_move itself, and take both has list locks at the same time
>>> (with the usual ordering and checking if it's the same list, of
>>> course).
>>
>> something like this?
>
> Yes.
>
> This looks nicer to me. Partly because I hate those "pass flags to
> functions that modify their behavior" kinds of patches. I'd rather see
> just straight-line unconditional code with some possible duplication.
...
>
> I also do wonder if we can avoid all the unhash/rehash games entirely
> (and avoid the hash list locking) if it turns out that the dentry and
> target hash lists are the same.

I'm not convinced.  I haven't actually tried it, but the matrix of
possibilities seems a little large.
The source dentry may or may not be hashed (not in the "disconnected
IS_ROOT" case), and the target may or may not want to be rehashed
afterwards (depending on 'exchange').  We could skip the lock
for an exchange if they both had the same hash, but not for a simple
move.

However your description of what it was that you didn't like gave me an
idea - I can take the same approach as my original, but not pass flags
around.
I quite like how this turned out.
Dropping the BUG_ON() in d_rehash() isn't ideal, maybe we could add
___d_rehash() without the BUG_ON() and call that from __d_rehash?

Thanks,
NeilBrown


From: NeilBrown <neilb@suse.com>
Date: Fri, 10 Nov 2017 15:20:06 +1100
Subject: [PATCH] VFS: close race between getcwd() and d_move()

d_move() will call __d_drop() and then __d_rehash()
on the dentry being moved.  This creates a small window
when the dentry appears to be unhashed.  Many tests
of d_unhashed() are made under ->d_lock and so are safe
from racing with this window, but some aren't.
In particular, getcwd() calls d_unlinked() (which calls
d_unhashed()) without d_lock protection, so it can race.

This races has been seen in practice with lustre, which uses d_move() as
part of name lookup.  See:
   https://jira.hpdd.intel.com/browse/LU-9735
It could race with a regular rename(), and result in ENOENT instead
of either the 'before' or 'after' name.

The race can be demonstrated with a simple program which
has two threads, one renaming a directory back and forth
while another calls getcwd() within that directory: it should never
fail, but does.  See:
  https://patchwork.kernel.org/patch/9455345/

We could fix this race by taking d_lock and rechecking when
d_unhashed() reports true.  Alternately when can remove the window,
which is the approach this patch takes.

___d_drop() is introduce which does *not* clear d_hash.pprev
so the dentry still appears to be hashed.  __d_drop() calls
___d_drop(), then clears d_hash.pprev.
__d_move() now uses ___d_drop() and only clears d_hash.pprev
when not rehashing.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/dcache.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Comments

Linus Torvalds Nov. 10, 2017, 7:52 p.m. UTC | #1
On Thu, Nov 9, 2017 at 8:45 PM, NeilBrown <neilb@suse.com> wrote:
>
> However your description of what it was that you didn't like gave me an
> idea - I can take the same approach as my original, but not pass flags
> around.
> I quite like how this turned out.
> Dropping the BUG_ON() in d_rehash() isn't ideal, maybe we could add
> ___d_rehash() without the BUG_ON() and call that from __d_rehash?

This looks more palatable to me, yes.

It still worries me a bit that we end up having that dangling pprev
pointer despite having dropped the hash list lock, but as long as
nobody uses it for anything but that "is it hashed" test without
taking the dentry lock, it all *should* be safe.

Please fix the whitespace, though. This is not how we do function definitions:

    void __d_drop(struct dentry *dentry) {

but otherwise I think this patch is acceptable.

Still want commentary from Al (and preferably going through his vfs
tree for 4.15 or whatever).

             Linus
Al Viro Nov. 10, 2017, 8:53 p.m. UTC | #2
On Fri, Nov 10, 2017 at 03:45:41PM +1100, NeilBrown wrote:
> -void __d_drop(struct dentry *dentry)
> +static void ___d_drop(struct dentry *dentry)
>  {
>  	if (!d_unhashed(dentry)) {
>  		struct hlist_bl_head *b;
> @@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry)
>  
>  		hlist_bl_lock(b);
>  		__hlist_bl_del(&dentry->d_hash);
> -		dentry->d_hash.pprev = NULL;
>  		hlist_bl_unlock(b);
>  		/* After this call, in-progress rcu-walk path lookup will fail. */
>  		write_seqcount_invalidate(&dentry->d_seq);
>  	}
>  }
> +void __d_drop(struct dentry *dentry) {
> +	___d_drop(dentry);
> +	dentry->d_hash.pprev = NULL;

Umm...  That reordering (unhashed vs. ->d_seq) might be a problem
on the RCU side.  I'm not sure it is, we might get away with that,
actually, but I want to finish digging through the pathwalk-related
code.  Cursing it for being too subtle for its own good, as usual...
Al Viro Nov. 21, 2017, 11:50 p.m. UTC | #3
On Fri, Nov 10, 2017 at 08:53:28PM +0000, Al Viro wrote:
> On Fri, Nov 10, 2017 at 03:45:41PM +1100, NeilBrown wrote:
> > -void __d_drop(struct dentry *dentry)
> > +static void ___d_drop(struct dentry *dentry)
> >  {
> >  	if (!d_unhashed(dentry)) {
> >  		struct hlist_bl_head *b;
> > @@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry)
> >  
> >  		hlist_bl_lock(b);
> >  		__hlist_bl_del(&dentry->d_hash);
> > -		dentry->d_hash.pprev = NULL;
> >  		hlist_bl_unlock(b);
> >  		/* After this call, in-progress rcu-walk path lookup will fail. */
> >  		write_seqcount_invalidate(&dentry->d_seq);
> >  	}
> >  }
> > +void __d_drop(struct dentry *dentry) {
> > +	___d_drop(dentry);
> > +	dentry->d_hash.pprev = NULL;
> 
> Umm...  That reordering (unhashed vs. ->d_seq) might be a problem
> on the RCU side.  I'm not sure it is, we might get away with that,
> actually, but I want to finish digging through the pathwalk-related
> code.  Cursing it for being too subtle for its own good, as usual...

OK, I believe that it's survivable, but I'd prefer to keep in -next
for a while and give it more testing.
NeilBrown Nov. 22, 2017, 1:31 a.m. UTC | #4
On Tue, Nov 21 2017, Al Viro wrote:

> On Fri, Nov 10, 2017 at 08:53:28PM +0000, Al Viro wrote:
>> On Fri, Nov 10, 2017 at 03:45:41PM +1100, NeilBrown wrote:
>> > -void __d_drop(struct dentry *dentry)
>> > +static void ___d_drop(struct dentry *dentry)
>> >  {
>> >  	if (!d_unhashed(dentry)) {
>> >  		struct hlist_bl_head *b;
>> > @@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry)
>> >  
>> >  		hlist_bl_lock(b);
>> >  		__hlist_bl_del(&dentry->d_hash);
>> > -		dentry->d_hash.pprev = NULL;
>> >  		hlist_bl_unlock(b);
>> >  		/* After this call, in-progress rcu-walk path lookup will fail. */
>> >  		write_seqcount_invalidate(&dentry->d_seq);
>> >  	}
>> >  }
>> > +void __d_drop(struct dentry *dentry) {
>> > +	___d_drop(dentry);
>> > +	dentry->d_hash.pprev = NULL;
>> 
>> Umm...  That reordering (unhashed vs. ->d_seq) might be a problem
>> on the RCU side.  I'm not sure it is, we might get away with that,
>> actually, but I want to finish digging through the pathwalk-related
>> code.  Cursing it for being too subtle for its own good, as usual...
>
> OK, I believe that it's survivable, but I'd prefer to keep in -next
> for a while and give it more testing.

Great, thanks.  I assume you will fix the silly '{' at the end of the
line when defining __d_drop().  Let me know if you would rather I
resend.

Thanks,
NeilBrown
diff mbox

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..8c83543f5065 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -468,9 +468,11 @@  static void dentry_lru_add(struct dentry *dentry)
  * d_drop() is used mainly for stuff that wants to invalidate a dentry for some
  * reason (NFS timeouts or autofs deletes).
  *
- * __d_drop requires dentry->d_lock.
+ * __d_drop requires dentry->d_lock
+ * ___d_drop doesn't mark dentry as "unhashed"
+ *   (dentry->d_hash.pprev will be LIST_POISON2, not NULL).
  */
-void __d_drop(struct dentry *dentry)
+static void ___d_drop(struct dentry *dentry)
 {
 	if (!d_unhashed(dentry)) {
 		struct hlist_bl_head *b;
@@ -486,12 +488,15 @@  void __d_drop(struct dentry *dentry)
 
 		hlist_bl_lock(b);
 		__hlist_bl_del(&dentry->d_hash);
-		dentry->d_hash.pprev = NULL;
 		hlist_bl_unlock(b);
 		/* After this call, in-progress rcu-walk path lookup will fail. */
 		write_seqcount_invalidate(&dentry->d_seq);
 	}
 }
+void __d_drop(struct dentry *dentry) {
+	___d_drop(dentry);
+	dentry->d_hash.pprev = NULL;
+}
 EXPORT_SYMBOL(__d_drop);
 
 void d_drop(struct dentry *dentry)
@@ -2381,7 +2386,7 @@  EXPORT_SYMBOL(d_delete);
 static void __d_rehash(struct dentry *entry)
 {
 	struct hlist_bl_head *b = d_hash(entry->d_name.hash);
-	BUG_ON(!d_unhashed(entry));
+
 	hlist_bl_lock(b);
 	hlist_bl_add_head_rcu(&entry->d_hash, b);
 	hlist_bl_unlock(b);
@@ -2818,9 +2823,9 @@  static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
 	/* unhash both */
-	/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
-	__d_drop(dentry);
-	__d_drop(target);
+	/* ___d_drop does write_seqcount_barrier, but they're OK to nest. */
+	___d_drop(dentry);
+	___d_drop(target);
 
 	/* Switch the names.. */
 	if (exchange)
@@ -2832,6 +2837,8 @@  static void __d_move(struct dentry *dentry, struct dentry *target,
 	__d_rehash(dentry);
 	if (exchange)
 		__d_rehash(target);
+	else
+		target->d_hash.pprev = NULL;
 
 	/* ... and switch them in the tree */
 	if (IS_ROOT(dentry)) {