[BUG] lock_parent() breakage when used from shrink_dentry_list() (was Re: [PATCH v2 6/6] fs/dcache: Avoid remaining try_lock loop in shrink_dentry_list())

On Fri, Feb 23, 2018 at 05:42:16PM +0000, Al Viro wrote:

> 	4) the nasty one - shrink_dentry_list() evictions of zero-count dentries.
> _That_ calls for careful use of RCU, etc. - none of the others need that.  Need
> to think how to deal with that sucker; in any case, I do not believe that sharing
> said RCU use, etc. with any other cases would do anything other than obfuscating
> the rest.

Arrrgh...  Actually, there's a nasty corner case where the variant in mainline is
broken.  Look:
	dentry placed on a shrink list
	we pick the fucker from the list and lock it.
	we call lock_parent() on it.
		dentry is not a root and it's not deleted, so we proceed.
		trylock fails.
		we grab rcu_read_lock()
		we drop dentry->d_lock
	on another CPU, something does e.g. d_prune_aliases() (or finds the
	sucker in hash and proceeds to unhash and dput(), etc.) - anything
	that evicts that dentry.  It is marked with DCACHE_MAY_FREE and left
	alone.  The parent, OTOH, is dropped and freeing gets scheduled as
	soon as RCU allows.
		we grab parent->d_lock
		we verify that dentry->d_parent is still the same (it is)
		we do rcu_read_unlock()
		we grab dentry->d_lock
		we return parent
At that point we are fucked - there's nothing to prevent parent from being
freed at any point.  And we assume that its ->d_lock is held and needs to
be dropped.

The call site in d_prune_aliases() avoids the same scenario, since there we
are already holding ->i_lock and another thread won't get to __dentry_kill()
until we are done with lock_parent().

Unless I'm missing something, that's a (narrow) memory corruptor.  The window is
narrow, but not impossibly so - if that other thread had been spinning on attempt
to grab dentry->d_lock in d_prune_alias(), it has to squeeze through
                if (!dentry->d_lockref.count) {
and then in lock_parent() called there - through
        if (IS_ROOT(dentry))
                return NULL;
        if (unlikely(dentry->d_lockref.count < 0))
                return NULL;
        if (likely(spin_trylock(&parent->d_lock)))
before the first CPU gets through
        parent = READ_ONCE(dentry->d_parent);
        spin_lock(&parent->d_lock);

The first CPU can't be preempted, but there's nothing to prevent an IRQ arriving
at that point, letting the second one win the race.

Comments?

I think the (untested) patch below is -stable fodder:

lock_parent() needs to recheck if dentry got __dentry_kill'ed under it

In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent.  We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point.  Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

[BUG] lock_parent() breakage when used from shrink_dentry_list() (was Re: [PATCH v2 6/6] fs/dcache: Avoid remaining try_lock loop in shrink_dentry_list())

Commit Message

Comments

Patch