fs: NULL deref in atime_needs_update

On Thu, Feb 25, 2016 at 09:29:21AM +0100, Dmitry Vyukov wrote:

> Humm... I've left it running over night but no GPFs happened...
> Usually they happened within two hours or so. I would think that your
> patch fixes it and I did not actually apply it last time (or did not
> rebuild kernel). But I saw the new warnings that the patch adds, so I
> should have been rebuilt it...
> 
> What I saw is a dozen of pairs of warnings like the one below.
> Is it possible the warning printing introduces enough delay to close
> the inconsistency window?....

All that stops these warnings from triggering atime_... oopsen is that dentry
involved isn't a symlink one.  lookup_fast() should *not* return 0 and set
*inode to NULL.  Ever.  Trigger the same in walk_component() and you'll
get NULL nd->inode, which will oops as soon as you get to may_lookup() for
the next component (or atime one if dentry turns out to be a symlink and
not a directory).

IOW, what happened is that you've got dozens of instances of the underlying
bug triggered, all on non-symlink dentries in the last components of pathname.

The codepath in question is this:
                *inode = d_backing_inode(dentry);
                negative = d_is_negative(dentry);
                if (read_seqcount_retry(&dentry->d_seq, seq))
                        return -ECHILD;
at that point we'd better have negative and *inode refering to the state of
dentry at the same moment - seq had been fetched before both the inode and
dentry flags and has remained unchanged until the later point, i.e. through
all the interval containing both fetches.
		....
                if (negative)
                        return -ENOENT;
no *inode changes in between, so it ought to be non-NULL
                path->mnt = mnt;
                path->dentry = dentry;
                if (likely(__follow_mount_rcu(nd, path, inode, seqp))) {
                        WARN_ON(!*inode);
                        return 0;
and you are triggering that WARN_ON.  So either __follow_mount_rcu() has
returned true and zeroed *inode, or we have something very wrong with
->d_seq.

__follow_mount_rcu() reassigns *inode only in one place:
                mounted = __lookup_mnt(path->mnt, path->dentry);
                if (!mounted)
                        break;
                path->mnt = &mounted->mnt;
                path->dentry = mounted->mnt.mnt_root;
                nd->flags |= LOOKUP_JUMPED;
                *seqp = read_seqcount_begin(&path->dentry->d_seq);
                /*
                 * Update the inode too. We don't need to re-check the
                 * dentry sequence number here after this d_inode read,
                 * because a mount-point is always pinned.
                 */
                *inode = path->dentry->d_inode;
Note that it had returned true, so we have read_seqretry(&mount_lock, nd->m_seq)
yielding false, i.e. mount_lock hadn't been touched through all of that.
Having ->mnt_root->d_inode go NULL on a live vfsmount is a Bad Thing(tm), for
obvious reasons. ->mnt_root should've remained pinned until cleanup_mnt(),
which means that it must've already gotten through mntput_no_expire()
lock_mount_hash/unlock_mount_hash *before* we'd picked nd->m_seq; otherwise
we'd see mount_lock mismatch.  Now, all removals from vfsmount hash should
happen under mount_lock and prior to cleanup_mnt() on victim.  So we should've
had this:
Somebody:
	hlist_del_init(&mounted->mnt_hash);
	...
	write_sequnlock(&mount_lock);
Us:
	rcu_read_lock();
	nd->m_seq = read_seqbegin(&mount_lock);
	...
	hlist_for_each_entry_rcu(p, head, mnt_hash)
		... run into 'mounted'
	find mount_lock untouched.

AFAICS, write_sequnlock/read_seqbegin barriers should've sufficed to
prevent that...

Hrm...  OK, seeing that you still seem to trigger those within an hour or
two (and *any* of remaining WARN_ON() are serious bugs - none of the
"mitigation had been triggered" remained, sorry for not making it clear),
let's try this.  Again, any WARN_ON triggered means that we'd caught something,
whether it progresses into oops or not.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

fs: NULL deref in atime_needs_update

Commit Message

Comments

Patch