diff mbox

fs: NULL deref in atime_needs_update

Message ID 20160224044628.GQ17997@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Al Viro Feb. 24, 2016, 4:46 a.m. UTC
On Wed, Feb 24, 2016 at 11:12:13AM +0800, Ian Kent wrote:
> On Wed, 2016-02-17 at 00:40 +0100, Mickaël Salaün wrote:
> > Hi,
> > 
> > Actually I found the same bug (without fuzzing) and I can reproduce it
> > in a deterministic way (e.g. by creating a LSM that return 1 for the
> > security_file_open hook). At least, from v4.2.8 I can easily trigger
> > traces like this :
> 
> Reading through this thread I wonder if this is a new problem.
> 
> Is there a previous kernel it can't be reproduced on?
> Perhaps a bisect will shed some light on what's happening.

There are several things in the mix.  What Mickaël has found is that a bunch
of places where _positive_ number returned instead of expected 0 or -E... can
get propagated all way back to caller of do_last(), confusing the hell out
of it.  That's not what Dmitry has triggered, though.  WARN_ON() in the
end of do_last() would've triggered, and IMO this one, along with mitigation
(map that "error value" to -EINVAL) should go into mainline and all -stable.
Bogus ->open() returning a positive number had always been bad news; in the
best case it would be returned to userland, leading to "open(2) has failed
and returned a positive number".  Hell knows if we ever had such instances
(or have them right now), but I wouldn't bet on their absense.  Rare
failure exits returning bogus values in an ->open() instance in some driver
can easily stay unnoticed for a long time.  Starting from at least 3.6
(circa the atomic_open support) it got more unpleasant than simple "confuse
the hell out of userland".  ->open() isn't the only vector for injection of
such crap - ->permission() would also serve, same for several LSM turds, etc.

Again, that's a separate problem.  What Dmitry seems to be catching is getting
crap values fed to should_follow_link() as inode.  I see one bug in that
area that does need fixing (fix present in the last patch I've posted, with
WARN_ON() to indicate that this thing has triggered; _that_ WARN_ON() should
be gone from the final variant, since this can trigger without driver bugs,
etc.)  In RCU mode after we'd checked that dentry is a symlink one, we need
to verify that it hadn't been changed since before we'd fetched the inode.
It might have been e.g. a regular file, which got unlinked with symlink
created in its place.  Then we'd go into get_link() with non-symlink inode
and oops on attempt to call its ->i_op->get_link().  That race is real, very
hard to hit (you need both the unlink(2) and symlink(2) to happen between
lookup_fast() and should_follow_link() and unless you have PREEMPT_RCU you
can't lose the timeslice there) and would've manifested differently.

But that leaves other two kinds of bugs getting triggered: inode of some
non-symlink is possible, but what we saw included NULL inode when we'd
reached finish_open: in do_last().  Should be flat-out impossible - we either
get lookup_fast(..., &inode, ...) return 0 and store NULL in inode, or
get NULL inode from pinned d_is_symlink() dentry after having grabbed
a reference and left RCU mode.  Neither should be possible without either
something weird happening to lookup_fast() (and we would've seen oopsen in
link_path_walk() if that could happen, BTW) or screwed dentry refcounting
somewhere, combined with a race that managed to turn...

Oh, shit.  No screwed refcounting is needed.  Look:
        BUG_ON(nd->flags & LOOKUP_RCU);
        inode = d_backing_inode(path.dentry);
        seq = 0;        /* out of RCU mode, so the value doesn't matter */
        if (unlikely(d_is_negative(path.dentry))) {
                path_to_nameidata(&path, nd);
                return -ENOENT;
        }
Suppose we come here with negative path.dentry.  We are holding a reference,
all right, and for a _postive_ dentry that would've been enough to keep
it positive.  Not so for a negative one, though - symlink(2) on another
CPU doint d_instantiate() just before the d_is_negative() check and we
are fucked - inode is stale and we sail through all the checks, all the
way into should_follow_link().

We also have the same kind of crap in walk_component() -
                err = lookup_slow(nd, &path);
                if (err < 0)
                        return err;
                inode = d_backing_inode(path.dentry);
                seq = 0;        /* we are already out of RCU mode */
                err = -ENOENT;
                if (d_is_negative(path.dentry))  
                        goto out_path_put;
There it's much harder to hit, though - we need it not just d_instantiate()
overlapping those lines; we need the racing syscall to get from locking
the parent to d_instantiate() between the point where lookup_slow() has
unlocked the parent and d_is_negative().  And lookup_slow() couldn't have
gone into mountpoint crossing, so it's really hard to hit - you pretty
much have to get preempted just after fetching inode.

OK, the next delta to try, and there definitely are several commits in
that pile.  It still does not explain the traces with GPF at 0x50, though -
for that we need not just a NULL getting to should_follow_link() but
something non-NULL with NULL at offset 40 from it (offset of ->i_sb in
struct inode).  That something *can't* be a valid struct inode or had been
one in recent past - ->i_sb is assigned in new_inode(), value is always
non-NULL and never modified all the way until RCU-delayed freeing of struct
inode.  So that has to be something entirely different...  Anyway, the
patch so far follows:

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dmitry Vyukov Feb. 24, 2016, 10:03 a.m. UTC | #1
On Wed, Feb 24, 2016 at 5:46 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Feb 24, 2016 at 11:12:13AM +0800, Ian Kent wrote:
>> On Wed, 2016-02-17 at 00:40 +0100, Mickaël Salaün wrote:
>> > Hi,
>> >
>> > Actually I found the same bug (without fuzzing) and I can reproduce it
>> > in a deterministic way (e.g. by creating a LSM that return 1 for the
>> > security_file_open hook). At least, from v4.2.8 I can easily trigger
>> > traces like this :
>>
>> Reading through this thread I wonder if this is a new problem.
>>
>> Is there a previous kernel it can't be reproduced on?
>> Perhaps a bisect will shed some light on what's happening.
>
> There are several things in the mix.  What Mickaël has found is that a bunch
> of places where _positive_ number returned instead of expected 0 or -E... can
> get propagated all way back to caller of do_last(), confusing the hell out
> of it.  That's not what Dmitry has triggered, though.  WARN_ON() in the
> end of do_last() would've triggered, and IMO this one, along with mitigation
> (map that "error value" to -EINVAL) should go into mainline and all -stable.
> Bogus ->open() returning a positive number had always been bad news; in the
> best case it would be returned to userland, leading to "open(2) has failed
> and returned a positive number".  Hell knows if we ever had such instances
> (or have them right now), but I wouldn't bet on their absense.  Rare
> failure exits returning bogus values in an ->open() instance in some driver
> can easily stay unnoticed for a long time.  Starting from at least 3.6
> (circa the atomic_open support) it got more unpleasant than simple "confuse
> the hell out of userland".  ->open() isn't the only vector for injection of
> such crap - ->permission() would also serve, same for several LSM turds, etc.
>
> Again, that's a separate problem.  What Dmitry seems to be catching is getting
> crap values fed to should_follow_link() as inode.  I see one bug in that
> area that does need fixing (fix present in the last patch I've posted, with
> WARN_ON() to indicate that this thing has triggered; _that_ WARN_ON() should
> be gone from the final variant, since this can trigger without driver bugs,
> etc.)  In RCU mode after we'd checked that dentry is a symlink one, we need
> to verify that it hadn't been changed since before we'd fetched the inode.
> It might have been e.g. a regular file, which got unlinked with symlink
> created in its place.  Then we'd go into get_link() with non-symlink inode
> and oops on attempt to call its ->i_op->get_link().  That race is real, very
> hard to hit (you need both the unlink(2) and symlink(2) to happen between
> lookup_fast() and should_follow_link() and unless you have PREEMPT_RCU you
> can't lose the timeslice there) and would've manifested differently.
>
> But that leaves other two kinds of bugs getting triggered: inode of some
> non-symlink is possible, but what we saw included NULL inode when we'd
> reached finish_open: in do_last().  Should be flat-out impossible - we either
> get lookup_fast(..., &inode, ...) return 0 and store NULL in inode, or
> get NULL inode from pinned d_is_symlink() dentry after having grabbed
> a reference and left RCU mode.  Neither should be possible without either
> something weird happening to lookup_fast() (and we would've seen oopsen in
> link_path_walk() if that could happen, BTW) or screwed dentry refcounting
> somewhere, combined with a race that managed to turn...
>
> Oh, shit.  No screwed refcounting is needed.  Look:
>         BUG_ON(nd->flags & LOOKUP_RCU);
>         inode = d_backing_inode(path.dentry);
>         seq = 0;        /* out of RCU mode, so the value doesn't matter */
>         if (unlikely(d_is_negative(path.dentry))) {
>                 path_to_nameidata(&path, nd);
>                 return -ENOENT;
>         }
> Suppose we come here with negative path.dentry.  We are holding a reference,
> all right, and for a _postive_ dentry that would've been enough to keep
> it positive.  Not so for a negative one, though - symlink(2) on another
> CPU doint d_instantiate() just before the d_is_negative() check and we
> are fucked - inode is stale and we sail through all the checks, all the
> way into should_follow_link().
>
> We also have the same kind of crap in walk_component() -
>                 err = lookup_slow(nd, &path);
>                 if (err < 0)
>                         return err;
>                 inode = d_backing_inode(path.dentry);
>                 seq = 0;        /* we are already out of RCU mode */
>                 err = -ENOENT;
>                 if (d_is_negative(path.dentry))
>                         goto out_path_put;
> There it's much harder to hit, though - we need it not just d_instantiate()
> overlapping those lines; we need the racing syscall to get from locking
> the parent to d_instantiate() between the point where lookup_slow() has
> unlocked the parent and d_is_negative().  And lookup_slow() couldn't have
> gone into mountpoint crossing, so it's really hard to hit - you pretty
> much have to get preempted just after fetching inode.
>
> OK, the next delta to try, and there definitely are several commits in
> that pile.  It still does not explain the traces with GPF at 0x50, though -
> for that we need not just a NULL getting to should_follow_link() but
> something non-NULL with NULL at offset 40 from it (offset of ->i_sb in
> struct inode).  That something *can't* be a valid struct inode or had been
> one in recent past - ->i_sb is assigned in new_inode(), value is always
> non-NULL and never modified all the way until RCU-delayed freeing of struct
> inode.  So that has to be something entirely different...  Anyway, the
> patch so far follows:


Restarted testing with this patch (dropped previous patches because
they conflict).

These "unlikely" scenarios can be more likely inside of VMs where
effective preemption can happen at random points. Also NMIs probably
can increase probability of such races.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Vyukov Feb. 24, 2016, 10:15 a.m. UTC | #2
On Wed, Feb 24, 2016 at 11:03 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Feb 24, 2016 at 5:46 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Wed, Feb 24, 2016 at 11:12:13AM +0800, Ian Kent wrote:
>>> On Wed, 2016-02-17 at 00:40 +0100, Mickaël Salaün wrote:
>>> > Hi,
>>> >
>>> > Actually I found the same bug (without fuzzing) and I can reproduce it
>>> > in a deterministic way (e.g. by creating a LSM that return 1 for the
>>> > security_file_open hook). At least, from v4.2.8 I can easily trigger
>>> > traces like this :
>>>
>>> Reading through this thread I wonder if this is a new problem.
>>>
>>> Is there a previous kernel it can't be reproduced on?
>>> Perhaps a bisect will shed some light on what's happening.
>>
>> There are several things in the mix.  What Mickaël has found is that a bunch
>> of places where _positive_ number returned instead of expected 0 or -E... can
>> get propagated all way back to caller of do_last(), confusing the hell out
>> of it.  That's not what Dmitry has triggered, though.  WARN_ON() in the
>> end of do_last() would've triggered, and IMO this one, along with mitigation
>> (map that "error value" to -EINVAL) should go into mainline and all -stable.
>> Bogus ->open() returning a positive number had always been bad news; in the
>> best case it would be returned to userland, leading to "open(2) has failed
>> and returned a positive number".  Hell knows if we ever had such instances
>> (or have them right now), but I wouldn't bet on their absense.  Rare
>> failure exits returning bogus values in an ->open() instance in some driver
>> can easily stay unnoticed for a long time.  Starting from at least 3.6
>> (circa the atomic_open support) it got more unpleasant than simple "confuse
>> the hell out of userland".  ->open() isn't the only vector for injection of
>> such crap - ->permission() would also serve, same for several LSM turds, etc.
>>
>> Again, that's a separate problem.  What Dmitry seems to be catching is getting
>> crap values fed to should_follow_link() as inode.  I see one bug in that
>> area that does need fixing (fix present in the last patch I've posted, with
>> WARN_ON() to indicate that this thing has triggered; _that_ WARN_ON() should
>> be gone from the final variant, since this can trigger without driver bugs,
>> etc.)  In RCU mode after we'd checked that dentry is a symlink one, we need
>> to verify that it hadn't been changed since before we'd fetched the inode.
>> It might have been e.g. a regular file, which got unlinked with symlink
>> created in its place.  Then we'd go into get_link() with non-symlink inode
>> and oops on attempt to call its ->i_op->get_link().  That race is real, very
>> hard to hit (you need both the unlink(2) and symlink(2) to happen between
>> lookup_fast() and should_follow_link() and unless you have PREEMPT_RCU you
>> can't lose the timeslice there) and would've manifested differently.
>>
>> But that leaves other two kinds of bugs getting triggered: inode of some
>> non-symlink is possible, but what we saw included NULL inode when we'd
>> reached finish_open: in do_last().  Should be flat-out impossible - we either
>> get lookup_fast(..., &inode, ...) return 0 and store NULL in inode, or
>> get NULL inode from pinned d_is_symlink() dentry after having grabbed
>> a reference and left RCU mode.  Neither should be possible without either
>> something weird happening to lookup_fast() (and we would've seen oopsen in
>> link_path_walk() if that could happen, BTW) or screwed dentry refcounting
>> somewhere, combined with a race that managed to turn...
>>
>> Oh, shit.  No screwed refcounting is needed.  Look:
>>         BUG_ON(nd->flags & LOOKUP_RCU);
>>         inode = d_backing_inode(path.dentry);
>>         seq = 0;        /* out of RCU mode, so the value doesn't matter */
>>         if (unlikely(d_is_negative(path.dentry))) {
>>                 path_to_nameidata(&path, nd);
>>                 return -ENOENT;
>>         }
>> Suppose we come here with negative path.dentry.  We are holding a reference,
>> all right, and for a _postive_ dentry that would've been enough to keep
>> it positive.  Not so for a negative one, though - symlink(2) on another
>> CPU doint d_instantiate() just before the d_is_negative() check and we
>> are fucked - inode is stale and we sail through all the checks, all the
>> way into should_follow_link().
>>
>> We also have the same kind of crap in walk_component() -
>>                 err = lookup_slow(nd, &path);
>>                 if (err < 0)
>>                         return err;
>>                 inode = d_backing_inode(path.dentry);
>>                 seq = 0;        /* we are already out of RCU mode */
>>                 err = -ENOENT;
>>                 if (d_is_negative(path.dentry))
>>                         goto out_path_put;
>> There it's much harder to hit, though - we need it not just d_instantiate()
>> overlapping those lines; we need the racing syscall to get from locking
>> the parent to d_instantiate() between the point where lookup_slow() has
>> unlocked the parent and d_is_negative().  And lookup_slow() couldn't have
>> gone into mountpoint crossing, so it's really hard to hit - you pretty
>> much have to get preempted just after fetching inode.
>>
>> OK, the next delta to try, and there definitely are several commits in
>> that pile.  It still does not explain the traces with GPF at 0x50, though -
>> for that we need not just a NULL getting to should_follow_link() but
>> something non-NULL with NULL at offset 40 from it (offset of ->i_sb in
>> struct inode).  That something *can't* be a valid struct inode or had been
>> one in recent past - ->i_sb is assigned in new_inode(), value is always
>> non-NULL and never modified all the way until RCU-delayed freeing of struct
>> inode.  So that has to be something entirely different...  Anyway, the
>> patch so far follows:
>
>
> Restarted testing with this patch (dropped previous patches because
> they conflict).
>
> These "unlikely" scenarios can be more likely inside of VMs where
> effective preemption can happen at random points. Also NMIs probably
> can increase probability of such races.


For now I can only say that I am hitting this one (3 times in 20 mins):

                if (read_seqcount_retry(&link->dentry->d_seq, seq)) {
                        WARN_ON(1);     // just as way to report hitting
                                        // that path; it's OK to get
                                        // here, in the final variant
                                        // WARN_ON will disappear.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Vyukov Feb. 24, 2016, 1:35 p.m. UTC | #3
On Wed, Feb 24, 2016 at 11:15 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Feb 24, 2016 at 11:03 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Feb 24, 2016 at 5:46 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> On Wed, Feb 24, 2016 at 11:12:13AM +0800, Ian Kent wrote:
>>>> On Wed, 2016-02-17 at 00:40 +0100, Mickaël Salaün wrote:
>>>> > Hi,
>>>> >
>>>> > Actually I found the same bug (without fuzzing) and I can reproduce it
>>>> > in a deterministic way (e.g. by creating a LSM that return 1 for the
>>>> > security_file_open hook). At least, from v4.2.8 I can easily trigger
>>>> > traces like this :
>>>>
>>>> Reading through this thread I wonder if this is a new problem.
>>>>
>>>> Is there a previous kernel it can't be reproduced on?
>>>> Perhaps a bisect will shed some light on what's happening.
>>>
>>> There are several things in the mix.  What Mickaël has found is that a bunch
>>> of places where _positive_ number returned instead of expected 0 or -E... can
>>> get propagated all way back to caller of do_last(), confusing the hell out
>>> of it.  That's not what Dmitry has triggered, though.  WARN_ON() in the
>>> end of do_last() would've triggered, and IMO this one, along with mitigation
>>> (map that "error value" to -EINVAL) should go into mainline and all -stable.
>>> Bogus ->open() returning a positive number had always been bad news; in the
>>> best case it would be returned to userland, leading to "open(2) has failed
>>> and returned a positive number".  Hell knows if we ever had such instances
>>> (or have them right now), but I wouldn't bet on their absense.  Rare
>>> failure exits returning bogus values in an ->open() instance in some driver
>>> can easily stay unnoticed for a long time.  Starting from at least 3.6
>>> (circa the atomic_open support) it got more unpleasant than simple "confuse
>>> the hell out of userland".  ->open() isn't the only vector for injection of
>>> such crap - ->permission() would also serve, same for several LSM turds, etc.
>>>
>>> Again, that's a separate problem.  What Dmitry seems to be catching is getting
>>> crap values fed to should_follow_link() as inode.  I see one bug in that
>>> area that does need fixing (fix present in the last patch I've posted, with
>>> WARN_ON() to indicate that this thing has triggered; _that_ WARN_ON() should
>>> be gone from the final variant, since this can trigger without driver bugs,
>>> etc.)  In RCU mode after we'd checked that dentry is a symlink one, we need
>>> to verify that it hadn't been changed since before we'd fetched the inode.
>>> It might have been e.g. a regular file, which got unlinked with symlink
>>> created in its place.  Then we'd go into get_link() with non-symlink inode
>>> and oops on attempt to call its ->i_op->get_link().  That race is real, very
>>> hard to hit (you need both the unlink(2) and symlink(2) to happen between
>>> lookup_fast() and should_follow_link() and unless you have PREEMPT_RCU you
>>> can't lose the timeslice there) and would've manifested differently.
>>>
>>> But that leaves other two kinds of bugs getting triggered: inode of some
>>> non-symlink is possible, but what we saw included NULL inode when we'd
>>> reached finish_open: in do_last().  Should be flat-out impossible - we either
>>> get lookup_fast(..., &inode, ...) return 0 and store NULL in inode, or
>>> get NULL inode from pinned d_is_symlink() dentry after having grabbed
>>> a reference and left RCU mode.  Neither should be possible without either
>>> something weird happening to lookup_fast() (and we would've seen oopsen in
>>> link_path_walk() if that could happen, BTW) or screwed dentry refcounting
>>> somewhere, combined with a race that managed to turn...
>>>
>>> Oh, shit.  No screwed refcounting is needed.  Look:
>>>         BUG_ON(nd->flags & LOOKUP_RCU);
>>>         inode = d_backing_inode(path.dentry);
>>>         seq = 0;        /* out of RCU mode, so the value doesn't matter */
>>>         if (unlikely(d_is_negative(path.dentry))) {
>>>                 path_to_nameidata(&path, nd);
>>>                 return -ENOENT;
>>>         }
>>> Suppose we come here with negative path.dentry.  We are holding a reference,
>>> all right, and for a _postive_ dentry that would've been enough to keep
>>> it positive.  Not so for a negative one, though - symlink(2) on another
>>> CPU doint d_instantiate() just before the d_is_negative() check and we
>>> are fucked - inode is stale and we sail through all the checks, all the
>>> way into should_follow_link().
>>>
>>> We also have the same kind of crap in walk_component() -
>>>                 err = lookup_slow(nd, &path);
>>>                 if (err < 0)
>>>                         return err;
>>>                 inode = d_backing_inode(path.dentry);
>>>                 seq = 0;        /* we are already out of RCU mode */
>>>                 err = -ENOENT;
>>>                 if (d_is_negative(path.dentry))
>>>                         goto out_path_put;
>>> There it's much harder to hit, though - we need it not just d_instantiate()
>>> overlapping those lines; we need the racing syscall to get from locking
>>> the parent to d_instantiate() between the point where lookup_slow() has
>>> unlocked the parent and d_is_negative().  And lookup_slow() couldn't have
>>> gone into mountpoint crossing, so it's really hard to hit - you pretty
>>> much have to get preempted just after fetching inode.
>>>
>>> OK, the next delta to try, and there definitely are several commits in
>>> that pile.  It still does not explain the traces with GPF at 0x50, though -
>>> for that we need not just a NULL getting to should_follow_link() but
>>> something non-NULL with NULL at offset 40 from it (offset of ->i_sb in
>>> struct inode).  That something *can't* be a valid struct inode or had been
>>> one in recent past - ->i_sb is assigned in new_inode(), value is always
>>> non-NULL and never modified all the way until RCU-delayed freeing of struct
>>> inode.  So that has to be something entirely different...  Anyway, the
>>> patch so far follows:



Fired after some time:

[ 3491.147607] ------------[ cut here ]------------
[ 3491.147986] WARNING: CPU: 1 PID: 17000 at fs/namei.c:1728
path_openat+0x14e5/0x1560()
[ 3491.148576] Modules linked in:
[ 3491.148818] CPU: 1 PID: 17000 Comm: syz-executor Tainted: G
W       4.5.0-rc5+ #72
[ 3491.149530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 3491.149530]  0000000000000000 ffff88002d6ebcb8 ffffffff8194ad79
0000000000000000
[ 3491.152103]  ffffffff8334501c ffff88002d6ebcf0 ffffffff81172291
0000000000000000
[ 3491.152103]  ffff88002d6ebd98 ffff88002d6ebde0 0000000000048000
ffff88002d6ebefc
[ 3491.155310] Call Trace:
[ 3491.155310]  [<ffffffff8194ad79>] dump_stack+0x99/0xd0
[ 3491.155310]  [<ffffffff81172291>] warn_slowpath_common+0x81/0xc0
[ 3491.157483]  [<ffffffff81172385>] warn_slowpath_null+0x15/0x20
[ 3491.157483]  [<ffffffff81318495>] path_openat+0x14e5/0x1560
[ 3491.157483]  [<ffffffff81319549>] do_filp_open+0x79/0xd0
[ 3491.157483]  [<ffffffff82b11d92>] ? _raw_spin_unlock+0x22/0x30
[ 3491.157483]  [<ffffffff81328f08>] ? __alloc_fd+0xf8/0x200
[ 3491.157483]  [<ffffffff81306eb0>] do_sys_open+0x110/0x1f0
[ 3491.157483]  [<ffffffff81306fbf>] SyS_openat+0xf/0x20
[ 3491.157483]  [<ffffffff82b12836>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 3491.161521] ---[ end trace c65af9be7536efae ]---
[ 3491.161838] BUG: unable to handle kernel NULL pointer dereference
at 000000000000000c
[ 3491.162339] IP: [<ffffffff81327429>] atime_needs_update+0x9/0xc0
[ 3491.162577] PGD 2d626067 PUD 2c4cc067 PMD 0
[ 3491.163036] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 3491.163039] Modules linked in:
[ 3491.163039] CPU: 1 PID: 17000 Comm: syz-executor Tainted: G
W       4.5.0-rc5+ #72
[ 3491.163039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 3491.164386] task: ffff88002c9196c0 ti: ffff88002d6e8000 task.ti:
ffff88002d6e8000
[ 3491.164386] RIP: 0010:[<ffffffff81327429>]  [<ffffffff81327429>]
atime_needs_update+0x9/0xc0
[ 3491.164386] RSP: 0018:ffff88002d6ebcb0  EFLAGS: 00010282
[ 3491.165591] RAX: 0000000000000030 RBX: ffff88002d6ebde0 RCX: 0000000000000002
[ 3491.165591] RDX: ffff88002d6ebe38 RSI: 0000000000000000 RDI: ffff88002d6ebe38
[ 3491.165591] RBP: ffff88002d6ebcc0 R08: ffff88002d6ebe38 R09: 0000000000000001
[ 3491.167202] R10: 0000000000000001 R11: 0000000000001816 R12: 0000000000000000
[ 3491.167368] R13: ffff88002d6ebe38 R14: ffff880072508980 R15: ffff88002d6ebefc
[ 3491.167368] FS:  00007f02454e5700(0000) GS:ffff88003ed00000(0000)
knlGS:0000000000000000
[ 3491.167368] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3491.167368] CR2: 000000000000000c CR3: 000000002cb5f000 CR4: 00000000000006e0
[ 3491.167368] Stack:
[ 3491.169740]  0000000000000000 ffff88002d6ebde0 ffff88002d6ebd00
ffffffff81314922
[ 3491.169740]  0000000000000001 0000000000000000 ffff88002d6ebd98
ffff88002d6ebde0
[ 3491.169740]  0000000000048000 ffff88002d6ebefc ffff88002d6ebdd0
ffffffff8131729d
[ 3491.169740] Call Trace:
[ 3491.169740]  [<ffffffff81314922>] trailing_symlink+0x62/0x260
[ 3491.169740]  [<ffffffff8131729d>] path_openat+0x2ed/0x1560
[ 3491.169740]  [<ffffffff81319549>] do_filp_open+0x79/0xd0
[ 3491.169740]  [<ffffffff82b11d92>] ? _raw_spin_unlock+0x22/0x30
[ 3491.169740]  [<ffffffff81328f08>] ? __alloc_fd+0xf8/0x200
[ 3491.169740]  [<ffffffff81306eb0>] do_sys_open+0x110/0x1f0
[ 3491.169740]  [<ffffffff81306fbf>] SyS_openat+0xf/0x20
[ 3491.169740]  [<ffffffff82b12836>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 3491.169740] Code: ff ff ff 48 85 c0 48 89 c3 74 08 48 89 c7 e8 ef
dc ff ff 48 89 d8 5b 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
48 83 ec 08 <f6> 46 0c 02 48 8b 1f 75 6b 48 8b 7e 28 48 8b 47 50 a9 01
04 00
[ 3491.169740] RIP  [<ffffffff81327429>] atime_needs_update+0x9/0xc0
[ 3491.169740]  RSP <ffff88002d6ebcb0>
[ 3491.169740] CR2: 000000000000000c
[ 3491.195051] ---[ end trace c65af9be7536efaf ]---
[ 3491.195051] BUG: sleeping function called from invalid context at
include/linux/sched.h:2795
[ 3491.195051] in_atomic(): 1, irqs_disabled(): 1, pid: 17000, name:
syz-executor
[ 3491.195051] INFO: lockdep is turned off.
[ 3491.195051] irq event stamp: 622
[ 3491.195051] hardirqs last  enabled at (621): [<ffffffff811d9926>]
vprintk_emit+0x2d6/0x5f0
[ 3491.195051] hardirqs last disabled at (622): [<ffffffff82b14d49>]
error_entry+0x69/0xc0
[ 3491.195051] softirqs last  enabled at (618): [<ffffffff81178172>]
__do_softirq+0x222/0x4a0
[ 3491.195051] softirqs last disabled at (595): [<ffffffff81178767>]
irq_exit+0xa7/0xc0
[ 3491.195051] CPU: 1 PID: 17000 Comm: syz-executor Tainted: G      D
W       4.5.0-rc5+ #72
[ 3491.195051] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 3491.195051]  0000000000000000 ffff88002d6eb9a8 ffffffff8194ad79
ffff88002c9196c0
[ 3491.195051]  0000000000004268 ffff88002d6eb9d0 ffffffff811a0659
ffffffff8329b7fb
[ 3491.195051]  0000000000000aeb 0000000000000000 ffff88002d6eb9f8
ffffffff811a0764
[ 3491.195051] Call Trace:
[ 3491.195051]  [<ffffffff8194ad79>] dump_stack+0x99/0xd0
[ 3491.195051]  [<ffffffff811a0659>] ___might_sleep+0x179/0x240
[ 3491.195051]  [<ffffffff811a0764>] __might_sleep+0x44/0x80
[ 3491.195051]  [<ffffffff811842cf>] exit_signals+0x1f/0x130
[ 3491.195051]  [<ffffffff811756ef>] do_exit+0xbf/0xd10
[ 3491.195051]  [<ffffffff811da524>] ? kmsg_dump+0x104/0x180
[ 3491.195051]  [<ffffffff8108549f>] oops_end+0x9f/0xe0
[ 3491.195051]  [<ffffffff810ce6e8>] no_context+0x108/0x390
[ 3491.195051]  [<ffffffff810cea8d>] __bad_area_nosemaphore+0x11d/0x220
[ 3491.195051]  [<ffffffff810ceb9e>] bad_area_nosemaphore+0xe/0x10
[ 3491.195051]  [<ffffffff810cf2c4>] __do_page_fault+0x84/0x470
[ 3491.195051]  [<ffffffff810cf764>] trace_do_page_fault+0x74/0x2c0
[ 3491.195051]  [<ffffffff810c9ba4>] do_async_page_fault+0x14/0x90
[ 3491.195051]  [<ffffffff82b14b78>] async_page_fault+0x28/0x30
[ 3491.195051]  [<ffffffff81327429>] ? atime_needs_update+0x9/0xc0
[ 3491.195051]  [<ffffffff81314922>] trailing_symlink+0x62/0x260
[ 3491.195051]  [<ffffffff8131729d>] path_openat+0x2ed/0x1560
[ 3491.195051]  [<ffffffff81319549>] do_filp_open+0x79/0xd0
[ 3491.195051]  [<ffffffff82b11d92>] ? _raw_spin_unlock+0x22/0x30
[ 3491.195051]  [<ffffffff81328f08>] ? __alloc_fd+0xf8/0x200
[ 3491.195051]  [<ffffffff81306eb0>] do_sys_open+0x110/0x1f0
[ 3491.195051]  [<ffffffff81306fbf>] SyS_openat+0xf/0x20
[ 3491.195051]  [<ffffffff82b12836>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 3491.284831] note: syz-executor[17000] exited with preempt_count 1



The warning is this one:

static inline int should_follow_link(struct nameidata *nd, struct path *link,
                                     int follow,
                                     struct inode *inode, unsigned seq)
{
....
        WARN_ON(!inode);                // now, _that_ should not happen.
        return pick_link(nd, link, inode, seq);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index c6d7d3d..86f81e3 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -323,6 +323,7 @@  static struct dentry *autofs4_mountpoint_changed(struct path *path)
 		struct dentry *new = d_lookup(parent, &dentry->d_name);
 		if (!new)
 			return NULL;
+		WARN_ON(d_is_negative(new));
 		ino = autofs4_dentry_ino(new);
 		ino->last_used = jiffies;
 		dput(path->dentry);
diff --git a/fs/namei.c b/fs/namei.c
index f624d13..a5bcf63 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1209,6 +1209,7 @@  static int follow_managed(struct path *path, struct nameidata *nd)
 		/* Handle an automount point */
 		if (managed & DCACHE_NEED_AUTOMOUNT) {
 			ret = follow_automount(path, nd, &need_mntput);
+			WARN_ON(d_is_negative(path->dentry));
 			if (ret < 0)
 				break;
 			continue;
@@ -1613,8 +1614,10 @@  unlazy:
 	path->mnt = mnt;
 	path->dentry = dentry;
 	err = follow_managed(path, nd);
-	if (likely(!err))
+	if (likely(!err)) {
 		*inode = d_backing_inode(path->dentry);
+		WARN_ON(!inode);
+	}
 	return err;
 
 need_lookup:
@@ -1712,6 +1715,17 @@  static inline int should_follow_link(struct nameidata *nd, struct path *link,
 		return 0;
 	if (!follow)
 		return 0;
+	/* make sure that d_is_symlink above matches inode */
+	if (nd->flags & LOOKUP_RCU) {
+		if (read_seqcount_retry(&link->dentry->d_seq, seq)) {
+			WARN_ON(1);	// just as way to report hitting
+					// that path; it's OK to get
+					// here, in the final variant
+					// WARN_ON will disappear.
+			return -ECHILD;
+		}
+	}
+	WARN_ON(!inode);		// now, _that_ should not happen.
 	return pick_link(nd, link, inode, seq);
 }
 
@@ -1743,11 +1757,11 @@  static int walk_component(struct nameidata *nd, int flags)
 		if (err < 0)
 			return err;
 
-		inode = d_backing_inode(path.dentry);
 		seq = 0;	/* we are already out of RCU mode */
 		err = -ENOENT;
 		if (d_is_negative(path.dentry))
 			goto out_path_put;
+		inode = d_backing_inode(path.dentry);
 	}
 
 	if (flags & WALK_PUT)
@@ -3192,12 +3206,12 @@  retry_lookup:
 		return error;
 
 	BUG_ON(nd->flags & LOOKUP_RCU);
-	inode = d_backing_inode(path.dentry);
 	seq = 0;	/* out of RCU mode, so the value doesn't matter */
 	if (unlikely(d_is_negative(path.dentry))) {
 		path_to_nameidata(&path, nd);
 		return -ENOENT;
 	}
+	inode = d_backing_inode(path.dentry);
 finish_lookup:
 	if (nd->depth)
 		put_link(nd);
@@ -3206,11 +3220,6 @@  finish_lookup:
 	if (unlikely(error))
 		return error;
 
-	if (unlikely(d_is_symlink(path.dentry)) && !(open_flag & O_PATH)) {
-		path_to_nameidata(&path, nd);
-		return -ELOOP;
-	}
-
 	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
 		path_to_nameidata(&path, nd);
 	} else {
@@ -3229,6 +3238,10 @@  finish_open:
 		return error;
 	}
 	audit_inode(nd->name, nd->path.dentry, 0);
+	if (unlikely(d_is_symlink(nd->path.dentry)) && !(open_flag & O_PATH)) {
+		error = -ELOOP;
+		goto out;
+	}
 	error = -EISDIR;
 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
 		goto out;
@@ -3273,6 +3286,10 @@  opened:
 			goto exit_fput;
 	}
 out:
+	if (unlikely(error > 0)) {
+		WARN_ON(1);
+		error = -EINVAL;
+	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
 	path_put(&save_parent);