Message ID | 20190906135621.16410-1-riteshh@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/1] vfs: Really check for inode ptr in lookup_fast | expand |
Hello Al Viro, Any comments on this patch? This is the same approach as suggested by you [1] [1]: https://patchwork.kernel.org/patch/10909881/ -ritesh On 9/6/19 7:26 PM, Ritesh Harjani wrote: > d_is_negative can race with d_instantiate_new() > -> __d_set_inode_and_type(). > For e.g. in use cases where Thread-1 is creating > symlink (doing d_instantiate_new()) & Thread-2 is doing > cat of that symlink while doing lookup_fast (via REF-walk- > one such case is, when ->permission returns -ECHILD). > > During this race if __d_set_and_inode_type() does out-of-order > execution and set the dentry->d_flags before setting > dentry->inode, then it can result into following kernel panic. > > This change fixes the issue by directly checking for inode. > > E.g. kernel panic, since inode was NULL. > trailing_symlink() -> may_follow_link() -> inode->i_uid. > Issue signature:- > [NIP : trailing_symlink+80] > [LR : trailing_symlink+1092] > #4 [c00000198069bb70] trailing_symlink at c0000000004bae60 (unreliable) > #5 [c00000198069bc00] path_openat at c0000000004bdd14 > #6 [c00000198069bc90] do_filp_open at c0000000004c0274 > #7 [c00000198069bdb0] do_sys_open at c00000000049b248 > #8 [c00000198069be30] system_call at c00000000000b388 > > Sequence of events:- > Thread-2(Comm: ln) Thread-1(Comm: cat) > > dentry = __d_lookup() //nonRCU > > __d_set_and_inode_type() (Out-of-order execution) > flags = READ_ONCE(dentry->d_flags); > flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); > flags |= type_flags; > WRITE_ONCE(dentry->d_flags, flags); > > if (unlikely(d_is_negative()) // fails > {} > // since d_flags is already updated in > // Thread-2 in parallel but inode > // not yet set. > // d_is_negative returns false > > *inode = d_backing_inode(path->dentry); > // means inode is still NULL > > dentry->d_inode = inode; > > trailing_symlink() > may_follow_link() > inode = nd->link_inode; > // nd->link_inode = NULL > //Then it crashes while > //doing inode->i_uid > > Reported-by: Guang Yuan Wu <wugyuan@cn.ibm.com> > Tested-by: Guang Yuan Wu <wugyuan@cn.ibm.com> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> > --- > fs/namei.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 209c51a5226c..b5867fe988e0 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd, > dput(dentry); > return status; > } > - if (unlikely(d_is_negative(dentry))) { > + > + /* > + * Caution: d_is_negative() can race with > + * __d_set_inode_and_type(). > + * For e.g. in use cases where Thread-1 is creating > + * symlink (doing d_instantiate_new()) & Thread-2 is doing > + * cat of that symlink and falling here (via Ref-walk) while > + * doing lookup_fast (one such case is when ->permission > + * returns -ECHILD). > + * Now if __d_set_inode_and_type() does out-of-order execution > + * i.e. it first sets the dentry->d_flags & then dentry->inode > + * then it can result into inode being NULL, causing panic later. > + * Hence directly check if inode is NULL here. > + */ > + if (unlikely(d_really_is_negative(dentry))) { > dput(dentry); > return -ENOENT; > } >
On Fri, 2019-09-06 at 19:26 +0530, Ritesh Harjani wrote: > d_is_negative can race with d_instantiate_new() > -> __d_set_inode_and_type(). > For e.g. in use cases where Thread-1 is creating > symlink (doing d_instantiate_new()) & Thread-2 is doing > cat of that symlink while doing lookup_fast (via REF-walk- > one such case is, when ->permission returns -ECHILD). > > During this race if __d_set_and_inode_type() does out-of-order > execution and set the dentry->d_flags before setting > dentry->inode, then it can result into following kernel panic. > > This change fixes the issue by directly checking for inode. > > E.g. kernel panic, since inode was NULL. > trailing_symlink() -> may_follow_link() -> inode->i_uid. > Issue signature:- > [NIP : trailing_symlink+80] > [LR : trailing_symlink+1092] > #4 [c00000198069bb70] trailing_symlink at c0000000004bae60 (unreliable) > #5 [c00000198069bc00] path_openat at c0000000004bdd14 > #6 [c00000198069bc90] do_filp_open at c0000000004c0274 > #7 [c00000198069bdb0] do_sys_open at c00000000049b248 > #8 [c00000198069be30] system_call at c00000000000b388 > > Sequence of events:- > Thread-2(Comm: ln) Thread-1(Comm: cat) > > dentry = __d_lookup() //nonRCU > > __d_set_and_inode_type() (Out-of-order execution) > flags = READ_ONCE(dentry->d_flags); > flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); > flags |= type_flags; > WRITE_ONCE(dentry->d_flags, flags); > > if (unlikely(d_is_negative()) // fails > {} > // since d_flags is already updated in > // Thread-2 in parallel but inode > // not yet set. > // d_is_negative returns false > > *inode = d_backing_inode(path->dentry); > // means inode is still NULL > > dentry->d_inode = inode; > > trailing_symlink() > may_follow_link() > inode = nd->link_inode; > // nd->link_inode = NULL > //Then it crashes while > //doing inode->i_uid > > Reported-by: Guang Yuan Wu <wugyuan@cn.ibm.com> > Tested-by: Guang Yuan Wu <wugyuan@cn.ibm.com> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> > --- > fs/namei.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 209c51a5226c..b5867fe988e0 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd, > dput(dentry); > return status; > } > - if (unlikely(d_is_negative(dentry))) { > + > + /* > + * Caution: d_is_negative() can race with > + * __d_set_inode_and_type(). > + * For e.g. in use cases where Thread-1 is creating > + * symlink (doing d_instantiate_new()) & Thread-2 is doing > + * cat of that symlink and falling here (via Ref-walk) while > + * doing lookup_fast (one such case is when ->permission > + * returns -ECHILD). > + * Now if __d_set_inode_and_type() does out-of-order execution > + * i.e. it first sets the dentry->d_flags & then dentry->inode > + * then it can result into inode being NULL, causing panic later. > + * Hence directly check if inode is NULL here. > + */ > + if (unlikely(d_really_is_negative(dentry))) { > dput(dentry); > return -ENOENT; > } Looks reasonable to me. The only alternative I see is to put the barriers back in there (which still might not be a bad idea), but this should at least address this race. Acked-by: Jeff Layton <jlayton@kernel.org>
diff --git a/fs/namei.c b/fs/namei.c index 209c51a5226c..b5867fe988e0 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd, dput(dentry); return status; } - if (unlikely(d_is_negative(dentry))) { + + /* + * Caution: d_is_negative() can race with + * __d_set_inode_and_type(). + * For e.g. in use cases where Thread-1 is creating + * symlink (doing d_instantiate_new()) & Thread-2 is doing + * cat of that symlink and falling here (via Ref-walk) while + * doing lookup_fast (one such case is when ->permission + * returns -ECHILD). + * Now if __d_set_inode_and_type() does out-of-order execution + * i.e. it first sets the dentry->d_flags & then dentry->inode + * then it can result into inode being NULL, causing panic later. + * Hence directly check if inode is NULL here. + */ + if (unlikely(d_really_is_negative(dentry))) { dput(dentry); return -ENOENT; }