diff mbox series

[1/1] vfs: Really check for inode ptr in lookup_fast

Message ID 20190906135621.16410-1-riteshh@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [1/1] vfs: Really check for inode ptr in lookup_fast | expand

Commit Message

Ritesh Harjani Sept. 6, 2019, 1:56 p.m. UTC
d_is_negative can race with d_instantiate_new()
-> __d_set_inode_and_type().
For e.g. in use cases where Thread-1 is creating
symlink (doing d_instantiate_new()) & Thread-2 is doing
cat of that symlink while doing lookup_fast (via REF-walk-
one such case is, when ->permission returns -ECHILD).

During this race if __d_set_and_inode_type() does out-of-order
execution and set the dentry->d_flags before setting
dentry->inode, then it can result into following kernel panic.

This change fixes the issue by directly checking for inode.

E.g. kernel panic, since inode was NULL.
trailing_symlink() -> may_follow_link() -> inode->i_uid.
Issue signature:-
  [NIP  : trailing_symlink+80]
  [LR   : trailing_symlink+1092]
  #4 [c00000198069bb70] trailing_symlink at c0000000004bae60  (unreliable)
  #5 [c00000198069bc00] path_openat at c0000000004bdd14
  #6 [c00000198069bc90] do_filp_open at c0000000004c0274
  #7 [c00000198069bdb0] do_sys_open at c00000000049b248
  #8 [c00000198069be30] system_call at c00000000000b388

Sequence of events:-
Thread-2(Comm: ln) 	       Thread-1(Comm: cat)

	                dentry = __d_lookup() //nonRCU

__d_set_and_inode_type() (Out-of-order execution)
    flags = READ_ONCE(dentry->d_flags);
    flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
    flags |= type_flags;
    WRITE_ONCE(dentry->d_flags, flags);

	                if (unlikely(d_is_negative()) // fails
	                       {}
	                // since d_flags is already updated in
	                // Thread-2 in parallel but inode
	                // not yet set.
	                // d_is_negative returns false

	                *inode = d_backing_inode(path->dentry);
	                // means inode is still NULL

    dentry->d_inode = inode;

	                trailing_symlink()
	                    may_follow_link()
	                        inode = nd->link_inode;
	                        // nd->link_inode = NULL
	                        //Then it crashes while
	                        //doing inode->i_uid

Reported-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
Tested-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/namei.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Ritesh Harjani Sept. 16, 2019, 3:22 a.m. UTC | #1
Hello Al Viro,

Any comments on this patch?
This is the same approach as suggested by you [1]

[1]: https://patchwork.kernel.org/patch/10909881/

-ritesh


On 9/6/19 7:26 PM, Ritesh Harjani wrote:
> d_is_negative can race with d_instantiate_new()
> -> __d_set_inode_and_type().
> For e.g. in use cases where Thread-1 is creating
> symlink (doing d_instantiate_new()) & Thread-2 is doing
> cat of that symlink while doing lookup_fast (via REF-walk-
> one such case is, when ->permission returns -ECHILD).
> 
> During this race if __d_set_and_inode_type() does out-of-order
> execution and set the dentry->d_flags before setting
> dentry->inode, then it can result into following kernel panic.
> 
> This change fixes the issue by directly checking for inode.
> 
> E.g. kernel panic, since inode was NULL.
> trailing_symlink() -> may_follow_link() -> inode->i_uid.
> Issue signature:-
>    [NIP  : trailing_symlink+80]
>    [LR   : trailing_symlink+1092]
>    #4 [c00000198069bb70] trailing_symlink at c0000000004bae60  (unreliable)
>    #5 [c00000198069bc00] path_openat at c0000000004bdd14
>    #6 [c00000198069bc90] do_filp_open at c0000000004c0274
>    #7 [c00000198069bdb0] do_sys_open at c00000000049b248
>    #8 [c00000198069be30] system_call at c00000000000b388
> 
> Sequence of events:-
> Thread-2(Comm: ln) 	       Thread-1(Comm: cat)
> 
> 	                dentry = __d_lookup() //nonRCU
> 
> __d_set_and_inode_type() (Out-of-order execution)
>      flags = READ_ONCE(dentry->d_flags);
>      flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>      flags |= type_flags;
>      WRITE_ONCE(dentry->d_flags, flags);
> 
> 	                if (unlikely(d_is_negative()) // fails
> 	                       {}
> 	                // since d_flags is already updated in
> 	                // Thread-2 in parallel but inode
> 	                // not yet set.
> 	                // d_is_negative returns false
> 
> 	                *inode = d_backing_inode(path->dentry);
> 	                // means inode is still NULL
> 
>      dentry->d_inode = inode;
> 
> 	                trailing_symlink()
> 	                    may_follow_link()
> 	                        inode = nd->link_inode;
> 	                        // nd->link_inode = NULL
> 	                        //Then it crashes while
> 	                        //doing inode->i_uid
> 
> Reported-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
> Tested-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> ---
>   fs/namei.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..b5867fe988e0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd,
>   		dput(dentry);
>   		return status;
>   	}
> -	if (unlikely(d_is_negative(dentry))) {
> +
> +	/*
> +	 * Caution: d_is_negative() can race with
> +	 * __d_set_inode_and_type().
> +	 * For e.g. in use cases where Thread-1 is creating
> +	 * symlink (doing d_instantiate_new()) & Thread-2 is doing
> +	 * cat of that symlink and falling here (via Ref-walk) while
> +	 * doing lookup_fast (one such case is when ->permission
> +	 * returns -ECHILD).
> +	 * Now if __d_set_inode_and_type() does out-of-order execution
> +	 * i.e. it first sets the dentry->d_flags & then dentry->inode
> +	 * then it can result into inode being NULL, causing panic later.
> +	 * Hence directly check if inode is NULL here.
> +	 */
> +	if (unlikely(d_really_is_negative(dentry))) {
>   		dput(dentry);
>   		return -ENOENT;
>   	}
>
Jeff Layton Sept. 16, 2019, 9:54 p.m. UTC | #2
On Fri, 2019-09-06 at 19:26 +0530, Ritesh Harjani wrote:
> d_is_negative can race with d_instantiate_new()
> -> __d_set_inode_and_type().
> For e.g. in use cases where Thread-1 is creating
> symlink (doing d_instantiate_new()) & Thread-2 is doing
> cat of that symlink while doing lookup_fast (via REF-walk-
> one such case is, when ->permission returns -ECHILD).
> 
> During this race if __d_set_and_inode_type() does out-of-order
> execution and set the dentry->d_flags before setting
> dentry->inode, then it can result into following kernel panic.
> 
> This change fixes the issue by directly checking for inode.
> 
> E.g. kernel panic, since inode was NULL.
> trailing_symlink() -> may_follow_link() -> inode->i_uid.
> Issue signature:-
>   [NIP  : trailing_symlink+80]
>   [LR   : trailing_symlink+1092]
>   #4 [c00000198069bb70] trailing_symlink at c0000000004bae60  (unreliable)
>   #5 [c00000198069bc00] path_openat at c0000000004bdd14
>   #6 [c00000198069bc90] do_filp_open at c0000000004c0274
>   #7 [c00000198069bdb0] do_sys_open at c00000000049b248
>   #8 [c00000198069be30] system_call at c00000000000b388
> 
> Sequence of events:-
> Thread-2(Comm: ln) 	       Thread-1(Comm: cat)
> 
> 	                dentry = __d_lookup() //nonRCU
> 
> __d_set_and_inode_type() (Out-of-order execution)
>     flags = READ_ONCE(dentry->d_flags);
>     flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>     flags |= type_flags;
>     WRITE_ONCE(dentry->d_flags, flags);
> 
> 	                if (unlikely(d_is_negative()) // fails
> 	                       {}
> 	                // since d_flags is already updated in
> 	                // Thread-2 in parallel but inode
> 	                // not yet set.
> 	                // d_is_negative returns false
> 
> 	                *inode = d_backing_inode(path->dentry);
> 	                // means inode is still NULL
> 
>     dentry->d_inode = inode;
> 
> 	                trailing_symlink()
> 	                    may_follow_link()
> 	                        inode = nd->link_inode;
> 	                        // nd->link_inode = NULL
> 	                        //Then it crashes while
> 	                        //doing inode->i_uid
> 
> Reported-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
> Tested-by: Guang Yuan Wu <wugyuan@cn.ibm.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> ---
>  fs/namei.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..b5867fe988e0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd,
>  		dput(dentry);
>  		return status;
>  	}
> -	if (unlikely(d_is_negative(dentry))) {
> +
> +	/*
> +	 * Caution: d_is_negative() can race with
> +	 * __d_set_inode_and_type().
> +	 * For e.g. in use cases where Thread-1 is creating
> +	 * symlink (doing d_instantiate_new()) & Thread-2 is doing
> +	 * cat of that symlink and falling here (via Ref-walk) while
> +	 * doing lookup_fast (one such case is when ->permission
> +	 * returns -ECHILD).
> +	 * Now if __d_set_inode_and_type() does out-of-order execution
> +	 * i.e. it first sets the dentry->d_flags & then dentry->inode
> +	 * then it can result into inode being NULL, causing panic later.
> +	 * Hence directly check if inode is NULL here.
> +	 */
> +	if (unlikely(d_really_is_negative(dentry))) {
>  		dput(dentry);
>  		return -ENOENT;
>  	}

Looks reasonable to me. The only alternative I see is to put the
barriers back in there (which still might not be a bad idea), but this
should at least address this race.

Acked-by: Jeff Layton <jlayton@kernel.org>
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..b5867fe988e0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1623,7 +1623,21 @@  static int lookup_fast(struct nameidata *nd,
 		dput(dentry);
 		return status;
 	}
-	if (unlikely(d_is_negative(dentry))) {
+
+	/*
+	 * Caution: d_is_negative() can race with
+	 * __d_set_inode_and_type().
+	 * For e.g. in use cases where Thread-1 is creating
+	 * symlink (doing d_instantiate_new()) & Thread-2 is doing
+	 * cat of that symlink and falling here (via Ref-walk) while
+	 * doing lookup_fast (one such case is when ->permission
+	 * returns -ECHILD).
+	 * Now if __d_set_inode_and_type() does out-of-order execution
+	 * i.e. it first sets the dentry->d_flags & then dentry->inode
+	 * then it can result into inode being NULL, causing panic later.
+	 * Hence directly check if inode is NULL here.
+	 */
+	if (unlikely(d_really_is_negative(dentry))) {
 		dput(dentry);
 		return -ENOENT;
 	}