[-next,v2] fork: annotate a data race in vm_area_dup()
diff mbox series

Message ID 1582122495-12885-1-git-send-email-cai@lca.pw
State New
Headers show
Series
  • [-next,v2] fork: annotate a data race in vm_area_dup()
Related show

Commit Message

Qian Cai Feb. 19, 2020, 2:28 p.m. UTC
struct vm_area_struct could be accessed concurrently as noticed by
KCSAN,

 write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
  vma_interval_tree_insert+0x101/0x150:
  rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
  (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
  __vma_link_file+0x6e/0xe0
  __vma_link_file at mm/mmap.c:629
  vma_link+0xa2/0x120
  mmap_region+0x753/0xb90
  do_mmap+0x45c/0x710
  vm_mmap_pgoff+0xc0/0x130
  ksys_mmap_pgoff+0x1d1/0x300
  __x64_sys_mmap+0x33/0x40
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
  vm_area_dup+0x6a/0xe0
  vm_area_dup at kernel/fork.c:362
  __split_vma+0x72/0x2a0
  __split_vma at mm/mmap.c:2661
  split_vma+0x5a/0x80
  mprotect_fixup+0x368/0x3f0
  do_mprotect_pkey+0x263/0x420
  __x64_sys_mprotect+0x51/0x70
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

vm_area_dup() blindly copies all fields of original VMA to the new one.
This includes coping vm_area_struct::shared.rb which is normally
protected by i_mmap_lock. But this is fine because the read value will
be overwritten on the following __vma_link_file() under proper
protection. Thus, mark it as an intentional data race and insert a few
assertions for the fields that should not be modified concurrently.

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: Insert a few assertions thanks to Marco.
    Update the commit log thanks to Kirill.

 kernel/fork.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Paul E. McKenney Feb. 19, 2020, 7:04 p.m. UTC | #1
On Wed, Feb 19, 2020 at 09:28:15AM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
> 
>  write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
>   vma_interval_tree_insert+0x101/0x150:
>   rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
>   (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
>   __vma_link_file+0x6e/0xe0
>   __vma_link_file at mm/mmap.c:629
>   vma_link+0xa2/0x120
>   mmap_region+0x753/0xb90
>   do_mmap+0x45c/0x710
>   vm_mmap_pgoff+0xc0/0x130
>   ksys_mmap_pgoff+0x1d1/0x300
>   __x64_sys_mmap+0x33/0x40
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
>  read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
>   vm_area_dup+0x6a/0xe0
>   vm_area_dup at kernel/fork.c:362
>   __split_vma+0x72/0x2a0
>   __split_vma at mm/mmap.c:2661
>   split_vma+0x5a/0x80
>   mprotect_fixup+0x368/0x3f0
>   do_mprotect_pkey+0x263/0x420
>   __x64_sys_mprotect+0x51/0x70
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> vm_area_dup() blindly copies all fields of original VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally
> protected by i_mmap_lock. But this is fine because the read value will
> be overwritten on the following __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>

Queued for safekeeping on -rcu.  I had to adjust a bit to get it to
apply on -rcu, please see below.  In my experience, git should have
no trouble figuring it out.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 1228aca56f2a25b67876d8a819437b620a6e1cee
Author: Qian Cai <cai@lca.pw>
Date:   Wed Feb 19 11:00:54 2020 -0800

    fork: Annotate a data race in vm_area_dup()
    
    struct vm_area_struct could be accessed concurrently as noticed by
    KCSAN,
    
     write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
      vma_interval_tree_insert+0x101/0x150:
      rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
      (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
      __vma_link_file+0x6e/0xe0
      __vma_link_file at mm/mmap.c:629
      vma_link+0xa2/0x120
      mmap_region+0x753/0xb90
      do_mmap+0x45c/0x710
      vm_mmap_pgoff+0xc0/0x130
      ksys_mmap_pgoff+0x1d1/0x300
      __x64_sys_mmap+0x33/0x40
      do_syscall_64+0x91/0xc44
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
     read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
      vm_area_dup+0x6a/0xe0
      vm_area_dup at kernel/fork.c:362
      __split_vma+0x72/0x2a0
      __split_vma at mm/mmap.c:2661
      split_vma+0x5a/0x80
      mprotect_fixup+0x368/0x3f0
      do_mprotect_pkey+0x263/0x420
      __x64_sys_mprotect+0x51/0x70
      do_syscall_64+0x91/0xc44
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    vm_area_dup() blindly copies all fields of original VMA to the new one.
    This includes coping vm_area_struct::shared.rb which is normally
    protected by i_mmap_lock. But this is fine because the read value will
    be overwritten on the following __vma_link_file() under proper
    protection. Thus, mark it as an intentional data race and insert a few
    assertions for the fields that should not be modified concurrently.
    
    Signed-off-by: Qian Cai <cai@lca.pw>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/fork.c b/kernel/fork.c
index 60a1295..e592e6f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
 	if (new) {
-		*new = *orig;
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+		/*
+		 * orig->shared.rb may be modified concurrently, but the clone
+		 * will be reinitialized.
+		 */
+		*new = data_race(*orig);
 		INIT_LIST_HEAD(&new->anon_vma_chain);
 	}
 	return new;
Kirill A. Shutemov Feb. 20, 2020, 12:56 p.m. UTC | #2
On Wed, Feb 19, 2020 at 09:28:15AM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
> 
>  write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
>   vma_interval_tree_insert+0x101/0x150:
>   rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
>   (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
>   __vma_link_file+0x6e/0xe0
>   __vma_link_file at mm/mmap.c:629
>   vma_link+0xa2/0x120
>   mmap_region+0x753/0xb90
>   do_mmap+0x45c/0x710
>   vm_mmap_pgoff+0xc0/0x130
>   ksys_mmap_pgoff+0x1d1/0x300
>   __x64_sys_mmap+0x33/0x40
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
>  read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
>   vm_area_dup+0x6a/0xe0
>   vm_area_dup at kernel/fork.c:362
>   __split_vma+0x72/0x2a0
>   __split_vma at mm/mmap.c:2661
>   split_vma+0x5a/0x80
>   mprotect_fixup+0x368/0x3f0
>   do_mprotect_pkey+0x263/0x420
>   __x64_sys_mprotect+0x51/0x70
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> vm_area_dup() blindly copies all fields of original VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally
> protected by i_mmap_lock. But this is fine because the read value will
> be overwritten on the following __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Patch
diff mbox series

diff --git a/kernel/fork.c b/kernel/fork.c
index cb2ae49e497e..67a5d691ffa8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@  struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
 	if (new) {
-		*new = *orig;
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+		ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+		/*
+		 * orig->shared.rb may be modified concurrently, but the clone
+		 * will be reinitialized.
+		 */
+		*new = data_race(*orig);
 		INIT_LIST_HEAD(&new->anon_vma_chain);
 		new->vm_next = new->vm_prev = NULL;
 	}