mm/memory.c: do_fault: avoid usage of stale vm_area_struct

Message ID	0b7a4604529e16ace8d65a42dac7c78582e7fb28.1551538524.git.jstancek@redhat.com (mailing list archive)
State	New, archived
Series	mm/memory.c: do_fault: avoid usage of stale vm_area_struct

Headers

Received-SPF: pass (google.com: domain of jstancek@redhat.com designates
 209.132.183.28 as permitted sender) client-ip=209.132.183.28;
From: Jan Stancek <jstancek@redhat.com>
To: linux-mm@kvack.org,
	akpm@linux-foundation.org,
	willy@infradead.org,
	peterz@infradead.org,
	riel@surriel.com,
	mhocko@suse.com,
	ying.huang@intel.com,
	jrdr.linux@gmail.com,
	jglisse@redhat.com,
	aneesh.kumar@linux.ibm.com,
	david@redhat.com,
	aarcange@redhat.com,
	raquini@redhat.com,
	rientjes@google.com,
	kirill@shutemov.name,
	mgorman@techsingularity.net,
	jstancek@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
Date: Sat,  2 Mar 2019 16:11:26 +0100
Message-Id: 
 <0b7a4604529e16ace8d65a42dac7c78582e7fb28.1551538524.git.jstancek@redhat.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit Message

Jan Stancek March 2, 2019, 3:11 p.m. UTC

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test in which one thread mmaps/writes/munmaps a memory
area while another thread tries to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>]           (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with a NULL mm parameter, but because
"0" is a valid address on s390 (see S390_lowcore), it keeps
going until it eventually crashes in lockdep's lock_acquire.
This crash is reproducible at least since 4.14.
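
To make the NULL case concrete, here is a small userspace sketch of the address arithmetic. The struct layout and the names fake_mm, fake_context, fake_lock and context_lock_addr are hypothetical stand-ins, not the real s390 definitions. With an mm base of 0, the address of the context lock is just the member offset, a small non-zero number. On most architectures that address would fault; per the description above, on s390 it is valid memory, so execution carries on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified layout; the real mm_struct is far larger. */
struct fake_lock { int raw; };
struct fake_context { long asce; struct fake_lock lock; };
struct fake_mm { void *pgd; struct fake_context context; };

/*
 * Address that &mm->context.lock would evaluate to for a given mm base.
 * Computed arithmetically so that no NULL pointer is ever dereferenced.
 */
static uintptr_t context_lock_addr(uintptr_t mm_base)
{
	return mm_base + offsetof(struct fake_mm, context.lock);
}
```

Calling context_lock_addr(0) models the mm == NULL case: the result equals the member offset, so the lock code ends up operating on whatever lives at that low address.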

The problem is that "vmf->vma" used in do_fault() can become stale.
Because mmap_sem may be released, other threads can come in,
call munmap() and cause "vma" to be returned to the kmem cache,
where it can be zeroed/re-initialized and re-used:

handle_mm_fault                           |
  __handle_mm_fault                       |
    do_fault                              |
      vma = vmf->vma                      |
      do_read_fault                       |
        __do_fault                        |
          vma->vm_ops->fault(vmf);        |
            mmap_sem is released          |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

This patch pins mm_struct and stores its value, to avoid using
potentially stale "vma" when calling pte_free().
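
The pattern can be sketched in plain userspace C. This is an illustration under stated assumptions, not the kernel code: struct mm, struct vma, fault_handler_that_drops_lock() and the two mm_for_free_*() helpers are hypothetical, and recycling of the vma is modelled by zeroing a live object rather than by a real free and re-allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mm { int id; };
struct vma { struct mm *vm_mm; };

/*
 * Models ->fault() releasing mmap_sem: another thread unmaps the region and
 * the same vma slot is handed out again, zeroed. The object stays allocated
 * here so the demonstration has defined behaviour.
 */
static void fault_handler_that_drops_lock(struct vma *vma)
{
	memset(vma, 0, sizeof(*vma));
}

/* Buggy shape: read vma->vm_mm after the handler has returned. */
static struct mm *mm_for_free_stale(struct vma *vma)
{
	fault_handler_that_drops_lock(vma);
	return vma->vm_mm;	/* NULL by now */
}

/* Fixed shape: cache vm_mm first, never touch vma afterwards. */
static struct mm *mm_for_free_cached(struct vma *vma)
{
	struct mm *mm = vma->vm_mm;

	fault_handler_that_drops_lock(vma);
	return mm;
}
```

mm_for_free_stale() returns NULL once the vma has been recycled, which corresponds to the NULL mm reaching page_table_free() in the trace above, while mm_for_free_cached() still returns the original mm.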

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
 mm/memory.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox March 2, 2019, 5:10 p.m. UTC | #1

On Sat, Mar 02, 2019 at 04:11:26PM +0100, Jan Stancek wrote:
> Problem is that "vmf->vma" used in do_fault() can become stale.
> Because mmap_sem may be released, other threads can come in,
> call munmap() and cause "vma" be returned to kmem cache, and
> get zeroed/re-initialized and re-used:

> This patch pins mm_struct and stores its value, to avoid using
> potentially stale "vma" when calling pte_free().

OK, we need to cache the mm_struct, but why do we need the extra atomic op?
There's surely no way the mm can be freed while the thread is in the middle
of handling a fault.

ie I would drop these lines:

> +	mmgrab(vm_mm);
> +
...
> +
> +	mmdrop(vm_mm);
> +

Jan Stancek March 2, 2019, 6 p.m. UTC | #2

----- Original Message -----
> On Sat, Mar 02, 2019 at 04:11:26PM +0100, Jan Stancek wrote:
> > Problem is that "vmf->vma" used in do_fault() can become stale.
> > Because mmap_sem may be released, other threads can come in,
> > call munmap() and cause "vma" be returned to kmem cache, and
> > get zeroed/re-initialized and re-used:
> 
> > This patch pins mm_struct and stores its value, to avoid using
> > potentially stale "vma" when calling pte_free().
> 
> OK, we need to cache the mm_struct, but why do we need the extra atomic op?
> There's surely no way the mm can be freed while the thread is in the middle
> of handling a fault.

You're right, I was needlessly paranoid.

> 
> ie I would drop these lines:

I'll send v2.

Thanks,
Jan

> 
> > +	mmgrab(vm_mm);
> > +
> ...
> > +
> > +	mmdrop(vm_mm);
> > +
>
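
The review point can be illustrated with a toy reference-count model. It assumes, as argued above, that the faulting task already holds a reference to its mm for the whole duration of the fault. The names toy_mm, toy_mmgrab(), toy_mmdrop() and the two fault_*() helpers are hypothetical and only mimic the shape of mmgrab()/mmdrop(); this is not the v2 patch.

```c
#include <assert.h>

/* Hypothetical, simplified model of mm reference counting. */
struct toy_mm {
	int count;	/* stands in for the mm reference count */
	int freed;	/* set once the last reference is dropped */
};

static void toy_mmgrab(struct toy_mm *mm)
{
	mm->count++;
}

static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->count > 0);
	if (--mm->count == 0)
		mm->freed = 1;
}

/* Shape of the posted patch: pin the mm around its use. */
static int fault_with_pin(struct toy_mm *mm)
{
	int usable;

	toy_mmgrab(mm);
	usable = !mm->freed;	/* stands in for pte_free(vm_mm, ...) */
	toy_mmdrop(mm);
	return usable;
}

/* Shape suggested in review: rely on the reference the task already holds. */
static int fault_without_pin(struct toy_mm *mm)
{
	return !mm->freed;	/* stands in for pte_free(vm_mm, ...) */
}
```

Starting from a count of 1 (the task's own reference), both variants see a live mm and leave the count unchanged, so the extra grab/drop pair only adds two atomic operations in the real code.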

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..1287ee9acbdc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,12 +3517,17 @@  static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by other thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);
 	vm_fault_t ret;
 
+	mmgrab(vm_mm);
+
 	/*
 	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
 	 */
@@ -3561,9 +3566,12 @@  static vm_fault_t do_fault(struct vm_fault *vmf)
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
+
+	mmdrop(vm_mm);
+
 	return ret;
 }
