diff mbox series

[2/2] mm: Fix a huge pud insertion race during faulting

Message ID 20191115115808.21181-2-thomas_os@shipmail.org
State New, archived
Headers show
Series [1/2] mm: Move the backup x_devmap() functions to asm-generic/pgtable.h | expand

Commit Message

Thomas Hellström (Intel) Nov. 15, 2019, 11:58 a.m. UTC
From: Thomas Hellstrom <thellstrom@vmware.com>

A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault(). That will lead to pmd_alloc() returning an
invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
similar to pmd_trans_unstable() and check whether the pud is really stable
before using the pmd pointer.

Race:
Thread 1:             Thread 2:                 Comment
create_huge_pud()                               Fallback - not taken.
		      create_huge_pud()         Taken.
pmd_alloc()                                     Returns an invalid pointer.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 include/asm-generic/pgtable.h | 25 +++++++++++++++++++++++++
 mm/memory.c                   |  6 ++++++
 2 files changed, 31 insertions(+)

Comments

Andrew Morton Nov. 15, 2019, 7:58 p.m. UTC | #1
On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> A huge pud page can theoretically be faulted in racing with pmd_alloc()
> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> similar to pmd_trans_unstable() and check whether the pud is really stable
> before using the pmd pointer.
> 
> Race:
> Thread 1:             Thread 2:                 Comment
> create_huge_pud()                               Fallback - not taken.
> 		      create_huge_pud()         Taken.
> pmd_alloc()                                     Returns an invalid pointer.

What are the user-visible runtime effects of this change?

Is a -stable backport warranted?
Kirill A. Shutemov Nov. 18, 2019, 10:22 a.m. UTC | #2
On Fri, Nov 15, 2019 at 11:58:00AM -0800, Andrew Morton wrote:
> On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
> 
> > A huge pud page can theoretically be faulted in racing with pmd_alloc()
> > in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> > invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> > similar to pmd_trans_unstable() and check whether the pud is really stable
> > before using the pmd pointer.
> > 
> > Race:
> > Thread 1:             Thread 2:                 Comment
> > create_huge_pud()                               Fallback - not taken.
> > 		      create_huge_pud()         Taken.
> > pmd_alloc()                                     Returns an invalid pointer.
> 
> What are the user-visible runtime effects of this change?

Data corruption: kernel writes to a huge page thing it's page table.

> Is a -stable backport warranted?

I believe it is.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thomas Hellström (Intel) Nov. 18, 2019, 12:58 p.m. UTC | #3
On 11/18/19 11:22 AM, Kirill A. Shutemov wrote:
> On Fri, Nov 15, 2019 at 11:58:00AM -0800, Andrew Morton wrote:
>> On Fri, 15 Nov 2019 12:58:08 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
>>
>>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
>>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
>>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
>>> similar to pmd_trans_unstable() and check whether the pud is really stable
>>> before using the pmd pointer.
>>>
>>> Race:
>>> Thread 1:             Thread 2:                 Comment
>>> create_huge_pud()                               Fallback - not taken.
>>> 		      create_huge_pud()         Taken.
>>> pmd_alloc()                                     Returns an invalid pointer.
>> What are the user-visible runtime effects of this change?
> Data corruption: kernel writes to a huge page thing it's page table.
>
>> Is a -stable backport warranted?
> I believe it is.

Note that this was caught during a code audit rather than a real 
experienced problem. It looks to me like the only implementation that 
currently creates huge pud pagetable entries is dev_dax_huge_fault() 
which doesn't appear to care much about private (COW) mappings or 
write-tracking which is, I believe, a prerequisite for create_huge_pud() 
falling back on thread 1, but not in thread 2.

This means (assuming that's intentional) that a stable backport 
shouldn't be needed.

For the WIP huge page support for graphics memory we'll be allowing both 
COW mappings and write-tracking, though, but that's still some time away.

In any case, I think this patch needs -rc testing to catch potential 
pud_devmap issues before submitted to stable.

Thanks,

Thomas

>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
Andrew Morton Nov. 18, 2019, 9:02 p.m. UTC | #4
On Mon, 18 Nov 2019 13:58:04 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> >> Is a -stable backport warranted?
> > I believe it is.
> 
> Note that this was caught during a code audit rather than a real 
> experienced problem. It looks to me like the only implementation that 
> currently creates huge pud pagetable entries is dev_dax_huge_fault() 
> which doesn't appear to care much about private (COW) mappings or 
> write-tracking which is, I believe, a prerequisite for create_huge_pud() 
> falling back on thread 1, but not in thread 2.
> 
> This means (assuming that's intentional) that a stable backport 
> shouldn't be needed.
> 
> For the WIP huge page support for graphics memory we'll be allowing both 
> COW mappings and write-tracking, though, but that's still some time away.
> 
> In any case, I think this patch needs -rc testing to catch potential 
> pud_devmap issues before submitted to stable.

OK, thanks, I'll queue it for 5.5-rc1 with a -stable tag.  Hopefully
that way it will get a bit of exposure before the stable trees pick it
up.  Maybe this is optimistic..
diff mbox series

Patch

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8efa45580fd0..c40a0ced53bd 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -927,6 +927,31 @@  static inline int pud_trans_huge(pud_t pud)
 }
 #endif
 
+/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */
+static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud)
+{
+	pud_t pudval = READ_ONCE(*pud);
+
+	if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+		return 1;
+	if (unlikely(pud_bad(pudval))) {
+		pud_clear_bad(pud);
+		return 1;
+	}
+	return 0;
+}
+
+/* See pmd_trans_unstable for discussion. */
+static inline int pud_trans_unstable(pud_t *pud)
+{
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) &&			\
+	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+	return pud_none_or_trans_huge_or_dev_or_clear_bad(pud);
+#else
+	return 0;
+#endif
+}
+
 #ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..43ff372f4f07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3914,6 +3914,7 @@  static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pud = pud_alloc(mm, p4d, address);
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
+retry_pud:
 	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
@@ -3940,6 +3941,11 @@  static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
 	if (!vmf.pmd)
 		return VM_FAULT_OOM;
+
+	/* Huge pud page fault raced with pmd_alloc? */
+	if (pud_trans_unstable(vmf.pud))
+		goto retry_pud;
+
 	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))