[v5,13/32] x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
diff mbox

Message ID 3213d3f31959a6467b1feb80a384c1e11341b2be.1468270393.git.luto@kernel.org
State New
Headers show

Commit Message

Andy Lutomirski July 11, 2016, 8:53 p.m. UTC
If we get a vmalloc fault while current->active_mm->pgd doesn't
match CR3, we'll crash without this change.  I've seen this failure
mode on heavily instrumented kernels with virtually mapped stacks.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dave Hansen July 12, 2016, 5:51 p.m. UTC | #1
On 07/11/2016 01:53 PM, Andy Lutomirski wrote:
> If we get a vmalloc fault while current->active_mm->pgd doesn't
> match CR3, we'll crash without this change.  I've seen this failure
> mode on heavily instrumented kernels with virtually mapped stacks.

When does this happen, btw?  Crossing page boundaries on the stack
between the time we swap mm's and the time we switch stacks?
Andy Lutomirski July 12, 2016, 6:03 p.m. UTC | #2
On Tue, Jul 12, 2016 at 10:51 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 07/11/2016 01:53 PM, Andy Lutomirski wrote:
>> If we get a vmalloc fault while current->active_mm->pgd doesn't
>> match CR3, we'll crash without this change.  I've seen this failure
>> mode on heavily instrumented kernels with virtually mapped stacks.
>
> When does this happen, btw?  Crossing page boundaries on the stack
> between the time we swap mm's and the time we switch stacks?

This can happen for any vmalloc fault between the mm swap and writing
to current or current->active_mm.  I hit it when playing with KASAN
during the first clone.  (KASAN has other issues, but this was one of
them AFAICT.)

--Andy

Patch
diff mbox

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7d1fa7cd2374..ca44e2e7fd00 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -439,7 +439,7 @@  static noinline int vmalloc_fault(unsigned long address)
 	 * happen within a race in page table update. In the later
 	 * case just flush:
 	 */
-	pgd = pgd_offset(current->active_mm, address);
+	pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
 	pgd_ref = pgd_offset_k(address);
 	if (pgd_none(*pgd_ref))
 		return -1;