[v2,13/35] x86/mm: attempt speculative mm faults first

Message ID 20220128131006.67712-14-michel@lespinasse.org (mailing list archive)
State New
Series Speculative page faults

Commit Message

Michel Lespinasse Jan. 28, 2022, 1:09 p.m. UTC
Attempt speculative mm fault handling first, and fall back to the
existing (non-speculative) code if that fails.

The speculative handling closely mirrors the non-speculative logic.
This includes some x86-specific bits such as the access_error() call.
This is why we chose to implement the speculative handling in arch/x86
rather than in common code.

The vma is first looked up and copied under protection of the RCU
read lock. The mmap lock sequence count is used to verify the
integrity of the copied vma, and is passed to do_handle_mm_fault() so
that races with mmap writers can be detected when finalizing the fault.
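
In outline, the speculative path does the following (a simplified sketch
of the x86 hunk in this patch, with the abort/fallback label elided):

	/* Snapshot the mmap sequence count; an odd value means a writer
	 * currently holds the mmap lock, so do not even try. */
	seq = mmap_seq_read_start(mm);
	if (seq & 1)
		goto spf_abort;

	/* Look up the VMA and copy it to the stack under RCU, so the copy
	 * stays usable even if the mapping changes concurrently. */
	rcu_read_lock();
	vma = __find_vma(mm, address);
	if (!vma || vma->vm_start > address) {
		rcu_read_unlock();
		goto spf_abort;
	}
	pvma = *vma;
	rcu_read_unlock();

	/* Re-check the sequence count: if it moved, the copy may be stale. */
	if (!mmap_seq_read_check(mm, seq))
		goto spf_abort;

	/* Same x86-specific permission check as the non-speculative path. */
	if (unlikely(access_error(error_code, &pvma)))
		goto spf_abort;

	/* The sequence count is passed down so the fault can be re-validated
	 * against mmap writers before it is finalized. */
	fault = do_handle_mm_fault(&pvma, address,
				   flags | FAULT_FLAG_SPECULATIVE, seq, regs);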

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
---
 arch/x86/mm/fault.c           | 44 +++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h      |  5 ++++
 include/linux/vm_event_item.h |  4 ++++
 mm/vmstat.c                   |  4 ++++
 4 files changed, 57 insertions(+)

Comments

Liam R. Howlett Feb. 1, 2022, 5:16 p.m. UTC | #1
* Michel Lespinasse <michel@lespinasse.org> [220128 08:10]:
> +	vma = __find_vma(mm, address);
> +	if (!vma || vma->vm_start > address) {

This fits the vma_lookup() pattern - although you will have to work
around the locking issue still.  This is the same for the other
platforms too; they fit the pattern also.
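
For reference, the helper folds both checks into one call, so the lookup
could read roughly as below (a sketch only; vma_lookup() is built on
find_vma(), which asserts that the mmap lock is held, so an RCU-safe
variant along the lines of the __find_vma() used here would still be
needed - that is the locking issue mentioned above):

	rcu_read_lock();
	/* vma_lookup() returns NULL unless vm_start <= address < vm_end */
	vma = vma_lookup(mm, address);
	if (!vma) {
		rcu_read_unlock();
		goto spf_abort;
	}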

Michel Lespinasse Feb. 7, 2022, 5:39 p.m. UTC | #2
On Tue, Feb 01, 2022 at 05:16:43PM +0000, Liam Howlett wrote:
> > +	vma = __find_vma(mm, address);
> > +	if (!vma || vma->vm_start > address) {
> 
> This fits the vma_lookup() pattern - although you will have to work
> around the locking issue still.  This is the same for the other
> platforms too; they fit the pattern also.

In this case, I think it's just as well to follow the lines of the
non-speculative path, which itself can't use vma_lookup() because it
needs to handle the stack expansion case...

--
Michel "walken" Lespinasse
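
For context, the non-speculative lookup being referred to looks roughly
like the following (abbreviated from the existing do_user_addr_fault()
code, not part of this patch), which is why plain vma_lookup() does not
fit there:

	vma = find_vma(mm, address);
	if (unlikely(!vma)) {
		bad_area(regs, error_code, address);
		return;
	}
	if (unlikely(vma->vm_start > address)) {
		/* Not a direct hit: the address may still belong to a stack
		 * VMA that is allowed to grow down to cover it. */
		if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)) ||
		    unlikely(expand_stack(vma, address))) {
			bad_area(regs, error_code, address);
			return;
		}
	}
	/* good_area: proceed with access_error() and handle_mm_fault() */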

Patch

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d0074c6ed31a..99b0a358154e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1226,6 +1226,10 @@  void do_user_addr_fault(struct pt_regs *regs,
 	struct mm_struct *mm;
 	vm_fault_t fault;
 	unsigned int flags = FAULT_FLAG_DEFAULT;
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	struct vm_area_struct pvma;
+	unsigned long seq;
+#endif
 
 	tsk = current;
 	mm = tsk->mm;
@@ -1323,6 +1327,43 @@  void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	count_vm_event(SPF_ATTEMPT);
+	seq = mmap_seq_read_start(mm);
+	if (seq & 1)
+		goto spf_abort;
+	rcu_read_lock();
+	vma = __find_vma(mm, address);
+	if (!vma || vma->vm_start > address) {
+		rcu_read_unlock();
+		goto spf_abort;
+	}
+	pvma = *vma;
+	rcu_read_unlock();
+	if (!mmap_seq_read_check(mm, seq))
+		goto spf_abort;
+	vma = &pvma;
+	if (unlikely(access_error(error_code, vma)))
+		goto spf_abort;
+	fault = do_handle_mm_fault(vma, address,
+				   flags | FAULT_FLAG_SPECULATIVE, seq, regs);
+
+	if (!(fault & VM_FAULT_RETRY))
+		goto done;
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			kernelmode_fixup_or_oops(regs, error_code, address,
+						 SIGBUS, BUS_ADRERR,
+						 ARCH_DEFAULT_PKEY);
+		return;
+	}
+
+spf_abort:
+	count_vm_event(SPF_ABORT);
+#endif
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1419,6 +1460,9 @@  void do_user_addr_fault(struct pt_regs *regs,
 	}
 
 	mmap_read_unlock(mm);
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+done:
+#endif
 	if (likely(!(fault & VM_FAULT_ERROR)))
 		return;
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index b6678578a729..305f05d2a4bc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -370,6 +370,11 @@  struct anon_vma_name {
  * per VM-area/task. A VM area is any part of the process virtual memory
  * space that has a special rule for the page-fault handlers (ie a shared
  * library, the executable area etc).
+ *
+ * Note that speculative page faults make an on-stack copy of the VMA,
+ * so the structure size matters.
+ * (TODO - it would be preferable to copy only the required vma attributes
+ *  rather than the entire vma).
  */
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7b2363388bfa..f00b3e36ff39 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -133,6 +133,10 @@  enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
 		DIRECT_MAP_LEVEL3_SPLIT,
+#endif
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+		SPF_ATTEMPT,
+		SPF_ABORT,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4057372745d0..dbb0160e5558 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1390,6 +1390,10 @@  const char * const vmstat_text[] = {
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
 #endif
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	"spf_attempt",
+	"spf_abort",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */