[tip:x86/mm,v3,1/3] x86/mm: Use IPIs to synchronize LAM enablement

Message ID 20240418012835.3360429-1-yosryahmed@google.com (mailing list archive)
State New

Commit Message

Yosry Ahmed April 18, 2024, 1:28 a.m. UTC
LAM can only be enabled when a process is single-threaded.  But _kernel_
threads can temporarily use a single-threaded process's mm.

If LAM is enabled by a userspace process while a kthread is using its
mm, the kthread will not observe LAM enablement (i.e.  LAM will be
disabled in CR3). This could be fine for the kthread itself, as LAM only
affects userspace addresses. However, if the kthread context switches to
a thread in the same userspace process, CR3 may or may not be updated,
because the mm_struct doesn't change (whether it is written depends on
pending TLB flushes). If
CR3 is not updated, the userspace thread will run incorrectly with LAM
disabled, which may cause page faults when using tagged addresses.
Example scenario:

CPU 1                                   CPU 2
/* kthread */
kthread_use_mm()
                                        /* user thread */
                                        prctl_enable_tagged_addr()
                                        /* LAM enabled on CPU 2 */
/* LAM disabled on CPU 1 */
                                        context_switch() /* to CPU 1 */
/* Switching to user thread */
switch_mm_irqs_off()
/* CR3 not updated */
/* LAM is still disabled on CPU 1 */

Synchronize LAM enablement by sending an IPI from
prctl_enable_tagged_addr() to all CPUs running with the mm_struct to
enable LAM. This makes sure LAM is enabled on CPU 1 in the above
scenario before prctl_enable_tagged_addr() returns and userspace starts
using tagged addresses, and before it's possible to run the userspace
process on CPU 1.
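
As background, a minimal (illustrative, untested) userspace sketch of
what "using tagged addresses" means here: enabling LAM through the
ARCH_ENABLE_TAGGED_ADDR arch_prctl() — which lands in
prctl_enable_tagged_addr() — and then dereferencing a pointer with a
tag in the upper bits. The 0x4002 fallback value and the tag placement
assume the LAM_U57 layout (6 tag bits in bits 62:57):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_ENABLE_TAGGED_ADDR
#define ARCH_ENABLE_TAGGED_ADDR	0x4002	/* from asm/prctl.h */
#endif

int main(void)
{
	uint64_t *p, *tagged;

	/* Request 6 tag bits; the kernel grants LAM_U57 (bits 62:57). */
	if (syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6)) {
		perror("arch_prctl(ARCH_ENABLE_TAGGED_ADDR)");
		return 1;
	}

	p = malloc(sizeof(*p));
	*p = 42;

	/* Store a tag in bits 62:57; hardware masks it on dereference. */
	tagged = (uint64_t *)((uint64_t)p | (0x2fUL << 57));
	printf("*tagged = %llu\n", (unsigned long long)*tagged);

	free(p);
	return 0;
}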

In switch_mm_irqs_off(), move the read of the LAM mask to after the
mm_cpumask() update. This ensures that if an outdated LAM mask is
written to CR3, the IPI that updates it is received as soon as IRQs
are re-enabled.
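
To make the ordering requirement concrete, a simplified sketch of the
two possible orderings in switch_mm_irqs_off() (illustrative pseudocode,
not the literal kernel flow; the actual change is in the tlb.c hunks
below):

	/*
	 * Before (racy): the LAM mask is read while this CPU is still
	 * invisible to enable_lam_func()'s IPI, so a concurrent
	 * prctl_enable_tagged_addr() can be missed entirely.
	 */
	new_lam = mm_lam_cr3_mask(next);	/* may be stale */
	cpumask_set_cpu(cpu, mm_cpumask(next));	/* too late: the IPI was
						 * sent to the old cpumask */

	/*
	 * After (safe): join mm_cpumask() first, then read the mask. A
	 * stale read is now benign: the pending IPI rewrites CR3 with the
	 * correct LAM mask as soon as IRQs are re-enabled.
	 */
	cpumask_set_cpu(cpu, mm_cpumask(next));	/* now an IPI target */
	new_lam = mm_lam_cr3_mask(next);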

Fixes: 82721d8b25d7 ("x86/mm: Handle LAM on context switch")
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v2 -> v3:
- Rebased on top of the latest tip:x86/mm after v6.9-rc3.
- Collected R-b on patch 2 (thanks!).

---
 arch/x86/kernel/process_64.c | 13 +++++++++++--
 arch/x86/mm/tlb.c            |  7 +++----
 2 files changed, 14 insertions(+), 6 deletions(-)

Comments

Yosry Ahmed May 6, 2024, 2:37 p.m. UTC | #1
On Wed, Apr 17, 2024 at 6:28 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> [...]

The merge window is approaching; is this patchset ready to be picked
up? Is any further action needed?
Dave Hansen May 9, 2024, 6:22 p.m. UTC | #2
On 5/6/24 07:37, Yosry Ahmed wrote:
> The merge window is approaching; is this patchset ready to be picked
> up? Is any further action needed?

Yosry, the merge window is when we send content to Linus.  We generally
create that content weeks or months before the merge window.

The best action on your part here would be to touch base with Andy and
make sure he's on board with the solution you think he suggested.
Yosry Ahmed May 9, 2024, 6:28 p.m. UTC | #3
On Thu, May 9, 2024 at 11:22 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/6/24 07:37, Yosry Ahmed wrote:
> > The merge window is approaching; is this patchset ready to be picked
> > up? Is any further action needed?
>
> Yosry, the merge window is when we send content to Linus.  We generally
> create that content weeks or months before the merge window.

Right. The patch series has been idle for a long time, so I assumed it
was going to be picked up for this merge window, but given how close
we have gotten, I think we're past this now.

That's fine; I am just trying to get some clarity on whether anything
else needs to be done for this to be in a mergeable state.

>
> The best action on your part here would be to touch base with Andy and
> make sure he's on board with the solution you think he suggested.

I have been lightly pinging the series and/or resending it every once
in a while, trying not to be too annoying :) Hopefully Andy will voice
his opinion.
Yosry Ahmed June 3, 2024, 10:20 p.m. UTC | #4
On Thu, May 9, 2024 at 11:28 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> [...]

Hey Dave,

I have tried reaching out to Andy on and off list, and didn't have any
luck unfortunately :/

Is there any way we can still make progress on getting the fixes in
this series merged?

Thanks!
Yosry Ahmed June 24, 2024, 11:19 a.m. UTC | #5
On Mon, Jun 3, 2024 at 3:20 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> [...]
>
> Is there any way we can still make progress on getting the fixes in
> this series merged?

Friendly ping on this :)

Patch

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 7062b84dd467d..c27798f23ef82 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -798,6 +798,16 @@  static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 
 #define LAM_U57_BITS 6
 
+static void enable_lam_func(void *__mm)
+{
+	struct mm_struct *mm = __mm;
+
+	if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm) {
+		write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
+		set_tlbstate_lam_mode(mm);
+	}
+}
+
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_LAM))
@@ -830,8 +840,7 @@  static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 		return -EINVAL;
 	}
 
-	write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
-	set_tlbstate_lam_mode(mm);
+	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
 	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
 
 	mmap_write_unlock(mm);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 44ac64f3a047c..a041d2ecd8380 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -503,9 +503,9 @@  void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 {
 	struct mm_struct *prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
+	unsigned long new_lam;
 	u64 next_tlb_gen;
 	bool need_flush;
 	u16 new_asid;
@@ -619,9 +619,7 @@  void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 			cpumask_clear_cpu(cpu, mm_cpumask(prev));
 		}
 
-		/*
-		 * Start remote flushes and then read tlb_gen.
-		 */
+		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
 		if (next != &init_mm)
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
@@ -633,6 +631,7 @@  void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		barrier();
 	}
 
+	new_lam = mm_lam_cr3_mask(next);
 	set_tlbstate_lam_mode(next);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);