diff mbox series

[v2,1/3] x86/mm: Use IPIs to synchronize LAM enablement

Message ID 20240312155641.4003683-1-yosryahmed@google.com (mailing list archive)
State New
Headers show
Series [v2,1/3] x86/mm: Use IPIs to synchronize LAM enablement | expand

Commit Message

Yosry Ahmed March 12, 2024, 3:56 p.m. UTC
LAM can only be enabled when a process is single-threaded.  But _kernel_
threads can temporarily use a single-threaded process's mm.

If LAM is enabled by a userspace process while a kthread is using its
mm, the kthread will not observe LAM enablement (i.e.  LAM will be
disabled in CR3). This could be fine for the kthread itself, as LAM only
affects userspace addresses. However, if the kthread context switches to
a thread in the same userspace process, CR3 may or may not be updated
because the mm_struct doesn't change (based on pending TLB flushes). If
CR3 is not updated, the userspace thread will run incorrectly with LAM
disabled, which may cause page faults when using tagged addresses.
Example scenario:

CPU 1                                   CPU 2
/* kthread */
kthread_use_mm()
                                        /* user thread */
                                        prctl_enable_tagged_addr()
                                        /* LAM enabled on CPU 2 */
/* LAM disabled on CPU 1 */
                                        context_switch() /* to CPU 1 */
/* Switching to user thread */
switch_mm_irqs_off()
/* CR3 not updated */
/* LAM is still disabled on CPU 1 */

Synchronize LAM enablement by sending an IPI from
prctl_enable_tagged_addr() to all CPUs running with the mm_struct to
enable LAM. This makes sure LAM is enabled on CPU 1 in the above
scenario before prctl_enable_tagged_addr() returns and userspace starts
using tagged addresses, and before it's possible to run the userspace
process on CPU 1.

In switch_mm_irqs_off(), move reading the LAM mask until after
mm_cpumask() is updated. This ensures that if an outdated LAM mask is
written to CR3, an IPI is received to update it right after IRQs are
re-enabled.

Fixes: 82721d8b25d7 ("x86/mm: Handle LAM on context switch")
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v1 -> v2:
- Defer dereferences in enable_lam_func() until after we check
  cpu_tlbstate.loaded_mm.
- Collect Rbs.

---
 arch/x86/kernel/process_64.c | 13 +++++++++++--
 arch/x86/mm/tlb.c            |  7 +++----
 2 files changed, 14 insertions(+), 6 deletions(-)

Comments

Dave Hansen March 12, 2024, 4:04 p.m. UTC | #1
Yosry,

Could you please slow down a bit on these?  Us lazy west coast Americans
are barely a cup of coffee into our day.  We haven't even had a chance
to read v1.

Once a week is about the right cadence to be sending these.  Every 12
hours is more than my poor inbox can take! :)
Yosry Ahmed March 12, 2024, 4:09 p.m. UTC | #2
On Tue, Mar 12, 2024 at 9:04 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> Yosry,
>
> Could you please slow down a bit on these?  Us lazy west coast Americans
> are barely a cup of coffee into our day.  We haven't even had a chance
> to read v1.
>
> Once a week is about the right cadence to be sending these.  Every 12
> hours is more than my poor inbox can take! :)

My bad, I lost track of when I sent v1 and saw Kirill's comment when I
woke up so I addressed that. FWIW, v1 and v2 are almost identical
except for a small change in patch 2 to address Kirill's comment. I
will hold off on sending anything else this week.

Cheers!
Borislav Petkov March 12, 2024, 4:13 p.m. UTC | #3
On Tue, Mar 12, 2024 at 09:09:30AM -0700, Yosry Ahmed wrote:
> My bad, I lost track of when I sent v1 and saw Kirill's comment when I
> woke up so I addressed that. FWIW, v1 and v2 are almost identical
> except for a small change in patch 2 to address Kirill's comment. I
> will hold off on sending anything else this week.

and while you do, you can have a look at

https://kernel.org/doc/html/latest/process/development-process.html

And we're in a merge window now so no queueing of new patches for
2 weeks unless they're regressions.

HTH.
Yosry Ahmed March 12, 2024, 4:23 p.m. UTC | #4
On Tue, Mar 12, 2024 at 9:14 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Mar 12, 2024 at 09:09:30AM -0700, Yosry Ahmed wrote:
> > My bad, I lost track of when I sent v1 and saw Kirill's comment when I
> > woke up so I addressed that. FWIW, v1 and v2 are almost identical
> > except for a small change in patch 2 to address Kirill's comment. I
> > will hold off on sending anything else this week.
>
> and while you do, you can have a look at
>
> https://kernel.org/doc/html/latest/process/development-process.html

Sure, although I am kinda familiar with that. It would be useful to
point out what part(s) I may be violating :)

>
> And we're in a merge window now so no queueing of new patches for
> 2 weeks unless they're regressions.

Right, I am aware of that part. According to the tip tree handbook I
shouldn't expect them to be handled during the merge window, but do
x86 folks prefer I hold off on sending them until after the merge
window?
Borislav Petkov March 12, 2024, 4:35 p.m. UTC | #5
On Tue, Mar 12, 2024 at 09:23:48AM -0700, Yosry Ahmed wrote:
> Sure, although I am kinda familiar with that. It would be useful to
> point out what part(s) I may be violating :)

Are you kidding?

Dave just told you.

Lemme paste the whole text for you:

"Don't get discouraged - or impatient

After you have submitted your change, be patient and wait. Reviewers are
busy people and may not get to your patch right away.

Once upon a time, patches used to disappear into the void without
comment, but the development process works more smoothly than that now.
You should receive comments within a few weeks (typically 2-3); if that
does not happen, make sure that you have sent your patches to the right
place. Wait for a minimum of one week before resubmitting or pinging
reviewers - possibly longer during busy times like merge windows."

https://kernel.org/doc/html/latest/process/submitting-patches.html

> Right, I am aware of that part. According to the tip tree handbook I
> shouldn't expect them to be handled during the merge window, but do
> x86 folks prefer I hold off on sending them until after the merge
> window?

I believe I speak for all of tip folks when I say that they prefer not
to be spammed with the same set too regularly.

As to when you send: there's never a good moment because our mailboxes
are constantly overflowing.
Yosry Ahmed March 12, 2024, 4:46 p.m. UTC | #6
On Tue, Mar 12, 2024 at 9:35 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Mar 12, 2024 at 09:23:48AM -0700, Yosry Ahmed wrote:
> > Sure, although I am kinda familiar with that. It would be useful to
> > point out what part(s) I may be violating :)
>
> Are you kidding?
>
> Dave just told you.
>
> Lemme paste the whole text for you:
>
> "Don't get discouraged - or impatient
>
> After you have submitted your change, be patient and wait. Reviewers are
> busy people and may not get to your patch right away.
>
> Once upon a time, patches used to disappear into the void without
> comment, but the development process works more smoothly than that now.
> You should receive comments within a few weeks (typically 2-3); if that
> does not happen, make sure that you have sent your patches to the right
> place. Wait for a minimum of one week before resubmitting or pinging
> reviewers - possibly longer during busy times like merge windows."
>
> https://kernel.org/doc/html/latest/process/submitting-patches.html

Thanks for sharing that. I have always assumed this was about pinging
or resending patches when reviews are taking too long. In this case, I
was responding to review comments. Maybe I misinterpreted that.

Anyway, sending a new version in the same day is too fast regardless.
I did admit that already. My bad again :)
Borislav Petkov March 12, 2024, 5:01 p.m. UTC | #7
On Tue, Mar 12, 2024 at 09:46:07AM -0700, Yosry Ahmed wrote:
> Thanks for sharing that. I have always assumed this was about pinging
> or resending patches when reviews are taking too long. In this case, I
> was responding to review comments. Maybe I misinterpreted that.

So what I would do, for example, is send my set, collect review
comments, work them in, discuss them and once there are no more, I'll do
a rev+1, test and send again. Not under a week unless it is some really
serious bug.

This is a perfectly fine cadence.

> Anyway, sending a new version in the same day is too fast regardless.
> I did admit that already. My bad again :)

No worries. I'm saying this also for all the others who are reading. :-)
Yosry Ahmed March 12, 2024, 5:05 p.m. UTC | #8
On Tue, Mar 12, 2024 at 10:01 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Mar 12, 2024 at 09:46:07AM -0700, Yosry Ahmed wrote:
> > Thanks for sharing that. I have always assumed this was about pinging
> > or resending patches when reviews are taking too long. In this case, I
> > was responding to review comments. Maybe I misinterpreted that.
>
> So what I would do, for example, is send my set, collect review
> comments, work them in, discuss them and once there are no more, I'll do
> a rev+1, test and send again. Not under a week unless it is some really
> serious bug.
>
> This is a perfectly fine cadence.

Makes sense to me, thanks!
Yosry Ahmed April 4, 2024, 9:27 p.m. UTC | #9
On Tue, Mar 12, 2024 at 8:56 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> LAM can only be enabled when a process is single-threaded.  But _kernel_
> threads can temporarily use a single-threaded process's mm.
>
> If LAM is enabled by a userspace process while a kthread is using its
> mm, the kthread will not observe LAM enablement (i.e.  LAM will be
> disabled in CR3). This could be fine for the kthread itself, as LAM only
> affects userspace addresses. However, if the kthread context switches to
> a thread in the same userspace process, CR3 may or may not be updated
> because the mm_struct doesn't change (based on pending TLB flushes). If
> CR3 is not updated, the userspace thread will run incorrectly with LAM
> disabled, which may cause page faults when using tagged addresses.
> Example scenario:
>
> CPU 1                                   CPU 2
> /* kthread */
> kthread_use_mm()
>                                         /* user thread */
>                                         prctl_enable_tagged_addr()
>                                         /* LAM enabled on CPU 2 */
> /* LAM disabled on CPU 1 */
>                                         context_switch() /* to CPU 1 */
> /* Switching to user thread */
> switch_mm_irqs_off()
> /* CR3 not updated */
> /* LAM is still disabled on CPU 1 */
>
> Synchronize LAM enablement by sending an IPI from
> prctl_enable_tagged_addr() to all CPUs running with the mm_struct to
> enable LAM. This makes sure LAM is enabled on CPU 1 in the above
> scenario before prctl_enable_tagged_addr() returns and userspace starts
> using tagged addresses, and before it's possible to run the userspace
> process on CPU 1.
>
> In switch_mm_irqs_off(), move reading the LAM mask until after
> mm_cpumask() is updated. This ensures that if an outdated LAM mask is
> written to CR3, an IPI is received to update it right after IRQs are
> re-enabled.
>
> Fixes: 82721d8b25d7 ("x86/mm: Handle LAM on context switch")
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

There hasn't been any further review comments on v2, and the merge
window has been closed for a while now. Do I need  to take any further
action or send a new version of this series?
diff mbox series

Patch

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 33b268747bb7b..76e91fc68c5f3 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -750,6 +750,16 @@  static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 
 #define LAM_U57_BITS 6
 
+static void enable_lam_func(void *__mm)
+{
+	struct mm_struct *mm = __mm;
+
+	if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm) {
+		write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
+		set_tlbstate_lam_mode(mm);
+	}
+}
+
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_LAM))
@@ -782,8 +792,7 @@  static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 		return -EINVAL;
 	}
 
-	write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
-	set_tlbstate_lam_mode(mm);
+	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
 	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
 
 	mmap_write_unlock(mm);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5768d386efab6..e8feb2e154db2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -497,9 +497,9 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
+	unsigned long new_lam;
 	u64 next_tlb_gen;
 	bool need_flush;
 	u16 new_asid;
@@ -622,9 +622,7 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
 		}
 
-		/*
-		 * Start remote flushes and then read tlb_gen.
-		 */
+		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
 		if (next != &init_mm)
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
@@ -636,6 +634,7 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	new_lam = mm_lam_cr3_mask(next);
 	set_tlbstate_lam_mode(next);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);