[2/2] x86/mm: clarify "prev" usage in switch_mm_irqs_off()

Message ID

20240126080644.1714297-2-yosryahmed@google.com (mailing list archive)

State

New

Headers

Date: Fri, 26 Jan 2024 08:06:44 +0000
In-Reply-To: <20240126080644.1714297-1-yosryahmed@google.com>
Mime-Version: 1.0
References: <20240126080644.1714297-1-yosryahmed@google.com>
Message-ID: <20240126080644.1714297-2-yosryahmed@google.com>
Subject: [PATCH 2/2] x86/mm: clarify "prev" usage in switch_mm_irqs_off()
From: Yosry Ahmed <yosryahmed@google.com>
To: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>, x86@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Yosry Ahmed <yosryahmed@google.com>
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

[1/2] x86/mm: delete unused cpu argument to leave_mm()
[2/2] x86/mm: clarify "prev" usage in switch_mm_irqs_off()

Commit Message

Yosry Ahmed Jan. 26, 2024, 8:06 a.m. UTC

In the x86 implementation of switch_mm_irqs_off(), we do not use the
"prev" argument passed in by the caller; we exclusively use
"real_prev", which is cpu_tlbstate.loaded_mm. This is not obvious at
first sight.

Furthermore, a comment describes a condition that happens
when called with prev == next, but this should not affect the function
in any way since prev is unused. Apparently, the comment is intended to
clarify why we don't rely on prev == next to decide whether we need to
update CR3, but again, it is not obvious. The comment also references
the fact that leave_mm() calls with prev == NULL and tsk == NULL, but
this also shouldn't matter because prev is unused and tsk is only used
in one function which has a NULL check.

Clarify things by renaming (prev -> unused) and (real_prev -> prev).
Also move and rewrite the comment as an explanation for why we don't
rely on the "prev" supplied by the caller in x86 code and use our own.
Hopefully this makes reading the code easier.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/x86/mm/tlb.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)
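
The reasoning in the commit message can be replayed in a small user-space model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, toy_loaded_mm stands in for cpu_tlbstate.loaded_mm, toy_active_mm stands in for the scheduler's active_mm, and a "switch" is reduced to a pointer update instead of a CR3 write. It shows a case where the caller's prev equals next even though a real switch is still required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mm_struct; not the kernel type. */
struct toy_mm {
	int id;
};

static struct toy_mm toy_init_mm = { 0 };
static struct toy_mm toy_user_mm = { 1 };

/* Models cpu_tlbstate.loaded_mm: the mm that CR3 really points at. */
static struct toy_mm *toy_loaded_mm = &toy_user_mm;

/* Models the scheduler-side active_mm, which x86 code may leave stale. */
static struct toy_mm *toy_active_mm = &toy_user_mm;

/*
 * Models the patched switch_mm_irqs_off(): ignore the caller's argument
 * and take "prev" from toy_loaded_mm. Returns true if a switch happened.
 */
bool toy_switch_mm(struct toy_mm *unused, struct toy_mm *next)
{
	struct toy_mm *prev = toy_loaded_mm;

	(void)unused;
	if (prev == next)
		return false;	/* not actually switching mm's */
	toy_loaded_mm = next;
	return true;
}

/* Models an x86-internal caller like leave_mm(): active_mm is untouched. */
void toy_leave_mm(void)
{
	toy_switch_mm(NULL, &toy_init_mm);
}

/*
 * Replays the stale case: after toy_leave_mm(), the caller still passes
 * prev == next, yet a real switch is required. Returns true when the
 * model performs that switch.
 */
bool toy_stale_prev_needs_switch(void)
{
	struct toy_mm *next = &toy_user_mm;
	bool caller_prev_matches;

	toy_loaded_mm = &toy_user_mm;
	toy_active_mm = &toy_user_mm;
	toy_leave_mm();

	caller_prev_matches = (toy_active_mm == next);
	return caller_prev_matches && toy_switch_mm(toy_active_mm, next);
}
```

Had the model compared the caller-supplied argument against next, it would have skipped the switch and left toy_init_mm loaded. Keying the decision on the loaded mm avoids that.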

Comments

Dave Hansen Feb. 22, 2024, 4:48 p.m. UTC | #1

On 1/26/24 00:06, Yosry Ahmed wrote:
> +/*
> + * The "prev" argument passed by the caller does not always match CR3. For
> + * example, the scheduler passes in active_mm when switching from lazy TLB mode
> + * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
> + * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
> + */
> +void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
>  			struct task_struct *tsk)

One nit here: It's not obvious that "unused" is 'the "prev" argument'.

Would something like this be more clear?

/*
 * This optimizes when not actually switching mm's.  Some architectures
 * use the 'unused' argument for this optimization, but x86 must use
 * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
 * ->active_mm up to date.
 */

Also, I think it might be useful to have the rule that arch/x86 code
_always_ calls switch_mm_irqs_off() with the first argument (the
newly-named 'unused') set to NULL.  I think there's only one site:

> void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>                struct task_struct *tsk)
> {
>         unsigned long flags;
> 
>         local_irq_save(flags);
>         switch_mm_irqs_off(prev, next, tsk);
>         local_irq_restore(flags);
> }
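
The rule suggested above could look roughly like the following user-space sketch. It is not the follow-up patch itself: the toy_* names are hypothetical, toy_irqs_off merely imitates local_irq_save()/local_irq_restore(), and toy_seen_first_arg exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct mm_struct and struct task_struct. */
struct toy_mm {
	int id;
};

struct toy_task {
	int pid;
};

static struct toy_mm toy_mm_a = { 1 };
static struct toy_mm toy_mm_b = { 2 };

/* Models cpu_tlbstate.loaded_mm. */
static struct toy_mm *toy_loaded_mm = &toy_mm_a;

/* Records the first argument the callee received (for inspection only). */
static struct toy_mm *toy_seen_first_arg = &toy_mm_a;

/* Imitates the interrupt-disabled region around the switch. */
static bool toy_irqs_off;

/* Models switch_mm_irqs_off(): the first argument is never consulted. */
void toy_switch_mm_irqs_off(struct toy_mm *unused, struct toy_mm *next,
			    struct toy_task *tsk)
{
	(void)tsk;
	assert(toy_irqs_off);
	toy_seen_first_arg = unused;
	toy_loaded_mm = next;
}

/*
 * Models the one call site quoted above, changed so that it always
 * passes NULL instead of forwarding the caller's prev.
 */
void toy_switch_mm(struct toy_mm *prev, struct toy_mm *next,
		   struct toy_task *tsk)
{
	(void)prev;
	toy_irqs_off = true;	/* stands in for local_irq_save(flags) */
	toy_switch_mm_irqs_off(NULL, next, tsk);
	toy_irqs_off = false;	/* stands in for local_irq_restore(flags) */
}
```

Passing NULL unconditionally makes any accidental use of the first argument inside the callee fail loudly instead of silently reading a stale pointer.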

Yosry Ahmed Feb. 22, 2024, 6:43 p.m. UTC | #2

On Thu, Feb 22, 2024 at 08:48:17AM -0800, Dave Hansen wrote:
> On 1/26/24 00:06, Yosry Ahmed wrote:
> > +/*
> > + * The "prev" argument passed by the caller does not always match CR3. For
> > + * example, the scheduler passes in active_mm when switching from lazy TLB mode
> > + * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
> > + * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
> > + */
> > +void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> >  			struct task_struct *tsk)
> 
> One nit here: It's not obvious that "unused" is 'the "prev" argument'.
> 
> Would something like this be more clear?
> 
> /*
>  * This optimizes when not actually switching mm's.  Some architectures
>  * use the 'unused' argument for this optimization, but x86 must use
>  * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
>  * ->active_mm up to date.
>  */

Yes, this is more clear, thanks! However, Andrew already merged that
patch into mm-stable, so it cannot be amended. I can send a separate
patch to rewrite the comment tho if you'd like, WDYT?

> 
> Also, I think it might be useful to have the rule that arch/x86 code
> _always_ calls switch_mm_irqs_off() with the first argument (the
> newly-named 'unused') set to NULL.  I think there's only one site:

Agreed. I can also send a separate patch for this. Thanks!

> 
> > void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >                struct task_struct *tsk)
> > {
> >         unsigned long flags;
> > 
> >         local_irq_save(flags);
> >         switch_mm_irqs_off(prev, next, tsk);
> >         local_irq_restore(flags);
> > }
>

Dave Hansen Feb. 22, 2024, 6:47 p.m. UTC | #3

On 2/22/24 10:43, Yosry Ahmed wrote:
>> /*
>>  * This optimizes when not actually switching mm's.  Some architectures
>>  * use the 'unused' argument for this optimization, but x86 must use
>>  * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
>>  * ->active_mm up to date.
>>  */
> Yes, this is more clear, thanks! However, Andrew already merged that
> patch into mm-stable, so it cannot be amended. I can send a separate
> patch to rewrite the comment tho if you'd like, WDYT?
> 
>> Also, I think it might be useful to have the rule that arch/x86 code
>> _always_ calls switch_mm_irqs_off() with the first argument (the
>> newly-named 'unused') set to NULL.  I think there's only one site:
> Agreed. I can also send a separate patch for this. Thanks!

That would be great.  I'd be happy to ack them.

Patch

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 80b0caa82a91b..bf9605caf24f7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -492,10 +492,16 @@  void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
-void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
+/*
+ * The "prev" argument passed by the caller does not always match CR3. For
+ * example, the scheduler passes in active_mm when switching from lazy TLB mode
+ * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
+ * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
+ */
+void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 			struct task_struct *tsk)
 {
-	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+	struct mm_struct *prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
@@ -504,15 +510,6 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	bool need_flush;
 	u16 new_asid;
 
-	/*
-	 * NB: The scheduler will call us with prev == next when switching
-	 * from lazy TLB mode to normal mode if active_mm isn't changing.
-	 * When this happens, we don't assume that CR3 (and hence
-	 * cpu_tlbstate.loaded_mm) matches next.
-	 *
-	 * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
-	 */
-
 	/* We don't want flush_tlb_func() to run concurrently with us. */
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
@@ -527,7 +524,7 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid,
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(prev->pgd, prev_asid,
 						   tlbstate_lam_cr3_mask()))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
@@ -559,7 +556,7 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * provides that full memory barrier and core serializing
 	 * instruction.
 	 */
-	if (real_prev == next) {
+	if (prev == next) {
 		/* Not actually switching mm's */
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
@@ -574,7 +571,7 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * mm_cpumask. The TLB shootdown code can figure out from
 		 * cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
 		 */
-		if (WARN_ON_ONCE(real_prev != &init_mm &&
+		if (WARN_ON_ONCE(prev != &init_mm &&
 				 !cpumask_test_cpu(cpu, mm_cpumask(next))))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 
@@ -616,10 +613,10 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * Skip kernel threads; we never send init_mm TLB flushing IPIs,
 		 * but the bitmap manipulation can cause cache line contention.
 		 */
-		if (real_prev != &init_mm) {
+		if (prev != &init_mm) {
 			VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
-						mm_cpumask(real_prev)));
-			cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+						mm_cpumask(prev)));
+			cpumask_clear_cpu(cpu, mm_cpumask(prev));
 		}
 
 		/*
@@ -656,9 +653,9 @@  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	this_cpu_write(cpu_tlbstate.loaded_mm, next);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 
-	if (next != real_prev) {
+	if (next != prev) {
 		cr4_update_pce_mm(next);
-		switch_ldt(real_prev, next);
+		switch_ldt(prev, next);
 	}
 }
