Message ID | 20240126080644.1714297-2-yosryahmed@google.com (mailing list archive) |
---|---|
State | New |
Series | [1/2] x86/mm: delete unused cpu argument to leave_mm() |
On 1/26/24 00:06, Yosry Ahmed wrote:
> +/*
> + * The "prev" argument passed by the caller does not always match CR3. For
> + * example, the scheduler passes in active_mm when switching from lazy TLB mode
> + * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
> + * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
> + */
> +void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
>  			struct task_struct *tsk)

One nit here: It's not obvious that "unused" is 'the "prev" argument'.

Would something like this be more clear?

/*
 * This optimizes when not actually switching mm's. Some architectures
 * use the 'unused' argument for this optimization, but x86 must use
 * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
 * ->active_mm up to date.
 */

Also, I think it might be useful to have the rule that arch/x86 code
_always_ calls switch_mm_irqs_off() with the first argument (the
newly-named 'unused') set to NULL. I think there's only one site:

> void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> 	       struct task_struct *tsk)
> {
> 	unsigned long flags;
>
> 	local_irq_save(flags);
> 	switch_mm_irqs_off(prev, next, tsk);
> 	local_irq_restore(flags);
> }
On Thu, Feb 22, 2024 at 08:48:17AM -0800, Dave Hansen wrote:
> On 1/26/24 00:06, Yosry Ahmed wrote:
> > +/*
> > + * The "prev" argument passed by the caller does not always match CR3. For
> > + * example, the scheduler passes in active_mm when switching from lazy TLB mode
> > + * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
> > + * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
> > + */
> > +void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> >  			struct task_struct *tsk)
>
> One nit here: It's not obvious that "unused" is 'the "prev" argument'.
>
> Would something like this be more clear?
>
> /*
>  * This optimizes when not actually switching mm's. Some architectures
>  * use the 'unused' argument for this optimization, but x86 must use
>  * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
>  * ->active_mm up to date.
>  */

Yes, this is more clear, thanks! However, Andrew already merged that
patch into mm-stable, so it cannot be amended. I can send a separate
patch to rewrite the comment tho if you'd like, WDYT?

>
> Also, I think it might be useful to have the rule that arch/x86 code
> _always_ calls switch_mm_irqs_off() with the first argument (the
> newly-named 'unused') set to NULL. I think there's only one site:

Agreed. I can also send a separate patch for this. Thanks!

>
> > void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> > 	       struct task_struct *tsk)
> > {
> > 	unsigned long flags;
> >
> > 	local_irq_save(flags);
> > 	switch_mm_irqs_off(prev, next, tsk);
> > 	local_irq_restore(flags);
> > }
>
On 2/22/24 10:43, Yosry Ahmed wrote:
>> /*
>>  * This optimizes when not actually switching mm's. Some architectures
>>  * use the 'unused' argument for this optimization, but x86 must use
>>  * 'cpu_tlbstate.loaded_mm' instead because it does not always keep
>>  * ->active_mm up to date.
>>  */
> Yes, this is more clear, thanks! However, Andrew already merged that
> patch into mm-stable, so it cannot be amended. I can send a separate
> patch to rewrite the comment tho if you'd like, WDYT?
>
>> Also, I think it might be useful to have the rule that arch/x86 code
>> _always_ calls switch_mm_irqs_off() with the first argument (the
>> newly-named 'unused') set to NULL. I think there's only one site:
> Agreed. I can also send a separate patch for this. Thanks!

That would be great. I'd be happy to ack them.
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 80b0caa82a91b..bf9605caf24f7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -492,10 +492,16 @@ void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
-void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
+/*
+ * The "prev" argument passed by the caller does not always match CR3. For
+ * example, the scheduler passes in active_mm when switching from lazy TLB mode
+ * to normal mode, but switch_mm_irqs_off() can be called from x86 code without
+ * updating active_mm. Use cpu_tlbstate.loaded_mm instead.
+ */
+void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 			struct task_struct *tsk)
 {
-	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+	struct mm_struct *prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
@@ -504,15 +510,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	bool need_flush;
 	u16 new_asid;
 
-	/*
-	 * NB: The scheduler will call us with prev == next when switching
-	 * from lazy TLB mode to normal mode if active_mm isn't changing.
-	 * When this happens, we don't assume that CR3 (and hence
-	 * cpu_tlbstate.loaded_mm) matches next.
-	 *
-	 * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
-	 */
-
 	/* We don't want flush_tlb_func() to run concurrently with us. */
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
@@ -527,7 +524,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid,
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(prev->pgd, prev_asid,
 						   tlbstate_lam_cr3_mask()))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
@@ -559,7 +556,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * provides that full memory barrier and core serializing
 	 * instruction.
 	 */
-	if (real_prev == next) {
+	if (prev == next) {
 		/* Not actually switching mm's */
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
@@ -574,7 +571,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * mm_cpumask. The TLB shootdown code can figure out from
 		 * cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
 		 */
-		if (WARN_ON_ONCE(real_prev != &init_mm &&
+		if (WARN_ON_ONCE(prev != &init_mm &&
 				 !cpumask_test_cpu(cpu, mm_cpumask(next))))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 
@@ -616,10 +613,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * Skip kernel threads; we never send init_mm TLB flushing IPIs,
 		 * but the bitmap manipulation can cause cache line contention.
 		 */
-		if (real_prev != &init_mm) {
+		if (prev != &init_mm) {
 			VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
-						mm_cpumask(real_prev)));
-			cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+						mm_cpumask(prev)));
+			cpumask_clear_cpu(cpu, mm_cpumask(prev));
 		}
 
 		/*
@@ -656,9 +653,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	this_cpu_write(cpu_tlbstate.loaded_mm, next);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 
-	if (next != real_prev) {
+	if (next != prev) {
 		cr4_update_pce_mm(next);
-		switch_ldt(real_prev, next);
+		switch_ldt(prev, next);
 	}
 }
In the x86 implementation of switch_mm_irqs_off(), we do not use the
"prev" argument passed in by the caller; we exclusively use "real_prev",
which is cpu_tlbstate.loaded_mm. This is not obvious at first sight.

Furthermore, a comment describes a condition that happens when called
with prev == next, but this should not affect the function in any way
since prev is unused. Apparently, the comment is intended to clarify why
we don't rely on prev == next to decide whether we need to update CR3,
but again, it is not obvious. The comment also references the fact that
leave_mm() calls with prev == NULL and tsk == NULL, but this also
shouldn't matter because prev is unused and tsk is only used in one
function which has a NULL check.

Clarify things by renaming (prev -> unused) and (real_prev -> prev), and
also move and rewrite the comment as an explanation for why we don't
rely on the "prev" supplied by the caller in x86 code and use our own
instead. Hopefully this makes reading the code easier.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 arch/x86/mm/tlb.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)
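
For readers who want to see why the caller-supplied "prev" cannot be trusted, the situation described in the commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct mm, loaded_mm, active_mm and the model_*() helpers are hypothetical stand-ins invented for illustration, not kernel APIs, and the real function also handles ASIDs, lazy TLB state and flushing, all of which are left out here.

```c
/* Userspace model only. Every name below is hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm { int id; };

/* Stand-in for cpu_tlbstate.loaded_mm: the mm that CR3 really points at. */
static struct mm *loaded_mm;

/* Stand-in for the scheduler's active_mm, which x86 code may not update. */
static struct mm *active_mm;

/*
 * Model of switch_mm_irqs_off(): the first argument is ignored and the
 * previous mm is read from loaded_mm. Returns true if the model had to
 * load a new mm (the analogue of writing CR3).
 */
static bool model_switch_mm(struct mm *unused, struct mm *next)
{
	struct mm *prev = loaded_mm;

	(void)unused;
	assert(next != NULL);

	if (prev == next)
		return false;	/* not actually switching mm's */

	loaded_mm = next;
	return true;
}

/*
 * Model of an x86-internal caller such as leave_mm(): it switches the
 * loaded mm without touching active_mm, and passes NULL as "unused".
 */
static void model_leave_mm(struct mm *init)
{
	model_switch_mm(NULL, init);
}

/* Model of the scheduler path: it passes its own idea of prev (active_mm). */
static bool model_sched_switch(struct mm *next)
{
	struct mm *prev = active_mm;

	active_mm = next;
	return model_switch_mm(prev, next);
}
```

In this model, after model_leave_mm() runs, the scheduler can call in with prev == next from its own point of view while loaded_mm still points elsewhere, so a decision based on the first argument would wrongly skip the switch. Passing NULL from the x86-internal caller, as suggested in the review, makes it explicit that the argument carries no information.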