Message ID: 20250213161423.449435-9-riel@surriel.com (mailing list archive)
State: New
Series: AMD broadcast TLB invalidation
On 2/13/25 08:13, Rik van Riel wrote:
> In the page reclaim code, we only track the CPU(s) where the TLB needs
> to be flushed, rather than all the individual mappings that may be getting
> invalidated.
>
> Use broadcast TLB flushing when that is available.

The changelog here is a little light. This patch is doing a *ton* of stuff.

The existing code has two cases where it is doing a full TLB flush, not a
ranged flush:

 1. An actual IPI to some CPUs in batch->cpumask
 2. A local flush, no IPI

The change here eliminates both of those options, even the "common case"
which is not sending an IPI at all. So this replaces a CPU-local (aka. 1
logical CPU) TLB flush with a broadcast to the *ENTIRE* system.

That's a really, really big change not to be noted. It's not something
that's an obvious win to me.

> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3c29ef25dce4..de3f6e4ed16d 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1316,7 +1316,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>  	 * a local TLB flush is needed. Optimize this use-case by calling
>  	 * flush_tlb_func_local() directly in this case.
>  	 */
> -	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> +		invlpgb_flush_all_nonglobals();
> +	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
>  		flush_tlb_multi(&batch->cpumask, info);
>  	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
>  		lockdep_assert_irqs_enabled();

The structure of the code is also a bit off to me. I'd kinda prefer that we
stick to the pattern of (logically):

	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
		invlpgb_...();
	} else {
		on_each_cpu*();
	}

This patch is going a couple of functions up in the call chain above the
on_each_cpu()'s. It would be more consistent with the previous modifications
in this series if the X86_FEATURE_INVLPGB check was in
native_flush_tlb_multi(), instead.

Would that make sense here?
It would also preserve the "common case" optimization that's in arch_tlbbatch_flush().
On Fri, 2025-02-14 at 10:51 -0800, Dave Hansen wrote:
>
> Would that make sense here? It would also preserve the "common case"
> optimization that's in arch_tlbbatch_flush().
>
What we do in this patch goes out the window in patch 10. This is just a
temporary stage along the way to what we have at the end of the series,
needed to make sure we don't break bisect partway through the series.

I'm not sure we should be doing much with this patch.

I could fold it into the next patch, but that would make things harder to
read.
On 2/18/25 11:31, Rik van Riel wrote:
> On Fri, 2025-02-14 at 10:51 -0800, Dave Hansen wrote:
>> Would that make sense here? It would also preserve the "common case"
>> optimization that's in arch_tlbbatch_flush().
>>
> What we do in this patch goes out the window in patch
> 10. This is just a temporary stage along the way to
> what we have at the end of the series, needed to make
> sure we don't break bisect partway through the series.
>
> I'm not sure we should be doing much with this patch.
>
> I could fold it into the next patch, but that would
> make things harder to read.

Ahh, that makes sense. If it's going out the window, could it be temporarily
commented, please? Add the comment here and remove it in patch 10 (or
whatever).
On Tue, 2025-02-18 at 11:46 -0800, Dave Hansen wrote:
>
> Ahh, that makes sense. If it's going out the window, could it be
> temporarily commented, please? Add the comment here and remove it in
> patch 10 (or whatever).
>
I added an explanation in the commit message. You are right that it wasn't
totally clear.
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3c29ef25dce4..de3f6e4ed16d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1316,7 +1316,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+		invlpgb_flush_all_nonglobals();
+	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();