Message ID: 20191113095530.228959-1-shile.zhang@linux.alibaba.com (mailing list archive)
State: New, archived
Series: mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
On Wed, 2019-11-13 at 17:55 +0800, Shile Zhang wrote:
> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
>
> Fix it by adding more conditions.
>
> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  mm/vmalloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..7b9fc7966da6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)

Is it cleaner to use if (IS_ENABLED()...) ?

>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>
>  	/*
>  	 * TODO: to calculate a flush range without looping.
On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote:

> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/

That is indeed a large performance regression.

> Fix it by adding more conditions.
>
> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)

Are we sure that x86_32 is the only architecture which is (or ever will
be) affected?

>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>
>  	/*
>  	 * TODO: to calculate a flush range without looping.

CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check
CONFIG_X86_32?
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix

simplify config expression, use IS_ENABLED()

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
+++ a/mm/vmalloc.c
@@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig
 	if (unlikely(valist == NULL))
 		return false;

-#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
-	/*
-	 * First make sure the mappings are removed from all page-tables
-	 * before they are freed.
-	 *
-	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
-	 * the case on a PAE kernel with PTI enabled.
-	 */
-	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
-		vmalloc_sync_all();
+	if (IS_ENABLED(CONFIG_X86_PAE)) {
+		/*
+		 * First make sure the mappings are removed from all page-tables
+		 * before they are freed.
+		 *
+		 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which
+		 * is the case on a PAE kernel with PTI enabled.
+		 */
+		if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
+			vmalloc_sync_all();
+	}
 #endif

 	/*
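Andrew's fix relies on the IS_ENABLED() idiom from the kernel's include/linux/kconfig.h. As a rough user-space sketch of how that macro trick works (the CONFIG_* names below are defined by hand for illustration, not generated by Kconfig), the second-argument-selection game turns "is this macro defined to 1?" into a compile-time 0/1 constant, so the compiler still parses the disabled branch and then folds it away instead of hiding it behind #ifdef:

```c
/* Pretend Kconfig output: enabled options are defined to 1. */
#define CONFIG_X86_PAE 1
/* CONFIG_X86_32 deliberately left undefined ("not set"). */

/*
 * Minimal re-implementation of the kernel's IS_ENABLED() trick.
 * If the option expands to 1, __ARG_PLACEHOLDER_1 injects "0," and
 * the second-argument selector picks 1; otherwise it picks 0.
 */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)   /* extra level expands x first */
#define IS_ENABLED(option) __is_defined(option)

/* Expose the results so they can be checked at run time. */
int config_x86_pae(void) { return IS_ENABLED(CONFIG_X86_PAE); }
int config_x86_32(void)  { return IS_ENABLED(CONFIG_X86_32); }
```

Unlike `#if defined(...)`, `if (IS_ENABLED(...))` keeps the guarded code visible to the compiler on all configurations, which is why it is often considered cleaner.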
Fix wrong cc list.

On 2019/11/14 05:12, Andrew Morton wrote:
> On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote:
>
>> vmalloc_sync_all() was put in the common path in
>> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
>> with PTI enabled. It is needless for X86_64, which caused a big regression
>> in UnixBench Shell8 testing on X86_64.
>> Similar regression also reported by 0-day kernel test robot in reaim
>> benchmarking:
>> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
> That is indeed a large performance regression.
>
>> Fix it by adding more conditions.
>>
>> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>>
>> ...
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>  	if (unlikely(valist == NULL))
>>  		return false;
>>
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> Are we sure that x86_32 is the only architecture which is (or ever will
> be) affected?
>
>>  	/*
>>  	 * First make sure the mappings are removed from all page-tables
>>  	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>  	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
>>
>>  	/*
>>  	 * TODO: to calculate a flush range without looping.
> CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check
> CONFIG_X86_32?

Yes, thanks for your review and the kind refactoring!
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
>
> simplify config expression, use IS_ENABLED()
>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/vmalloc.c |   21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> --- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
> +++ a/mm/vmalloc.c
> @@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig
>  	if (unlikely(valist == NULL))
>  		return false;
>
> -#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> -	/*
> -	 * First make sure the mappings are removed from all page-tables
> -	 * before they are freed.
> -	 *
> -	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> -	 * the case on a PAE kernel with PTI enabled.
> -	 */
> -	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> -		vmalloc_sync_all();
> +	if (IS_ENABLED(CONFIG_X86_PAE)) {
> +		/*
> +		 * First make sure the mappings are removed from all page-tables
> +		 * before they are freed.
> +		 *
> +		 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which
> +		 * is the case on a PAE kernel with PTI enabled.
> +		 */
> +		if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +			vmalloc_sync_all();
> +	}
>  #endif
>
>  	/*
> _
On Wed 13-11-19 17:55:30, Shile Zhang wrote:
> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
>
> Fix it by adding more conditions.

This doesn't really explain a lot. Could you explain why the syncing
should be limited only to PAE and !SHARED_KERNEL_PMD?

> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  mm/vmalloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..7b9fc7966da6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>
>  	/*
>  	 * TODO: to calculate a flush range without looping.
> --
> 2.24.0.rc2
>
On 2019/11/14 21:56, Michal Hocko wrote:
> On Wed 13-11-19 17:55:30, Shile Zhang wrote:
>> vmalloc_sync_all() was put in the common path in
>> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
>> with PTI enabled. It is needless for X86_64, which caused a big regression
>> in UnixBench Shell8 testing on X86_64.
>> Similar regression also reported by 0-day kernel test robot in reaim
>> benchmarking:
>> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
>>
>> Fix it by adding more conditions.
> This doesn't really explain a lot. Could you explain why the syncing
> should be limited only to PAE and !SHARED_KERNEL_PMD?

Thanks for your review!

Sorry, I'm not clear about the original issue that commit 3f8fd02b1bf1
("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") fixed. I only
took that limitation from its commit log, which says:

"This is only needed x86-32 with !SHARED_KERNEL_PMD, which is the case on
a PAE kernel with PTI enabled.

On affected systems the missing sync causes old mappings to persist in
some page-tables, causing data corruption and other undefined behavior."

https://patchwork.kernel.org/cover/11050511/

Hi, Joerg,

Could you please help to recall the original issue you encountered before?

Thanks!

>
>> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>>
>> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
>> ---
>>  mm/vmalloc.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index a3c70e275f4e..7b9fc7966da6 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>  	if (unlikely(valist == NULL))
>>  		return false;
>>
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>>  	/*
>>  	 * First make sure the mappings are removed from all page-tables
>>  	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>  	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
>>
>>  	/*
>>  	 * TODO: to calculate a flush range without looping.
>> --
>> 2.24.0.rc2
>>
On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif

I already submitted another fix for this quite some time ago:

	https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/

Regards,

	Joerg
On Thu, Nov 14, 2019 at 10:40:22PM +0800, Shile Zhang wrote:
> Could you please help to recall the original issue you encountered before?
The original issue was data corruption because old mappings did not get
removed from the vmalloc area, and thus also new mappings did not get
faulted in. So depending on the page-table currently loaded one
vmalloc/ioremap area pointed to different (often already freed or
re-used) memory and caused the corruption.
Regards,
Joerg
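Joerg's failure mode can be modelled with a toy user-space sketch (everything below is an invented illustration, not kernel code): each process has its own copy of a top-level table; new mappings are propagated to every copy at creation time, but a lazy unmap touches only the current copy, so the other copies keep a stale entry, pointing at possibly freed memory, until a sync_unmappings-style walk removes it everywhere.

```c
#define NPROC 2   /* number of per-process "page tables" in the model */
#define NSLOT 4   /* number of vmalloc "slots" */

/* 0 = unmapped, any other value = id of the backing page. */
static int tables[NPROC][NSLOT];

/* New mappings are synced into every table when created. */
static void map_all(int slot, int page)
{
	for (int p = 0; p < NPROC; p++)
		tables[p][slot] = page;
}

/* Lazy unmap: only one process's table is updated. */
static void unmap_one(int proc, int slot)
{
	tables[proc][slot] = 0;
}

/* Model of what the sync in __purge_vmap_area_lazy() must do:
 * remove the dead entry from every table, not just one. */
static void sync_unmappings(int slot)
{
	for (int p = 0; p < NPROC; p++)
		tables[p][slot] = 0;
}

static int lookup(int proc, int slot) { return tables[proc][slot]; }

/* Without the sync, process 1 still sees the freed page: corruption. */
int demo_stale_without_sync(void)
{
	map_all(1, 42);       /* vmalloc() */
	unmap_one(0, 1);      /* vunmap() applied only to process 0 */
	return lookup(1, 1);  /* stale: still 42 */
}

/* With the sync (what commit 3f8fd02b1bf1 added), the entry is gone. */
int demo_fixed_with_sync(void)
{
	map_all(2, 99);
	unmap_one(0, 2);
	sync_unmappings(2);
	return lookup(1, 2);  /* 0: unmapped everywhere */
}
```

This is the hazard that only exists when the kernel PMD is not shared between page tables, i.e. x86-32 PAE with PTI; with a shared kernel PMD there is only one copy to update.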
On Thu, 14 Nov 2019 18:12:31 +0100 Joerg Roedel <jroedel@suse.de> wrote:

> On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
> > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> >  	/*
> >  	 * First make sure the mappings are removed from all page-tables
> >  	 * before they are freed.
> > +	 *
> > +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> > +	 * the case on a PAE kernel with PTI enabled.
> >  	 */
> > -	vmalloc_sync_all();
> > +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> > +		vmalloc_sync_all();
> > +#endif
>
> I already submitted another fix for this quite some time ago:
>
> https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/

Your patch is more verbose but looks quite a bit nicer than this patch
(Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com).
The separation of sync-for-a-mapping versus sync-for-an-unmapping adds
clarity.

It's fairly urgent - I consider this to be -stable material.

Thomas & co, was that a deliberate skip?
On 2019/11/15 01:12, Joerg Roedel wrote:
> On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>>  	/*
>>  	 * First make sure the mappings are removed from all page-tables
>>  	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>  	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
> I already submitted another fix for this quite some time ago:
>
> https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
>
> Regards,
>
> Joerg

Oh, sorry I missed that. Good job, and thanks for your work!
On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote:
> It's fairly urgent - I consider this to be -stable material.
>
> Thomas & co, was that a deliberate skip?

More like lost in the mail flood. I'm assuming you're taking it?

In any case, patch looks ok to me - the "unmappings" naming sounds like
coming from someone with a very wild phantasy... :-)

Thx.
On Fri, 15 Nov 2019 09:38:47 +0100 Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote:
> > It's fairly urgent - I consider this to be -stable material.
> >
> > Thomas & co, was that a deliberate skip?
>
> More like lost in the mail flood. I'm assuming you're taking it?
>
> In any case, patch looks ok to me - the "unmappings" naming sounds like
> coming from someone with a very wild phantasy... :-)

OK, thanks, here's what I queued. Reviews, acks and testing, please??


From: Joerg Roedel <jroedel@suse.de>
Subject: x86/mm: Split vmalloc_sync_all()

Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/fault.c      |   26 ++++++++++++++++++++++++--
 drivers/acpi/apei/ghes.c |    2 +-
 include/linux/vmalloc.h  |    5 +++--
 kernel/notifier.c        |    2 +-
 mm/nommu.c               |   10 +++++++---
 mm/vmalloc.c             |   11 +++++++----
 6 files changed, 43 insertions(+), 13 deletions(-)

--- a/arch/x86/mm/fault.c~x86-mm-split-vmalloc_sync_all
+++ a/arch/x86/mm/fault.c
@@ -189,7 +189,7 @@ static inline pmd_t *vmalloc_sync_one(pg
 	return pmd_k;
 }

-void vmalloc_sync_all(void)
+static void vmalloc_sync(void)
 {
 	unsigned long address;

@@ -216,6 +216,16 @@ void vmalloc_sync_all(void)
 	}
 }

+void vmalloc_sync_mappings(void)
+{
+	vmalloc_sync();
+}
+
+void vmalloc_sync_unmappings(void)
+{
+	vmalloc_sync();
+}
+
 /*
  * 32-bit:
  *
@@ -318,11 +328,23 @@ out:

 #else /* CONFIG_X86_64: */

-void vmalloc_sync_all(void)
+void vmalloc_sync_mappings(void)
 {
+	/*
+	 * 64-bit mappings might allocate new p4d/pud pages
+	 * that need to be propagated to all tasks' PGDs.
+	 */
 	sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
 }

+void vmalloc_sync_unmappings(void)
+{
+	/*
+	 * Unmappings never allocate or free p4d/pud pages.
+	 * No work is required here.
+	 */
+}
+
 /*
  * 64-bit:
  *
--- a/drivers/acpi/apei/ghes.c~x86-mm-split-vmalloc_sync_all
+++ a/drivers/acpi/apei/ghes.c
@@ -171,7 +171,7 @@ int ghes_estatus_pool_init(int num_ghes)
 	 * New allocation must be visible in all pgd before it can be found by
 	 * an NMI allocating from the pool.
 	 */
-	vmalloc_sync_all();
+	vmalloc_sync_mappings();
 	rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1);
 	if (rc)
--- a/include/linux/vmalloc.h~x86-mm-split-vmalloc_sync_all
+++ a/include/linux/vmalloc.h
@@ -126,8 +126,9 @@ extern int remap_vmalloc_range_partial(s
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 							unsigned long pgoff);

-void vmalloc_sync_all(void);
-
+void vmalloc_sync_mappings(void);
+void vmalloc_sync_unmappings(void);
+
 /*
  *	Lowlevel-APIs (not for driver use!)
  */
--- a/kernel/notifier.c~x86-mm-split-vmalloc_sync_all
+++ a/kernel/notifier.c
@@ -554,7 +554,7 @@ NOKPROBE_SYMBOL(notify_die);

 int register_die_notifier(struct notifier_block *nb)
 {
-	vmalloc_sync_all();
+	vmalloc_sync_mappings();
 	return atomic_notifier_chain_register(&die_chain, nb);
 }
 EXPORT_SYMBOL_GPL(register_die_notifier);
--- a/mm/nommu.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/nommu.c
@@ -359,10 +359,14 @@ void vm_unmap_aliases(void)
 EXPORT_SYMBOL_GPL(vm_unmap_aliases);

 /*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement a stub for vmalloc_sync_[un]mapping() if the architecture
+ * chose not to have one.
  */
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
+{
+}
+
+void __weak vmalloc_sync_unmappings(void)
 {
 }
--- a/mm/vmalloc.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/vmalloc.c
@@ -1259,7 +1259,7 @@ static bool __purge_vmap_area_lazy(unsig
 	 * First make sure the mappings are removed from all page-tables
 	 * before they are freed.
 	 */
-	vmalloc_sync_all();
+	vmalloc_sync_unmappings();

 	/*
 	 * TODO: to calculate a flush range without looping.
@@ -3050,16 +3050,19 @@ int remap_vmalloc_range(struct vm_area_s
 EXPORT_SYMBOL(remap_vmalloc_range);

 /*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement stubs for vmalloc_sync_[un]mappings () if the architecture chose
+ * not to have one.
  *
  * The purpose of this function is to make sure the vmalloc area
  * mappings are identical in all page-tables in the system.
  */
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
 {
 }

+void __weak vmalloc_sync_unmappings(void)
+{
+}

 static int f(pte_t *pte, unsigned long addr, void *data)
 {
On Fri, Nov 15, 2019 at 03:37:21PM -0800, Andrew Morton wrote:
> OK, thanks, here's what I queued. Reviews, acks and testing, please??
>
> From: Joerg Roedel <jroedel@suse.de>
> Subject: x86/mm: Split vmalloc_sync_all()
>
> Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
> __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the
> vunmap() code-path. While this change was necessary to maintain
> correctness on x86-32-pae kernels, it also adds additional cycles for
> architectures that don't need it.

...

>  arch/x86/mm/fault.c      |   26 ++++++++++++++++++++++++--
>  drivers/acpi/apei/ghes.c |    2 +-
>  include/linux/vmalloc.h  |    5 +++--
>  kernel/notifier.c        |    2 +-
>  mm/nommu.c               |   10 +++++++---
>  mm/vmalloc.c             |   11 +++++++----
>  6 files changed, 43 insertions(+), 13 deletions(-)

Booting it looks ok here:

Tested-by: Borislav Petkov <bp@suse.de>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3c70e275f4e..7b9fc7966da6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	if (unlikely(valist == NULL))
 		return false;

+#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
 	/*
 	 * First make sure the mappings are removed from all page-tables
 	 * before they are freed.
+	 *
+	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
+	 * the case on a PAE kernel with PTI enabled.
 	 */
-	vmalloc_sync_all();
+	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
+		vmalloc_sync_all();
+#endif

 	/*
 	 * TODO: to calculate a flush range without looping.
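The runtime half of the guard in the diff above, boot_cpu_has(X86_FEATURE_PTI), is a bit-test against a CPU-capability bitmap filled in at boot. A hedged user-space approximation of that gating pattern follows (the feature numbers are invented for the example, not real X86_FEATURE_* values):

```c
/* Toy capability bitmap, modelled loosely on the kernel's cpu caps.
 * Feature numbers are invented for illustration. */
#define FEATURE_PTI   7
#define FEATURE_OTHER 12
#define NCAPINTS      2

static unsigned int cpu_caps[NCAPINTS];

/* Set a capability bit, as boot code would after CPU detection. */
static void set_cpu_cap(int feature)
{
	cpu_caps[feature / 32] |= 1u << (feature % 32);
}

/* Cheap bit-test, analogous to boot_cpu_has(). */
int boot_cpu_has(int feature)
{
	return (cpu_caps[feature / 32] >> (feature % 32)) & 1u;
}

static int syncs_performed;

/* Gate the expensive sync on the runtime check, as the patch does. */
static void purge_vmap_area(void)
{
	if (boot_cpu_has(FEATURE_PTI))
		syncs_performed++;   /* stands in for vmalloc_sync_all() */
}

int demo(void)
{
	syncs_performed = 0;
	purge_vmap_area();         /* PTI bit clear: no sync */
	set_cpu_cap(FEATURE_PTI);
	purge_vmap_area();         /* PTI bit set: sync happens */
	return syncs_performed;
}
```

The point of pairing this runtime test with the compile-time #if is that kernels built without PAE pay nothing at all, and PAE kernels booted without PTI skip the sync as well.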
vmalloc_sync_all() was put in the common path in
__purge_vmap_area_lazy() to fix a sync issue that only happens on X86_32
with PTI enabled. It is needless for X86_64, and it caused a big
regression in UnixBench Shell8 testing on X86_64.

A similar regression was also reported by the 0-day kernel test robot in
reaim benchmarking:
https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/

Fix it by adding more conditions.

Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 mm/vmalloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)