Message ID | 20211021122112.592634-1-namit@vmware.com (mailing list archive)
---|---
Series | mm/mprotect: avoid unnecessary TLB flushes
On Thu, 21 Oct 2021 05:21:07 -0700 Nadav Amit <nadav.amit@gmail.com> wrote:

> This patch-set is intended to remove unnecessary TLB flushes. It is
> based on feedback from v1 and several bugs I found in v1 myself.
>
> Basically, there are 3 optimizations in this patch-set:
> 1. Avoiding TLB flushes on change_huge_pmd() that are only needed to
>    prevent the A/D bits from changing.
> 2. Use TLB batching infrastructure to batch flushes across VMAs and
>    do better/fewer flushes.
> 3. Avoid TLB flushes on permission demotion.
>
> Andrea asked for the aforementioned (2) to come after (3), but this
> is not simple (specifically since change_prot_numa() needs the number
> of pages affected).

[1/5] appears to be a significant fix which should probably be
backported into -stable kernels. If you agree with this then I suggest
it be prepared as a standalone patch, separate from the other four
patches. With a cc:stable.

And the remaining patches are a performance optimization. Has any
attempt been made to quantify the benefits?
> On Oct 21, 2021, at 8:04 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 21 Oct 2021 05:21:07 -0700 Nadav Amit <nadav.amit@gmail.com> wrote:
>
>> This patch-set is intended to remove unnecessary TLB flushes. It is
>> based on feedback from v1 and several bugs I found in v1 myself.
>>
>> Basically, there are 3 optimizations in this patch-set:
>> 1. Avoiding TLB flushes on change_huge_pmd() that are only needed to
>>    prevent the A/D bits from changing.
>> 2. Use TLB batching infrastructure to batch flushes across VMAs and
>>    do better/fewer flushes.
>> 3. Avoid TLB flushes on permission demotion.
>>
>> Andrea asked for the aforementioned (2) to come after (3), but this
>> is not simple (specifically since change_prot_numa() needs the number
>> of pages affected).
>
> [1/5] appears to be a significant fix which should probably be
> backported into -stable kernels. If you agree with this then I suggest
> it be prepared as a standalone patch, separate from the other four
> patches. With a cc:stable.

There is no functionality bug in the kernel. The Knights Landing bug
was eventually circumvented by changing the swap entry structure so
that the access/dirty bits would not overlap with the swap entry data.

> And the remaining patches are a performance optimization. Has any
> attempt been made to quantify the benefits?

I included some data before [1]. In general, the cost that is saved is
the cost of a TLB flush/shootdown.

I will modify my benchmark to test huge pages (which were not included
in the previous patch-set) and send results later. I will also try
nodejs to see if there is a significant enough benefit. Nodejs crashed
before (hence the 3rd patch added here), as it exec-protects/unprotects
pages - I will see if the benefit shows in the benchmarks.

[ The motivation behind the patches is to later introduce a userfaultfd
  writeprotectv interface, and for my use-case that is under development
  this proved to improve performance considerably. ]

[1] https://lore.kernel.org/linux-mm/DA49DBBB-FFEE-4ACC-BB6C-364D07533C5E@vmware.com/
On Thu, Oct 21, 2021 at 08:04:50PM -0700, Andrew Morton wrote:
> On Thu, 21 Oct 2021 05:21:07 -0700 Nadav Amit <nadav.amit@gmail.com> wrote:
>
> > This patch-set is intended to remove unnecessary TLB flushes. It is
> > based on feedback from v1 and several bugs I found in v1 myself.
> >
> > Basically, there are 3 optimizations in this patch-set:
> > 1. Avoiding TLB flushes on change_huge_pmd() that are only needed to
> >    prevent the A/D bits from changing.
> > 2. Use TLB batching infrastructure to batch flushes across VMAs and
> >    do better/fewer flushes.
> > 3. Avoid TLB flushes on permission demotion.
> >
> > Andrea asked for the aforementioned (2) to come after (3), but this
> > is not simple (specifically since change_prot_numa() needs the number
> > of pages affected).
>
> [1/5] appears to be a significant fix which should probably be
> backported into -stable kernels. If you agree with this then I suggest
> it be prepared as a standalone patch, separate from the other four
> patches. With a cc:stable.

I am confused, 1/5 doesn't actually do *anything*. I also cannot find
any further usage of the introduced X86_BUG_PTE_LEAK.

I'm thinking patch #2 means to have something like:

	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);

in the newly minted pmdp_invalidate_ad(), but alas, nothing there.
> On Oct 25, 2021, at 3:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 21, 2021 at 08:04:50PM -0700, Andrew Morton wrote:
>> On Thu, 21 Oct 2021 05:21:07 -0700 Nadav Amit <nadav.amit@gmail.com> wrote:
>>
>>> This patch-set is intended to remove unnecessary TLB flushes. It is
>>> based on feedback from v1 and several bugs I found in v1 myself.
>>>
>>> Basically, there are 3 optimizations in this patch-set:
>>> 1. Avoiding TLB flushes on change_huge_pmd() that are only needed to
>>>    prevent the A/D bits from changing.
>>> 2. Use TLB batching infrastructure to batch flushes across VMAs and
>>>    do better/fewer flushes.
>>> 3. Avoid TLB flushes on permission demotion.
>>>
>>> Andrea asked for the aforementioned (2) to come after (3), but this
>>> is not simple (specifically since change_prot_numa() needs the number
>>> of pages affected).
>>
>> [1/5] appears to be a significant fix which should probably be
>> backported into -stable kernels. If you agree with this then I suggest
>> it be prepared as a standalone patch, separate from the other four
>> patches. With a cc:stable.
>
> I am confused, 1/5 doesn't actually do *anything*. I also cannot find
> any further usage of the introduced X86_BUG_PTE_LEAK.
>
> I'm thinking patch #2 means to have something like:
>
>	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
>		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>
> In the newly minted: pmdp_invalidate_ad(), but alas, nothing there.

This change was only intended for pmdp_invalidate_ad() but somehow got
lost. I will add it there.

I eventually did not add the optimization to avoid TLB flushes on
(!dirty|write)->!write, so I did not use it for the first case that you
mentioned. I am too afraid, although I think this is correct. Perhaps I
will add it as a separate patch.
On 10/22/21 2:58 PM, Nadav Amit wrote:
>> [1/5] appears to be a significant fix which should probably be
>> backported into -stable kernels. If you agree with this then I suggest
>> it be prepared as a standalone patch, separate from the other four
>> patches. With a cc:stable.
>
> There is no functionality bug in the kernel. The Knights Landing bug
> was circumvented eventually by changing the swap entry structure so
> the access/dirty bits would not overlap with the swap entry data.

Yeah, it was a significant issue, but we fixed it in here:

> commit 00839ee3b299303c6a5e26a0a2485427a3afcbbf
> Author: Dave Hansen <dave.hansen@linux.intel.com>
> Date:   Thu Jul 7 17:19:11 2016 -0700
>
>     x86/mm: Move swap offset/type up in PTE to work around erratum
From: Nadav Amit <namit@vmware.com>

This patch-set is intended to remove unnecessary TLB flushes. It is
based on feedback from v1 and several bugs I found in v1 myself.

Basically, there are 3 optimizations in this patch-set:
1. Avoiding TLB flushes on change_huge_pmd() that are only needed to
   prevent the A/D bits from changing.
2. Use TLB batching infrastructure to batch flushes across VMAs and
   do better/fewer flushes.
3. Avoid TLB flushes on permission demotion.

Andrea asked for the aforementioned (2) to come after (3), but this
is not simple (specifically since change_prot_numa() needs the number
of pages affected).

There are many changes from v1 to v2 so consider the change log as
partial.

v1->v2:
* Wrong detection of permission demotion [Andrea]
* Better comments [Andrea]
* Handle THP [Andrea]
* Batching across VMAs [Peter Xu]
* Avoid open-coding PTE analysis
* Fix wrong use of the mmu_gather()

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: x86@kernel.org

Nadav Amit (5):
  x86: Detection of Knights Landing A/D leak
  mm: avoid unnecessary flush on change_huge_pmd()
  x86/mm: check exec permissions on fault
  mm/mprotect: use mmu_gather
  mm/mprotect: do not flush on permission promotion

 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/pgtable.h       |  8 +++
 arch/x86/include/asm/pgtable_types.h |  2 +
 arch/x86/include/asm/tlbflush.h      | 80 +++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c          |  5 ++
 arch/x86/mm/fault.c                  | 11 +++-
 fs/exec.c                            |  6 +-
 include/asm-generic/tlb.h            | 14 +++++
 include/linux/huge_mm.h              |  5 +-
 include/linux/mm.h                   |  5 +-
 include/linux/pgtable.h              |  5 ++
 mm/huge_memory.c                     | 22 ++++---
 mm/mprotect.c                        | 94 +++++++++++++++------------
 mm/pgtable-generic.c                 |  8 +++
 mm/userfaultfd.c                     |  6 +-
 15 files changed, 215 insertions(+), 57 deletions(-)