From patchwork Fri Jul 8 07:18:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Zijlstra X-Patchwork-Id: 12910664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BDD2C433EF for ; Fri, 8 Jul 2022 07:20:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5A2C06B0071; Fri, 8 Jul 2022 03:20:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5529E6B0074; Fri, 8 Jul 2022 03:20:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3CCD36B0078; Fri, 8 Jul 2022 03:20:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 10EFD6B0071 for ; Fri, 8 Jul 2022 03:20:03 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id DA71C120F50 for ; Fri, 8 Jul 2022 07:20:02 +0000 (UTC) X-FDA: 79663083444.24.2E4AFF6 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) by imf06.hostedemail.com (Postfix) with ESMTP id 829F9180014 for ; Fri, 8 Jul 2022 07:20:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=9N3hbdGruB5h4fAjWjsEU0b1fdr629wCnSILI1aJFXM=; b=k99tIQRm2czJTmGUD/JGTD6mJI YQFQZJe0a7TlvMFKFZX8CdfuE9PBgqqFF0zQp+j2oFFErRoyr2LRk5yjxYDjT3KW2lnFr0+IrHRJA TzpoVJGfJhwDQFadPTQv3yAXmVT47K2Qo9zIhRqxM1laaTJThO4BSy0XWPxRsOuspjGKII1uVCzlQ /nUdURcoMTbMZrqvvNKUQG/vN6VN7d6Osix1PnmEu09C5Jqx+h94qWa0aDMT4OH9xRZqdlmigZUQY tpDZrx+kpOEZXAr5cIDwQjyIh2/oD3eVj0ryDtY4V0h1LvIzgqGxaJCft8pmQXoM7q+xipoHt21IQ g+Pifa+g==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1o9iHI-001dLw-Jr; Fri, 08 Jul 2022 07:19:53 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id B5691300E6E; Fri, 8 Jul 2022 09:19:51 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id 4B9FF203C016F; Fri, 8 Jul 2022 09:19:51 +0200 (CEST) Message-ID: <20220708071833.955178793@infradead.org> User-Agent: quilt/0.66 Date: Fri, 08 Jul 2022 09:18:03 +0200 From: Peter Zijlstra To: Jann Horn , Linus Torvalds , Will Deacon Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, Dave Airlie , Daniel Vetter , Andrew Morton , Guo Ren , David Miller Subject: [PATCH 1/4] mmu_gather: Remove per arch tlb_{start,end}_vma() References: <20220708071802.751003711@infradead.org> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1657264802; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:references:dkim-signature; bh=9N3hbdGruB5h4fAjWjsEU0b1fdr629wCnSILI1aJFXM=; b=xK8QxvOT2O/wboZLkRoDGLh6t4swrsnAkSimg7QPGke2/VAzKu1lJn1EgYAX8aXzBwFSSU yLHBCaPikIK5Ha6PsYyp9EuydLb5P18fxd1WJGWp8Z454mhb5ZewmKFlauJ5sUVK7glGyp o1Bdw4QKUB+DQLZdf8/freIh9lgNSHw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1657264802; a=rsa-sha256; cv=none; b=PKXX6dxaPfN7kNuQ5LZ0QmcT4JMqw4311JiKF8cnK2DeFPBlODaFH/EaNdCkLCNNk5DbqF 3S6FeuP0pyDjW9wqW7KtOe4ZVQaMVTY7Pnun9XwOOmQ8KKVLK6UeLDmlgzHPyxf4Kaox/j O+CIfvFk3jhIwtudWha/VZXggwQkMbk= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=infradead.org header.s=desiato.20200630 header.b=k99tIQRm; dmarc=none; spf=none (imf06.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.92.199) smtp.mailfrom=peterz@infradead.org Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=infradead.org header.s=desiato.20200630 header.b=k99tIQRm; dmarc=none; spf=none (imf06.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.92.199) smtp.mailfrom=peterz@infradead.org X-Rspamd-Server: rspam02 X-Stat-Signature: nseh1q7pwijjo1btjt6bkpqk5yzdmtyg X-Rspamd-Queue-Id: 829F9180014 X-Rspam-User: X-HE-Tag: 1657264801-770509 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Scattered across the archs are 3 basic forms of tlb_{start,end}_vma(). Provide two new MMU_GATHER_knobs to enumerate them and remove the per arch tlb_{start,end}_vma() implementations. - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range() but does *NOT* want to call it for each VMA. - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the invalidate across multiple VMAs if possible. With these it is possible to capture the three forms: 1) empty stubs; select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS 2) start: flush_cache_range(), end: empty; select MMU_GATHER_MERGE_VMAS 3) start: flush_cache_range(), end: flush_tlb_range(); default Obviously, if the architecture does not have flush_cache_range() then it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Will Deacon --- arch/Kconfig | 7 +++++++ arch/csky/include/asm/tlb.h | 13 ------------- arch/loongarch/Kconfig | 1 + arch/loongarch/include/asm/tlb.h | 10 ---------- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/tlb.h | 2 -- arch/s390/Kconfig | 1 + arch/s390/include/asm/tlb.h | 3 --- arch/sparc/Kconfig | 2 ++ arch/sparc/include/asm/tlb_64.h | 2 -- arch/x86/Kconfig | 1 + arch/x86/include/asm/tlb.h | 3 --- include/asm-generic/tlb.h | 21 +++++++++++++++++++-- 13 files changed, 32 insertions(+), 35 deletions(-) --- a/arch/Kconfig +++ b/arch/Kconfig @@ -438,6 +438,13 @@ config MMU_GATHER_PAGE_SIZE config MMU_GATHER_NO_RANGE bool + select MMU_GATHER_MERGE_VMAS + +config MMU_GATHER_NO_FLUSH_CACHE + bool + +config MMU_GATHER_MERGE_VMAS + bool config MMU_GATHER_NO_GATHER bool --- a/arch/csky/include/asm/tlb.h +++ b/arch/csky/include/asm/tlb.h @@ -4,19 +4,6 @@ #define __ASM_CSKY_TLB_H #include - -#define tlb_start_vma(tlb, vma) \ - do { \ - if (!(tlb)->fullmm) \ - flush_cache_range(vma, (vma)->vm_start, (vma)->vm_end); \ - } while (0) - -#define tlb_end_vma(tlb, vma) \ - do { \ - if (!(tlb)->fullmm) \ - flush_tlb_range(vma, (vma)->vm_start, (vma)->vm_end); \ - } while (0) - #define tlb_flush(tlb) flush_tlb_mm((tlb)->mm) #include --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -112,6 +112,7 @@ config LOONGARCH select TRACE_IRQFLAGS_SUPPORT select USE_PERCPU_NUMA_NODE_ID select ZONE_DMA32 + select MMU_GATHER_MERGE_VMAS if MMU config 32BIT bool --- a/arch/loongarch/include/asm/tlb.h +++ b/arch/loongarch/include/asm/tlb.h @@ -137,16 +137,6 @@ static inline void invtlb_all(u32 op, u3 ); } -/* - * LoongArch doesn't need any special per-pte or per-vma handling, except - * we need to flush cache for area to be unmapped. - */ -#define tlb_start_vma(tlb, vma) \ - do { \ - if (!(tlb)->fullmm) \ - flush_cache_range(vma, vma->vm_start, vma->vm_end); \ - } while (0) -#define tlb_end_vma(tlb, vma) do { } while (0) #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0) static void tlb_flush(struct mmu_gather *tlb); --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -256,6 +256,7 @@ config PPC select IRQ_FORCED_THREADING select MMU_GATHER_PAGE_SIZE select MMU_GATHER_RCU_TABLE_FREE + select MMU_GATHER_MERGE_VMAS select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE select NEED_PER_CPU_EMBED_FIRST_CHUNK if PPC64 --- a/arch/powerpc/include/asm/tlb.h +++ b/arch/powerpc/include/asm/tlb.h @@ -19,8 +19,6 @@ #include -#define tlb_start_vma(tlb, vma) do { } while (0) -#define tlb_end_vma(tlb, vma) do { } while (0) #define __tlb_remove_tlb_entry __tlb_remove_tlb_entry #define tlb_flush tlb_flush --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -204,6 +204,7 @@ config S390 select IOMMU_SUPPORT if PCI select MMU_GATHER_NO_GATHER select MMU_GATHER_RCU_TABLE_FREE + select MMU_GATHER_MERGE_VMAS select MODULES_USE_ELF_RELA select NEED_DMA_MAP_STATE if PCI select NEED_SG_DMA_LENGTH if PCI --- a/arch/s390/include/asm/tlb.h +++ b/arch/s390/include/asm/tlb.h @@ -27,9 +27,6 @@ static inline void tlb_flush(struct mmu_ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size); -#define tlb_start_vma(tlb, vma) do { } while (0) -#define tlb_end_vma(tlb, vma) do { } while (0) - #define tlb_flush tlb_flush #define pte_free_tlb pte_free_tlb #define pmd_free_tlb pmd_free_tlb --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -67,6 +67,8 @@ config SPARC64 select HAVE_KRETPROBES select HAVE_KPROBES select MMU_GATHER_RCU_TABLE_FREE if SMP + select MMU_GATHER_MERGE_VMAS + select MMU_GATHER_NO_FLUSH_CACHE select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_DYNAMIC_FTRACE select HAVE_FTRACE_MCOUNT_RECORD --- a/arch/sparc/include/asm/tlb_64.h +++ b/arch/sparc/include/asm/tlb_64.h @@ -22,8 +22,6 @@ void smp_flush_tlb_mm(struct mm_struct * void __flush_tlb_pending(unsigned long, unsigned long, unsigned long *); void flush_tlb_pending(void); -#define tlb_start_vma(tlb, vma) do { } while (0) -#define tlb_end_vma(tlb, vma) do { } while (0) #define tlb_flush(tlb) flush_tlb_pending() /* --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -245,6 +245,7 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT + select MMU_GATHER_MERGE_VMAS select HAVE_POSIX_CPU_TIMERS_TASK_WORK select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if UNWINDER_ORC || STACK_VALIDATION --- a/arch/x86/include/asm/tlb.h +++ b/arch/x86/include/asm/tlb.h @@ -2,9 +2,6 @@ #ifndef _ASM_X86_TLB_H #define _ASM_X86_TLB_H -#define tlb_start_vma(tlb, vma) do { } while (0) -#define tlb_end_vma(tlb, vma) do { } while (0) - #define tlb_flush tlb_flush static inline void tlb_flush(struct mmu_gather *tlb); --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -158,9 +158,24 @@ * Useful if your architecture doesn't use IPIs for remote TLB invalidates * and therefore doesn't naturally serialize with software page-table walkers. * + * MMU_GATHER_NO_FLUSH_CACHE + * + * Indicates the architecture has flush_cache_range() but it needs *NOT* be called + * before unmapping a VMA. + * + * NOTE: strictly speaking we shouldn't have this knob and instead rely on + * flush_cache_range() being a NOP, except Sparc64 seems to be + * different here. + * + * MMU_GATHER_MERGE_VMAS + * + * Indicates the architecture wants to merge ranges over VMAs; typical when + * multiple range invalidates are more expensive than a full invalidate. + * * MMU_GATHER_NO_RANGE * - * Use this if your architecture lacks an efficient flush_tlb_range(). + * Use this if your architecture lacks an efficient flush_tlb_range(). This + * option implies MMU_GATHER_MERGE_VMAS above. * * MMU_GATHER_NO_GATHER * @@ -493,14 +508,16 @@ static inline void tlb_start_vma(struct return; tlb_update_vma_flags(tlb, vma); +#ifndef CONFIG_MMU_GATHER_NO_FLUSH_CACHE flush_cache_range(vma, vma->vm_start, vma->vm_end); +#endif } #endif #ifndef tlb_end_vma static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) { - if (tlb->fullmm) + if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) return; /*