From patchwork Fri May 19 12:01:27 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248260
Date: Fri, 19 May 2023 20:01:27 +0800
From: Baoquan He <bhe@redhat.com>
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org,
	Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes,
	Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org,
	Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 1/3] mm/vmalloc.c: try to flush vmap_area one by one
In-Reply-To: <87edng6qu8.ffs@tglx>
References: <87zg658fla.ffs@tglx> <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx>
 <87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx>
 <87edng6qu8.ffs@tglx>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In the current __purge_vmap_area_lazy(), when trying to flush the TLB of
the vmalloc area, the flush range is calculated as the [min:max] of the
vas. That calculated range can be big because of the gaps between the
vas. E.g. in the graph below there are only 12 mapped pages (4 from
va_1, 8 from va_2), while the calculated flush range covers 58 pages.

      VA_1                          VA_2
   |....|-------------------------|............|
   10   14                        60          68

   . mapped; - not mapped.

Sometimes the calculated flush range can be surprisingly huge because
the vas can cross two kernel virtual address areas.
E.g. the vmalloc area and the kernel module area are very far away from
each other on some architectures. So for systems which lack a full TLB
flush, flushing such a long range is a big problem (it takes time).
Flushing the vas one by one becomes necessary in that case.

Hence, introduce flush_tlb_kernel_vas() to flush the vas one by one, and
add CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS to indicate whether a certain
architecture provides a flush_tlb_kernel_vas() implementation.
Otherwise, take the old way of calculating and flushing the whole range.

Signed-off-by: Thomas Gleixner
Signed-off-by: Baoquan He	#Fix error of 'undefined reference to `flush_tlb_kernel_vas''
---
 arch/Kconfig              |  4 ++++
 arch/arm/Kconfig          |  1 +
 arch/arm/kernel/smp_tlb.c | 23 +++++++++++++++++++++++
 arch/x86/Kconfig          |  1 +
 arch/x86/mm/tlb.c         | 22 ++++++++++++++++++++++
 include/linux/vmalloc.h   |  8 ++++++++
 mm/vmalloc.c              | 32 ++++++++++++++++++++++----------
 7 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..ca5413f1e4e0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -270,6 +270,10 @@ config ARCH_HAS_SET_MEMORY
 config ARCH_HAS_SET_DIRECT_MAP
 	bool
 
+# Select if architecture provides flush_tlb_kernel_vas()
+config ARCH_HAS_FLUSH_TLB_KERNEL_VAS
+	bool
+
 #
 # Select if the architecture provides the arch_dma_set_uncached symbol to
 # either provide an uncached segment alias for a DMA allocation, or
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fb4b218f665..c4de7f38f9a7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -10,6 +10,7 @@ config ARM
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..22ec9b982cb1 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -7,6 +7,7 @@
 #include <linux/preempt.h>
 #include <linux/smp.h>
 #include <linux/uaccess.h>
+#include <linux/vmalloc.h>
 
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
@@ -69,6 +70,19 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 	local_flush_tlb_kernel_range(ta->ta_start, ta->ta_end);
 }
 
+static inline void local_flush_tlb_kernel_vas(struct list_head *vmap_list)
+{
+	struct vmap_area *va;
+
+	list_for_each_entry(va, vmap_list, list)
+		local_flush_tlb_kernel_range(va->va_start, va->va_end);
+}
+
+static inline void ipi_flush_tlb_kernel_vas(void *arg)
+{
+	local_flush_tlb_kernel_vas(arg);
+}
+
 static inline void ipi_flush_bp_all(void *ignored)
 {
 	local_flush_bp_all();
@@ -244,6 +258,15 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	broadcast_tlb_a15_erratum();
 }
 
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (tlb_ops_need_broadcast())
+		on_each_cpu(ipi_flush_tlb_kernel_vas, vmap_list, 1);
+	else
+		local_flush_tlb_kernel_vas(vmap_list);
+	broadcast_tlb_a15_erratum();
+}
+
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..7d7a44810a0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -77,6 +77,7 @@ config X86
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_EARLY_DEBUG		if KGDB
 	select ARCH_HAS_ELF_RANDOMIZE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 267acf27480a..c39d77eb37e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include <linux/sched/smt.h>
 #include <linux/task_work.h>
 #include <linux/mmu_notifier.h>
+#include <linux/vmalloc.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -1081,6 +1082,27 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	}
 }
 
+static void do_flush_tlb_vas(void *arg)
+{
+	struct list_head *vmap_list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, vmap_list, list) {
+		/* flush the range page by page with 'invlpg' */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (num_entries > tlb_single_page_flush_ceiling)
+		on_each_cpu(do_flush_tlb_all, NULL, 1);
+	else
+		on_each_cpu(do_flush_tlb_vas, vmap_list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..a9a1e488261d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,12 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+#if defined(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)
+void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries);
+#else
+static inline void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries)
+{
+}
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..31e8d9e93650 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1724,7 +1724,8 @@ static void purge_fragmented_blocks_allcpus(void);
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
+	unsigned long resched_threshold, num_entries = 0, num_alias_entries = 0;
+	struct vmap_area alias_va = { .va_start = start, .va_end = end };
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1736,18 +1737,29 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	list_replace_init(&purge_vmap_area_list, &local_purge_list);
 	spin_unlock(&purge_vmap_area_lock);
 
-	if (unlikely(list_empty(&local_purge_list)))
-		goto out;
+	start = min(start, list_first_entry(&local_purge_list, struct vmap_area, list)->va_start);
+	end = max(end, list_last_entry(&local_purge_list, struct vmap_area, list)->va_end);
+
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		list_for_each_entry(va, &local_purge_list, list)
+			num_entries += (va->va_end - va->va_start) >> PAGE_SHIFT;
+
+		if (unlikely(!num_entries))
+			goto out;
+
+		if (alias_va.va_end > alias_va.va_start) {
+			num_alias_entries = (alias_va.va_end - alias_va.va_start) >> PAGE_SHIFT;
+			list_add(&alias_va.list, &local_purge_list);
+		}
 
-	start = min(start,
-		    list_first_entry(&local_purge_list,
-				     struct vmap_area, list)->va_start);
+		flush_tlb_kernel_vas(&local_purge_list, num_entries + num_alias_entries);
 
-	end = max(end,
-		  list_last_entry(&local_purge_list,
-				  struct vmap_area, list)->va_end);
+		if (num_alias_entries)
+			list_del(&alias_va.list);
+	} else {
+		flush_tlb_kernel_range(start, end);
+	}
 
-	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);